URLs

URLs are obviously an important part of web sites and apps. Unfortunately, many developers don't give them enough attention.
Basic ideas:
- URLs are technically opaque (不透明).
  - That is, they are just unique strings passed between server and client, with no semantic meaning.
- But they are often used by users to determine the structure of the site.
  - Displayed in search engine results, copied into emails, etc.
- So computers don't care what your URLs look like, but your users probably do. So your URLs should be designed with your users in mind.
- URLs identify (找出？鉴定？) a resource (网络资源, piece of content); file formats (HTML, PDF, CSV, etc.) represent (表示) a resource.
URLs should correspond one-to-one to a resource.
Multiple URLs per resource: one piece of content with different URLs
- Could be as simple as /dir/ and /dir/index.html.
- Might be caused by session or referrer info in URLs, or by URL shorteners.
- Can confuse users, splits PageRank across the copies, and each URL is cached and crawled separately.
- Solutions: be consistent in your use of URLs, redirect to a canonical URL for the content.
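One way the "redirect to a canonical URL" idea could look, as a minimal sketch. The parameter names in NOISE_PARAMS and both function names are made up for illustration; a real site would choose its own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that never change the content.
NOISE_PARAMS = {"sessionid", "ref", "utm_source"}

def canonical_url(url):
    """Return the one URL we want to serve this content from."""
    parts = urlsplit(url)
    path = parts.path
    # Treat /dir/index.html and /dir/ as the same resource.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Drop session/referrer parameters, keep everything else in order.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in NOISE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))

def redirect_if_needed(url):
    """Return (301, target) if url is not canonical, otherwise None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

With this, /dir/index.html?ref=email and /dir/ both end up at /dir/, so links and crawlers converge on one URL.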
Multiple resources per URL: different pieces of content available without changing URL.
- Often caused by JS: tabs, dynamic loading of content, etc.
- Also by Flash blobs. Please stop using Flash.
- Can't be linked: can't bookmark or email URL, Google can't link users in to content,…
- How many visitors do you lose when every link must also include “click the third tab, and then scroll down 2/3 of the way”?
Examples:
- Amazon URLs contain lots of crap [same content at different URL]
- Google Maps doesn't update the URL as you browse, making it hard to link to what you find.
- Google Maps mashup with good URLs.
Particular problem: #! (hashbangs) in URLs.
- The idea: JavaScript tricks mean many pieces of content at a URL, but the #! can be used as a hack to indicate which.
- But the fragment (after the #) is used to indicate position on a page, not which page.
- The fragment is not part of the HTTP request (so can't be fetched by many tools), relies on JavaScript, can't be cached, etc.
- Using the #! is better than not changing your URL at all. At least then the content can be bookmarked, etc.
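A small sketch of why the fragment is invisible to the server: only the path and query go into the HTTP request line. The function name and example URLs are made up for illustration.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the part of a URL that is sent in the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: the browser keeps it to itself.
    return target

hashbang_url = "http://example.com/#!/posts/man-bites-dog"
plain_url = "http://example.com/posts/man-bites-dog"
```

Here request_target(hashbang_url) is just "/", so any tool that doesn't run the page's JavaScript sees the same request for every hashbang "page".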
URLs should be persistent.
- Redirect if moving.
- Session info in URL: what happens after expiry?
- Persistent URLs mean links to your site. Links are good!

URL Usability

Theorem: URLs are an important part of the user interface of your site/app. People actually look at URLs and use them to make decisions about your site: whether to visit or not, how it is structured.
- Proof: Google, Baidu, Bing, DuckDuckGo all display URLs in search results. Those result screens are very carefully designed: they wouldn't show URLs if people didn't look at them.
URLs should be helpful to the user.
- Since they are technically opaque, they can be designed however works best for the user.
- Should be human-readable, relatively short.
Readable URLs:
- Makes search results and links more useful: users can look at the URL (possibly by mousing-over) and determine something about where they're going or the structure of your site.
  - avoid URL shorteners
- The / should be used to build a hierarchy (阶层).
- Avoid escaped characters (space being escaped as %20, etc.) and MiXed cAse. These are hard to read out loud and to remember.
- Avoid query strings.
  - Suggestion: for algorithm input only; if possible, redirect after processing.
- Avoid artificial IDs (e.g. /people/237/phone)
  - Suggestion: slugs (e.g. /people/greg-baker/phone)
  - More later.
Non-ASCII characters in URLs…
- Any special characters in URLs, including any non-ASCII text, must be UTF-8 encoded and then escaped (%8e). That's particularly hard to read.
- Browsers will sometimes display the unescaped version.
- You might think the URL is http://example.com/欢迎 (and it might look like that sometimes).
- But it's actually http://example.com/%E6%AC%A2%E8%BF%8E (and will often look like that to the user).
- Solution? Wish I knew. Maybe escaped Unicode like that? Maybe Pinyin (or other phonetic transcription)? Maybe use English in URLs?
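The escaping described above can be reproduced with Python's standard library. This is only a demonstration of the encoding step, not a recommendation for any of the possible solutions.

```python
from urllib.parse import quote, unquote

path = "/欢迎"
# quote() UTF-8 encodes the text, then %-escapes each non-ASCII byte.
escaped = quote(path)
# unquote() reverses it, which is what a browser does when it shows the readable form.
restored = unquote(escaped)
```

Two characters become eighteen, which is why the escaped form is so hard to read or type.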
Design path hierarchy for users, not the implementation.
- Bad but common: URLs designed as a result of code layout, like /pim-module/show_person.php?id=237
- I don't care about: module layout, PHP, who #237 is, that “id” is a DB column or key in your code.
- … and I already know I'm trying to “show” info (because it was a GET request).
- Users (of a PIM) care about people and the properties they have.
- e.g.
  - /people/ (list of people)
  - /people/greg-baker (my info)
  - /people/greg-baker/phones (my phone numbers)
URLs should be “hackable”.
- For example, if looking at a course web page /cc/470/ggbaker/, you should be able to find other offerings at /cc/470/ and another instructor's offering at /cc/470/apike/.
- URLs should be guessable, editable, type-able.
- … should provide context for the user.
- But hierarchies don't have to be deep.
  - e.g. Wikipedia has basically two levels: language and article.

Implementing Good URLs

Fundamentally, all technologies have some equivalent of the CGI PATH_INFO variable.
- e.g. /app/people/you could be handled by /app/index.cgi with PATH_INFO == '/people/you'.
- After that, it's just a simple programming problem to get the right content out.
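As a rough sketch of the "simple programming problem", here is a minimal WSGI application (Python's equivalent of CGI) that picks content based on PATH_INFO. The /people/ layout and the response text are assumptions for illustration.

```python
def app(environ, start_response):
    """Minimal WSGI app that chooses content from PATH_INFO."""
    path = environ.get("PATH_INFO", "/")
    parts = [p for p in path.split("/") if p]  # '/people/you' -> ['people', 'you']
    if len(parts) == 2 and parts[0] == "people":
        status, body = "200 OK", "Info about " + parts[1]
    else:
        status, body = "404 Not Found", "No such page"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]
```

It could be tried locally with wsgiref.simple_server.make_server from the standard library.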
mod_rewrite allows URL to URL mapping.
- … but it's ugly.
- It solves a problem that shouldn't exist in the first place.
Modern frameworks allow arbitrary URL construction with the dispatcher/controller.
- e.g. incoming URLs are matched against a series of regular expressions; each rule corresponds to a controller function.
- Give some thought to URLs as you're constructing them (in urls.py, routes.rb, etc.).
- The default routes may not be what you want, so decide on your own as you're starting.
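A stripped-down sketch of the regular-expression dispatcher idea. The route table and controller functions are invented for illustration; real frameworks add much more (type conversion, named routes, error pages).

```python
import re

def list_people():
    return "list of people"

def show_person(slug):
    return "info for " + slug

def show_phones(slug):
    return "phone numbers for " + slug

# Each rule pairs a regular expression with a controller function.
ROUTES = [
    (re.compile(r"^/people/$"), list_people),
    (re.compile(r"^/people/(?P<slug>[a-z0-9-]+)$"), show_person),
    (re.compile(r"^/people/(?P<slug>[a-z0-9-]+)/phones$"), show_phones),
]

def dispatch(path):
    """Call the first controller whose pattern matches, or return None."""
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments.
            return controller(**match.groupdict())
    return None
```

Note that the URL layout lives entirely in ROUTES: the controllers don't know or care what their URLs look like.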
Leave your URLs flexible.
- Don't hard-code URLs in templates: makes it very hard to change during development.
- Use the functions your framework provides to automatically determine the URL for a controller.
- Makes it much easier to “design” URLs during development.
- e.g. in a Rails template:
```
<%= link_to 'Edit', {:controller=>'people', :action=>'edit'} %>
```
- e.g. in Django view code:
```
# reverse() looks up the URL for the view, so no path is hard-coded here
return HttpResponseRedirect(reverse('people.views.show', kwargs={'person': p.slug}))
```
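What those framework functions do can be approximated in a few lines. This is a toy sketch: the route names, templates, and url_for are hypothetical, not any framework's real API.

```python
# Hypothetical route table: route name -> URL template.
URL_TEMPLATES = {
    "person-list": "/people/",
    "person-show": "/people/{slug}",
    "person-phones": "/people/{slug}/phones",
}

def url_for(name, **params):
    """Build the URL for a named route instead of hard-coding it."""
    return URL_TEMPLATES[name].format(**params)
```

Templates and views call url_for("person-show", slug=...) everywhere, so redesigning the URL means editing one line of the table.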
Flexible URLs mean you can work on features for your site with temporary URLs.
- …but redesign the URLs once you can see how things work and how the features fit together.
- Do the redesign before you put code in production and the URLs become public: after that you're stuck with them.
Even if you have to change URLs after release, it's usually easy to leave redirects in their place.
- e.g. the Django generic view redirect_to used in urls.py (newer Django versions provide the class-based RedirectView for this).
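Independent of any framework, leaving a redirect behind amounts to something like this sketch. The MOVED table and the respond function are made up for illustration.

```python
# Hypothetical table of URLs that moved after release: old -> new.
MOVED = {
    "/pim-module/show_person.php?id=237": "/people/greg-baker",
}

def respond(request_target):
    """Return (status, headers) for a request target."""
    if request_target in MOVED:
        # 301 tells browsers and crawlers that the move is permanent.
        return "301 Moved Permanently", [("Location", MOVED[request_target])]
    return "200 OK", []
```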
Use slugs instead of artificial keys in URLs where possible…

Slugs

[“Slug” is apparently an old newspaper term for the short names that would be used to identify a story as it was being written.]
For us, a slug is a short string that (1) identifies a particular piece of content so we can find it, (2) is safe to use in a URL, and (3) is at least a little meaningful.
For example, a blog post with the title “Man Bites Dog!” might get the slug “man-bites-dog”. Another story with the same title might get “man-bites-dog-2”.
- If we have these slugs for each story (stored in an indexed field in the database), then we can use URLs like /posts/man-bites-dog, and display the right story.
- … instead of using the primary key and getting a URL like /posts/621 or even worse, /display_post.asp?id=621.
- All we have to do in code is look up by the slug instead of primary key.
- Maybe /posts/2013/04/man-bites-dog would be better: less chance of collision; even more useful info in URL. Just look up by slug and year and month.
Building a slug is something that can be done by code, not the author.
- You'll find a slug function in any framework, or a plugin module.
- If you're lucky, you just have to express “use the title field and build a slug unique for that year+month.”
- But be careful: does the framework code handle collisions? Check that.
- Try to find a field (or combination of fields) that is usually unique, such as first+last name.
- Or maybe you already have something like a userid that is definitely unique.
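If your framework doesn't handle collisions, the logic is roughly this sketch. The function names and the "untitled" fallback are assumptions; in a real app the set of existing slugs would be a database lookup (with a unique index to catch races).

```python
import re

def slugify(text):
    """Lowercase, then collapse runs of other characters into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def unique_slug(title, existing):
    """Return a slug for title that is not already in the set `existing`."""
    base = slugify(title) or "untitled"  # fallback when no characters survive
    slug, n = base, 1
    while slug in existing:
        n += 1
        slug = "{}-{}".format(base, n)
    return slug
```

The first “Man Bites Dog!” gets man-bites-dog; a second post with the same title gets man-bites-dog-2.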
Functions that build slugs usually strip any characters that aren't letters or numbers.
- Would turn “Hello!!!☺” into “hello” (good), but “欢迎到我的博客” into “” (bad).
- There is a project called Unidecode that is basically a huge dictionary of Unicode characters to phonetic transcriptions.
- e.g. slugify(unidecode(u"欢迎到我的博客")) == "huan-ying-dao-wo-de-bo-ke"
- Maybe not perfect, but better than “” or a database key (or making the user do it).
- Creates a not-bad URL: /posts/huan-ying-dao-wo-de-bo-ke.

REST

REST is short for REpresentational State Transfer.
A style/schema/convention for interaction with a web system.
- Not a specific technology or standard.
- Allows another system to interact with your app's data: it's an API.
- … but does so over HTTP and alongside your regular web frontend.
Basic ideas:
- URLs identify resources. (≈ nouns, e.g. Greg's schedule.)
- Content types (formats + media types) represent resources. (e.g. an iCalendar-format calendar with media type text/calendar.)
- Representations are exchanged (transferred) by HTTP methods. (≈ verbs: GET, POST, PUT, DELETE.)
- The server keeps track of the state of the data (probably with a database).
URLs can represent both single items and collections.
- /people/: collection of all people.
- /people/greg-baker: one person's info.

The HTTP methods map onto the CRUD (Create, Retrieve, Update, Delete) operations:

Op	Method	Single item	Collection
R	`GET`	retrieve representation	retrieve list of members
U	`PUT`	update item with info from provided representation	replace collection with new members
C	`POST`	create new sub-info	new entry in collection
D	`DELETE`	delete item	delete collection

The web app should respond to all four methods on a URL and take appropriate actions.
- e.g. PUT request with content-type text/calendar: replace old calendar data with this calendar.
- e.g. POST request with content-type text/calendar: add a meeting to calendar.
- e.g. GET request with Accept: text/calendar: return calendar as iCalendar.
- e.g. GET request with Accept: text/html: return calendar as HTML page.
Choice of representation:
- HTML for the browser GET (which sends Accept: text/html anyway).
- A standard type if one exists (e.g. vCard for contact info, GPX for GPS data)
- JSON in most other cases.
- HTML is probably read-only, but the others could be taken for update/create operations as well.
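Choosing a representation from the Accept header might be sketched like this. It is a simplification: it honours q-values and */* but ignores partial wildcards like text/*, and both function names are made up.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media and q > 0:  # q=0 means "not acceptable"
            prefs.append((q, media))
    # list.sort() is stable, so equal q-values keep their header order.
    prefs.sort(key=lambda pref: -pref[0])
    return [media for q, media in prefs]

def choose_representation(accept, available):
    """Pick the first acceptable type we can produce, or None (a 406 response)."""
    for media in parse_accept(accept):
        if media in available:
            return media
        if media == "*/*":
            return available[0]
    return None
```

A browser's Accept header starts with text/html, so it gets the HTML page; a tool asking for text/calendar or application/json gets that instead.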
Another example: a blog with REST interface.
- Each blog post will have its own URL like /posts/2013/about-rest.
- Collections of posts will have URLs: /posts/ for all posts, and maybe /posts/2013/ or /posts/2013/05/ for some date-based subsets.
- Comments might also have their own URLs: /posts/2013/about-rest/comments/, /posts/2013/about-rest/comments/47
- When a browser requests a post (GET /posts/2013/about-rest with HTTP header Accept: text/html,…) it will get the HTML representation it wants.
- If another tool requests with different headers (like Accept: application/json), it can get a machine-readable version for processing.
- An author can create a new blog entry by POSTing to /posts/ with a JSON representation of the content (and HTTP request header Content-Type: application/json).
- An author can edit a post by PUTting a JSON representation to /posts/2013/about-rest.
- The site's mobile app could post a user's comment by POSTing its JSON content to /posts/2013/about-rest/comments/ (with Content-Type: application/json).
- The web front end could let users comment with a form <form action="/posts/2013/about-rest/comments/" method="post">. It will be sent with Content-Type: application/x-www-form-urlencoded or multipart/form-data.
- ⋮ and so on.
- [Of course, those should have authentication/authorization checks done as appropriate.]
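From the client side, creating a post might look like the sketch below, using only the standard library. The JSON field names (title, body) and the function name are assumptions; the request is built here but not sent.

```python
import json
from urllib.request import Request

def new_post_request(base_url, title, body):
    """Build (but do not send) the request that creates a blog post."""
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return Request(
        base_url + "/posts/",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen would actually send it; editing a post would be the same thing with method="PUT" and the post's own URL.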

REST APIs usually make richer use of the HTTP status codes than a simple web site. They can be used to give very useful information about the results. For example:

Action	HTTP Response
POST to create new item	`HTTP/1.1 201 Created Location: http://example.com/items/the-new-item`
PUT to update item	`HTTP/1.1 204 No Content`
DELETE to remove something undeleteable	`HTTP/1.1 405 Method Not Allowed Allow: OPTIONS, GET, HEAD, POST`
PUT with XML data where JSON was expected	`HTTP/1.1 415 Unsupported Media Type`
PUT with malformed JSON	`HTTP/1.1 400 Bad Request`
POST to create new item with last-modified date in the future	`HTTP/1.1 422 Unprocessable Entity`
POST to do some big calculation in the background	`HTTP/1.1 202 Accepted Location: http://example.com/jobs/12345 Content-type: application/json {"job-id": 12345, "time-estimate": 100}`
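A sketch of a handler that produces several of the status codes in the table. Everything here is illustrative: the /items/ layout, the in-memory ITEMS store, and the rule that a new item must carry a "slug" field are assumptions, and PUT is simplified to update existing items only.

```python
import json

ITEMS = {}  # slug -> dict; an in-memory stand-in for the database

def handle(method, path, content_type=None, body=b""):
    """Return (status, headers, body) for /items/ and /items/<slug>."""
    if not path.startswith("/items/"):
        return "404 Not Found", [], b""
    slug = path[len("/items/"):]
    is_collection = slug == ""
    allowed = ["GET", "POST"] if is_collection else ["GET", "PUT", "DELETE"]
    if method not in allowed:
        return "405 Method Not Allowed", [("Allow", ", ".join(allowed))], b""
    if not is_collection and slug not in ITEMS:
        return "404 Not Found", [], b""
    if method in ("POST", "PUT"):
        media_type = (content_type or "").split(";")[0].strip()
        if media_type != "application/json":
            return "415 Unsupported Media Type", [], b""
        try:
            data = json.loads(body)
        except ValueError:  # malformed JSON (or undecodable bytes)
            return "400 Bad Request", [], b""
        if not isinstance(data, dict):
            return "422 Unprocessable Entity", [], b""
        if method == "PUT":
            ITEMS[slug] = data
            return "204 No Content", [], b""
        new_slug = data.get("slug")
        if not isinstance(new_slug, str) or not new_slug:
            return "422 Unprocessable Entity", [], b""
        ITEMS[new_slug] = data
        return "201 Created", [("Location", "/items/" + new_slug)], b""
    if method == "DELETE":
        del ITEMS[slug]
        return "204 No Content", [], b""
    # Only GET is left: list the collection or return one item.
    result = sorted(ITEMS) if is_collection else ITEMS[slug]
    payload = json.dumps(result).encode("utf-8")
    return "200 OK", [("Content-Type", "application/json")], payload
```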

Authenticating REST: an API can't have a login procedure like a user-facing website. There are various options to authenticate users.
- A fixed token in the request:
```
PUT /items/1234?token=62b3e2bb920dd8 HTTP/1.1
Content-type: application/json

{ … }
```
- Something in the message body:
```
PUT /items/1234 HTTP/1.1
Content-type: application/json

{ "API-token": "62b3e2bb920dd8", … }
```
- Best option: a key that must be calculated for each request (like Amazon S3):
```
PUT /items/1234 HTTP/1.1
Authorization: 62b3e2bb920dd8
Content-type: application/json

{ … }
```
- Of course, with all of these you must also check authorization: that the authenticated user is actually allowed to take this action.
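A key "calculated with each request" is typically a signature over the request, made with a secret that both sides know. The sketch below is a generic HMAC scheme for illustration, not Amazon's actual algorithm; the function names and the choice of signed fields are assumptions. Real schemes usually also sign a timestamp and some headers to limit replay.

```python
import hashlib
import hmac

def sign(secret, method, path, body):
    """Return a hex signature covering the method, path and body (secret is bytes)."""
    message = method.encode("ascii") + b"\n" + path.encode("utf-8") + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def is_authentic(secret, method, path, body, signature):
    """Recompute the signature on the server and compare in constant time."""
    expected = sign(secret, method, path, body)
    return hmac.compare_digest(expected, signature)
```

The client sends sign(...) in the Authorization header. The secret itself never travels over the wire, and a signature captured from one request won't validate a different one.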
Put this together with good URL design:
- All of these:
  - /pim-module/show_person.php?id=123
  - /pim-module/get_vcard.php?id=123
  - /pim-module/update.php?id=123
  - /pim-module/update_json.php?id=123
- become /people/greg-baker with different methods/parameters.

It's less clear how to do REST for non-CRUD operations.

For example, transferring money between bank accounts. Some ways you might represent a $100 transfer from account #1234 to #5678:

PUT /accounts/1234/transfer-to/5678/amount/100 HTTP/1.1

PUT /accounts/1234 HTTP/1.1
Content-type: application/json; charset=utf-8

{"action": "transfer", "destination-account": 5678, "amount": 100}

PUT /accounts/5678 HTTP/1.1
Content-type: application/json; charset=utf-8

{"action": "transfer", "source-account": 1234, "amount": 100}

POST /accounts/1234/transfers/ HTTP/1.1
Content-type: application/json; charset=utf-8

{"destination-account": 5678, "amount": 100}

The Masse book suggests a verb as the last component of the URL (if it's not one of the CRUD operations):
```
/accounts/1234/transfer
/alerts/1234/resend
```
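One of the options above treats the transfer itself as a resource: POSTing to /accounts/1234/transfers/ creates a new transfer. A server-side sketch of that option follows; the account data, the Location format, and the choice of 422 for a bad amount are all assumptions.

```python
ACCOUNTS = {1234: 500, 5678: 50}  # account number -> balance (made-up data)
TRANSFERS = []                    # log of completed transfers

def post_transfer(source, payload):
    """Handle POST /accounts/<source>/transfers/ with a decoded JSON payload."""
    dest = payload.get("destination-account")
    amount = payload.get("amount")
    if type(dest) is not int or type(amount) is not int:
        return "422 Unprocessable Entity", []
    if source not in ACCOUNTS or dest not in ACCOUNTS:
        return "404 Not Found", []
    if amount <= 0 or amount > ACCOUNTS[source]:
        return "422 Unprocessable Entity", []
    ACCOUNTS[source] -= amount
    ACCOUNTS[dest] += amount
    TRANSFERS.append({"source": source, "destination": dest, "amount": amount})
    # The new transfer gets its own URL, like any other created resource.
    location = "/accounts/{}/transfers/{}".format(source, len(TRANSFERS))
    return "201 Created", [("Location", location)]
```

This keeps the operation inside the CRUD vocabulary: the transfer is a created item that can later be retrieved with GET.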

REST has replaced other HTTP API technologies (like SOAP) because it's simpler.
- Uses technologies we already understand.
- Simple to work with (since it uses HTTP, many existing tools work).
- Can sit alongside your regular frontend.
There are REST helper libraries for most frameworks.
- Understanding REST in Rails 3
- Django REST framework
- These take care of some of the common tasks, so you don't have to.
- Be careful that the “helper” doesn't cause more work than it saves. Doing basic REST stuff might be as simple as:
```
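from django.http import HttpResponseNotAllowed
from django.shortcuts import get_object_or_404

def item(request, slug):
    # Item is assumed to be the app's model class; 404 if no such slug
    item = get_object_or_404(Item, slug=slug)
    if request.method == 'GET':
        …
    elif request.method == 'POST':
        …
    else:
        # 405 response with an Allow header listing the supported methods
        return HttpResponseNotAllowed(['GET', 'POST'])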
```
- But a library can help with things like serializing objects and parsing the Accept header.
Should you really create an API for your web app?
- Why API as a Strategy
- Why you absolutely MUST write an API when you write your next app
- Yes, I really think you should.
- Lets others interact with your data (in ways you can safely allow).
- e.g. you don't have a mobile app, but somebody really wants one. They can create it.
- e.g. Google+ doesn't have a full API, so can't be updated from HootSuite, etc. (It's read-only.) Limited ways it could be used.
- Allows users to form a community and interact with your site in ways you didn't expect, but they need.
- As long as your API is secure, that's a good thing.