URLs
- URLs are obviously an important part of web sites and apps. Unfortunately, many developers don't give them enough attention.
- Basic ideas:
- URLs are technically opaque.
- That is, they are just unique strings passed between server and client, with no semantic meaning.
- But they are often used by users to determine the structure of the site.
- Displayed in search engine results, copied into emails, etc.
- Computers don't care what your URLs look like, but your users probably do, so your URLs should be designed with your users in mind.
- URLs identify a resource (a piece of content); file formats (HTML, PDF, CSV, etc.) represent a resource.
- URLs should correspond one-to-one to a resource.
- Multiple URLs per resource: one piece of content with different URLs
- Could be as simple as /dir/ and /dir/index.html.
- Might be caused by session or referrer info in URLs, or by URL shorteners.
- Can confuse users, splits PageRank for that content, and means each URL is cached/crawled separately.
- Solutions: be consistent in your use of URLs, redirect to a canonical URL for the content.
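A minimal sketch of redirecting to a canonical URL, using only Python's standard library. It assumes a hypothetical site where /dir/index.html and /dir/ are the same page, and where the query parameters in NON_CANONICAL_PARAMS carry session or referrer info rather than identifying content; a real site would choose its own rules.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that don't change which content is shown
NON_CANONICAL_PARAMS = {"sessionid", "ref", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url):
    """Return the single canonical form of a URL for this (hypothetical) site."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    # /dir/index.html and /dir/ are the same resource: prefer /dir/
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Keep only the query parameters that actually select content
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in NON_CANONICAL_PARAMS]
    # The fragment is dropped: servers never see it anyway
    return urlunsplit((scheme.lower(), netloc.lower(), path, urlencode(kept), ""))


def redirect_if_needed(url):
    """Return (301, canonical_url) if url isn't canonical, otherwise None."""
    target = canonical_url(url)
    if target != url:
        return 301, target
    return None
```

A request handler would call redirect_if_needed() first and, if it returns a target, respond with that status and a Location header; a permanent redirect tells crawlers which URL to keep.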
- Multiple resources per URL: different pieces of content available without changing URL.
- Often caused by JS: tabs, dynamic loading of content, etc.
- Also by Flash blobs. Please stop using Flash.
- Can't be linked: users can't bookmark or email the URL, Google can't send users directly to the content, …
- How many visitors do you lose when every link must also include “click the third tab, and then scroll down 2/3 of the way”?
- Examples:
- Amazon URLs contain lots of crap [same content at different URLs].
- Google Maps doesn't update the URL as you browse, making it hard to link to what you find.
- Google Maps mashup with good URLs.
- Particular problem: #! (hashbangs) in URLs.
- The idea: JavaScript tricks mean there are many pieces of content at one URL, but the #! can be used as a hack to indicate which one is meant.
- But the fragment (the part after the #) is meant to indicate a position within a page, not which page.
- The fragment is not part of the HTTP request (so the content can't be fetched by many tools), relies on JavaScript, can't be cached, etc.
- Using the #! is better than not changing your URL at all. At least then the content can be bookmarked, etc.
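A small illustration, using Python's urllib.parse, of why the fragment can't select content on the server: the request target a client sends contains only the path and query, so two hashbang URLs for different content produce the identical request. The request_target() helper is written here for illustration only.

```python
from urllib.parse import urlsplit


def request_target(url):
    """The part of a URL that goes on the HTTP request line (path + query).

    The fragment (everything after '#', including a '#!' hashbang) stays
    in the browser and is never sent to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Two different "pages" under the hashbang scheme...
a = "http://example.com/#!/people/greg-baker"
b = "http://example.com/#!/people/someone-else"
# ...produce exactly the same request, so the server (and any cache or
# crawler that doesn't run the page's JavaScript) can't tell them apart.
print(request_target(a), request_target(b))  # prints: / /
```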
- URLs should be persistent.
- Redirect if moving.
- Session info in URL: what happens after expiry?
- Persistent URLs mean links to your site. Links are good!
URL Usability
- Theorem: URLs are an important part of the user interface of your site/app. People actually look at URLs and use them to make decisions about your site: whether to visit or not, how it is structured.
- Proof: Google, Baidu, Bing, DuckDuckGo all display URLs in search results. Those result screens are very carefully designed: they wouldn't show it if people didn't look at it.
- URLs should be helpful to the user.
- Since they are technically opaque, they can be designed however best suits the user.
- Should be human-readable, relatively short.
- Readable URLs:
- Makes search results and links more useful: users can look at the URL (possibly by mousing-over) and determine something about where they're going or the structure of your site.
- Avoid URL shorteners.
- Use the / to build a hierarchy.
- Avoid escaped characters (e.g. a space escaped as %20) and MiXed cAse. These are hard to read out loud and remember.
- Avoid query strings.
- Suggestion: use them for algorithm input only; if possible, redirect after processing.
- Avoid artificial IDs (e.g. /people/237/phone).
- Suggestion: slugs (e.g. /people/greg-baker/phone). More later.
- Non-ASCII characters in URLs…
- Any special characters in URLs, including any non-ASCII text, must be UTF-8 encoded and then escaped (e.g. %8e). That's particularly hard to read.
- Browsers will sometimes display the unescaped version.
- You might think the URL is http://example.com/欢迎 (and it might look like that sometimes).
- But it's actually http://example.com/%E6%AC%A2%E8%BF%8E (and will often look like that to the user).
- Solution? Wish I knew. Maybe escaped Unicode like that? Maybe Pinyin (or other phonetic transcription)? Maybe use English in URLs?
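Python's urllib.parse shows the encoding step described above: quote() encodes the text as UTF-8 and percent-escapes each byte, and unquote() reverses it. Each of these two Chinese characters is three bytes in UTF-8, hence the six escapes.

```python
from urllib.parse import quote, unquote

path = "/欢迎"
escaped = quote(path)  # UTF-8 encode, then %-escape every byte ('/' is left alone)
print(escaped)                     # /%E6%AC%A2%E8%BF%8E
print(unquote(escaped))            # /欢迎
print(len("欢迎".encode("utf-8")))  # 6 (bytes, for 2 characters)
```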
- Design path hierarchy for users, not the implementation.
- Bad but common: URLs designed as a result of code layout, like /pim-module/show_person.php?id=237
- I don't care about: module layout, PHP, who #237 is, that “id” is a DB column or key in your code.
- … and I already know I'm trying to “show” info (because it was a GET request).
- Users (of a PIM) care about people and the properties they have.
- e.g. /people/ (list of people), /people/greg-baker (my info), /people/greg-baker/phones (my phone numbers).
- URLs should be “hackable”.
- For example, if looking at a course web page /cc/470/ggbaker/, you should be able to find other offerings at /cc/470/ and another instructor's offering at /cc/470/apike/.
- URLs should be guessable, editable, type-able.
- … should provide context for the user.
- But hierarchies don't have to be deep.
- e.g. Wikipedia has basically two levels: language and article.
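One way to check that URLs are hackable is to ask, for each URL, whether every shorter prefix is also a real page. A minimal sketch; parent_urls(), dead_ends(), and the set of known paths are hypothetical, and a real check would consult the site's router.

```python
def parent_urls(path):
    """URLs a user might try by deleting path segments from the end."""
    segments = [s for s in path.split("/") if s]
    # e.g. /cc/470/ggbaker/ -> /cc/470/, /cc/
    return ["/" + "/".join(segments[:i]) + "/" for i in range(len(segments) - 1, 0, -1)]


def dead_ends(path, known_paths):
    """Parent URLs that would give the user a 404."""
    return [p for p in parent_urls(path) if p not in known_paths]


known = {"/cc/", "/cc/470/", "/cc/470/ggbaker/", "/cc/470/apike/"}
print(parent_urls("/cc/470/ggbaker/"))       # ['/cc/470/', '/cc/']
print(dead_ends("/cc/470/ggbaker/", known))  # []
```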
Implementing Good URLs
- Fundamentally, all technologies have some equivalent of the CGI PATH_INFO variable.
- e.g. /app/people/you could be handled by /app/index.cgi with PATH_INFO == '/people/you'.
- After that, it's just a simple programming problem to get the right content out.
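A minimal sketch using WSGI, Python's standard web server interface, whose environ dictionary is modelled on the CGI variables. The server puts the app's mount point in SCRIPT_NAME and the rest of the path in PATH_INFO, and the app routes on it. The /people/<name> handling is a hypothetical example.

```python
def app(environ, start_response):
    """Route on PATH_INFO, e.g. '/people/you' when mounted at '/app'."""
    path_info = environ.get("PATH_INFO", "/")
    segments = [s for s in path_info.split("/") if s]
    if len(segments) == 2 and segments[0] == "people":
        status, body = "200 OK", "Info about " + segments[1]
    else:
        status, body = "404 Not Found", "No such page"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


if __name__ == "__main__":
    # Simulate a request for /app/people/you without running a server
    def start_response(status, headers):
        print(status)

    print(app({"SCRIPT_NAME": "/app", "PATH_INFO": "/people/you"}, start_response))
```

To serve it for real during development, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() is enough.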
- mod_rewrite allows URL to URL mapping.
- … but it's ugly.
- It solves a problem that shouldn't exist in the first place.
- Modern frameworks allow arbitrary URL construction with the dispatcher/controller.
- e.g. incoming URLs are matched against a series of regular expressions; each rule corresponds to a controller function.
- Give some thought to URLs as you're constructing them (in urls.py, routes.rb, etc.).
- Maybe the default routes aren't what you want. Decide that as you're starting.
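The regular-expression dispatch described above can be sketched in a few lines of plain Python. The route table and controller functions are hypothetical, modelled on the /people/ example earlier; framework dispatchers work along similar lines but with many more features.

```python
import re


# Hypothetical controller functions
def list_people():
    return "list of people"


def show_person(person):
    return "info for " + person


def person_phones(person):
    return "phone numbers for " + person


# Route table: (pattern, controller). Named groups become keyword arguments.
ROUTES = [
    (re.compile(r"/people/"), list_people),
    (re.compile(r"/people/(?P<person>[-a-z0-9]+)"), show_person),
    (re.compile(r"/people/(?P<person>[-a-z0-9]+)/phones"), person_phones),
]


def dispatch(path):
    """Call the controller for the first pattern matching the whole path."""
    for pattern, controller in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return controller(**match.groupdict())
    return None  # a real app would respond 404 Not Found here
```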
- Leave your URLs flexible.
- Don't hard-code URLs in templates: makes it very hard to change during development.
- Use the functions your framework provides to automatically determine the URL for a controller.
- Makes it much easier to “design” URLs during development.
- e.g. in a Rails template:
<%= link_to 'Edit', {:controller=>'people', :action=>'edit'} %>
- e.g. in Django view code:
return HttpResponseRedirect(reverse('people.views.show', kwargs={'person': p.slug}))
- Flexible URLs mean you can work on features for your site with temporary URLs.
- … but redesign the URLs once you can see how things work and how the features fit together.
- Do the redesign before you put code in production and the URLs become public: after that you're stuck with them.
- Even if you have to change a URL after release, it's usually easy to leave a redirect in its place.
- e.g. Django's generic redirect view used in urls.py (redirect_to in old versions of Django, RedirectView in current ones).
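A sketch of both ideas in plain Python: code refers to routes by name and builds URLs with a reverse() function (similar in spirit to Django's reverse() and Rails' route helpers), and a retired URL pattern is kept as a permanent redirect to its replacement. The route names and the old /person/<slug> URL are hypothetical.

```python
import re

# Hypothetical named routes: code and templates refer to the name,
# so the URL itself can change in one place.
ROUTES = {
    "people-list": "/people/",
    "person-show": "/people/{person}",
    "person-phones": "/people/{person}/phones",
}

# Retired URL patterns, mapped to the named route that replaced them
REDIRECTS = [
    (re.compile(r"/person/(?P<person>[-a-z0-9]+)"), "person-show"),
]


def reverse(name, **kwargs):
    """Build the URL for a named route."""
    return ROUTES[name].format(**kwargs)


def check_redirect(path):
    """Return (301, new_url) if path matches a retired URL, else None."""
    for pattern, name in REDIRECTS:
        match = pattern.fullmatch(path)
        if match:
            return 301, reverse(name, **match.groupdict())
    return None
```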
- Use slugs instead of artificial keys in URLs where possible…
Slugs
- [“Slug” is apparently an old newspaper term for the short names that would be used to identify a story as it was being written.]
- For us, a slug is a short string that (1) identifies a particular piece of content so we can find it, (2) is safe to use in a URL, and (3) is at least a little meaningful.
- For example, a blog post with the title “Man Bites Dog!” might get the slug “man-bites-dog”. Another story with the same title might get “man-bites-dog-2”.
- If we have these slugs for each story (stored in an indexed field in the database), then we can use URLs like /posts/man-bites-dog, and display the right story.
- … instead of using the primary key and getting a URL like /posts/621 or, even worse, /display_post.asp?id=621.
- All we have to do in code is look up by the slug instead of the primary key.
- Maybe /posts/2013/04/man-bites-dog would be better: less chance of collision, and even more useful info in the URL. Just look up by slug, year, and month.
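A minimal sketch of looking up a post by year, month, and slug, using Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical; the point is that the slug lives in an indexed column and the URL's components go straight into the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    year INTEGER, month INTEGER,
    slug TEXT, title TEXT)""")
# Slugs only need to be unique within a year+month, so index (and enforce) that
conn.execute("CREATE UNIQUE INDEX posts_by_slug ON posts (year, month, slug)")
conn.execute("INSERT INTO posts (year, month, slug, title) VALUES (?, ?, ?, ?)",
             (2013, 4, "man-bites-dog", "Man Bites Dog!"))


def find_post(year, month, slug):
    """Handle /posts/<year>/<month>/<slug>: look up by slug, not primary key."""
    return conn.execute(
        "SELECT title FROM posts WHERE year = ? AND month = ? AND slug = ?",
        (year, month, slug),
    ).fetchone()
```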
- Building a slug is something that can be done by code, not the author.
- You'll find a slug function in any framework, or a plugin module.
- If you're lucky, you just have to express “use the title field and build a slug unique for that year+month.”
- But be careful: does the framework code handle collisions? Check that.
- Try to find a field (or fields) that is usually unique, e.g. first+last name.
- Or maybe you already have something like a userid that is definitely unique.
- Slug-building functions usually strip non-letter/number characters.
- Would turn “Hello!!!☺” into “hello” (good), but “欢迎到我的博客” into “” (bad).
- There is a project called Unidecode that is basically a huge dictionary of Unicode characters to phonetic transcriptions.
- e.g.
slugify(unidecode(u"欢迎到我的博客")) == "huan-ying-dao-wo-de-bo-ke"
- Maybe not perfect, but better than “” or a database key (or making the user do it).
- Creates a not-bad URL: /posts/huan-ying-dao-wo-de-bo-ke.
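A sketch of slug building with collision handling, in Python. It uses Unidecode for transliteration if it happens to be installed (it is a third-party package); otherwise it falls back to the standard library's Unicode NFKD normalization, which only strips accents ("Café" becomes "cafe") and still drops Chinese text entirely. The "post" fallback and the shape of unique_slug() are illustrative assumptions, not any framework's API.

```python
import re
import unicodedata

try:
    from unidecode import unidecode  # third-party: pip install Unidecode
except ImportError:
    unidecode = None


def slugify(text):
    """Lowercase, transliterate if possible, and join words with hyphens."""
    if unidecode is not None:
        text = unidecode(text)
    # NFKD splits accented letters into letter + accent; the ASCII encode drops the accent
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def unique_slug(title, existing):
    """Build a slug, adding -2, -3, ... if it collides with one in `existing`."""
    base = slugify(title) or "post"  # hypothetical fallback when nothing survives
    slug, n = base, 2
    while slug in existing:
        slug = f"{base}-{n}"
        n += 1
    return slug
```

With Unidecode installed, slugify("欢迎到我的博客") should produce the Pinyin slug shown above; without it, the slug comes out empty and unique_slug() falls back to "post".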
REST
- REST, short for REpresentational State Transfer
- A style/schema/convention for interaction with a web system.
- Not a specific technology or standard.
- Allows another system to interact with your app's data: it's an API.
- … but does so over HTTP and alongside your regular web frontend.
- Basic ideas:
- URLs identify resources. (≈ nouns, e.g. Greg's schedule.)
- Content types (formats + media types) represent resources. (e.g. an iCalendar-format calendar with media type text/calendar.)
- Representations are exchanged (transferred) by HTTP methods (≈ verbs: GET, POST, PUT, DELETE).
- The server keeps track of the state of the data (probably with a database).
- URLs can represent both single items and collections.
- /people/: collection of all people.
- /people/greg-baker: one person's info.
- The HTTP methods map onto the CRUD (Create, Retrieve, Update, Delete) operations:
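- R (Retrieve), GET: on a single item, retrieve its representation; on a collection, retrieve the list of members.
- U (Update), PUT: on a single item, update it with info from the provided representation; on a collection, replace the collection with new members.
- C (Create), POST: on a single item, create new sub-info; on a collection, create a new entry in the collection.
- D (Delete), DELETE: on a single item, delete the item; on a collection, delete the collection.
- The web app should respond to all four methods on a URL and take appropriate actions.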
- e.g. a PUT request with Content-Type: text/calendar: replace the old calendar data with this calendar.
- e.g. a POST request with Content-Type: text/calendar: add a meeting to the calendar.
- e.g. a GET request with Accept: text/calendar: return the calendar as iCalendar.
- e.g. a GET request with Accept: text/html: return the calendar as an HTML page.
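A framework-free sketch of the CRUD mapping above for a hypothetical /items/ collection, using a Python dict as the "database". Each handler returns a (status, headers, body) tuple; a real app would turn that into an HTTP response, and would check authentication and authorization first.

```python
import json

ITEMS = {}  # hypothetical storage: slug -> item data


def collection_view(method, body=None):
    """Handle requests to /items/ (the collection)."""
    if method == "GET":
        return 200, {"Content-Type": "application/json"}, json.dumps(sorted(ITEMS))
    if method == "POST":
        try:
            data = json.loads(body)
        except (TypeError, json.JSONDecodeError):
            return 400, {}, "malformed JSON"
        if not isinstance(data, dict) or "slug" not in data:
            return 422, {}, "a slug is required"
        if data["slug"] in ITEMS:
            return 409, {}, "slug already in use"
        ITEMS[data["slug"]] = data
        return 201, {"Location": "/items/" + data["slug"]}, ""
    return 405, {"Allow": "GET, POST"}, ""


def item_view(method, slug, body=None):
    """Handle requests to /items/<slug> (a single item)."""
    if slug not in ITEMS:
        return 404, {}, ""
    if method == "GET":
        return 200, {"Content-Type": "application/json"}, json.dumps(ITEMS[slug])
    if method == "PUT":
        try:
            data = json.loads(body)
        except (TypeError, json.JSONDecodeError):
            return 400, {}, "malformed JSON"
        ITEMS[slug] = data  # replace the item with the new representation
        return 204, {}, ""
    if method == "DELETE":
        del ITEMS[slug]
        return 204, {}, ""
    return 405, {"Allow": "GET, PUT, DELETE"}, ""
```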
- Choice of representation:
- HTML for the browser's GET (which sends Accept: text/html anyway).
- A standard type if one exists (e.g. vCard for contact info, GPX for GPS data).
- JSON in most other cases.
- HTML is probably read-only, but the others could be accepted for update/create operations as well.
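A simplified sketch in Python of choosing a representation from the Accept header. It honours q-values and the */* and type/* wildcards, but ignores the finer precedence rules in the HTTP specification (such as more specific ranges overriding wildcards), so treat it as illustrative; the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    # sorted() is stable (even with reverse=True), so equal-q ranges keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_representation(accept_header, available):
    """Return the best media type from `available`, or None (-> 406 Not Acceptable)."""
    for media_range, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            main_type = media_type.split("/")[0]
            if media_range in (media_type, "*/*", main_type + "/*"):
                return media_type
    return None
```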
- Another example: a blog with a REST interface.
- Each blog post will have its own URL like /posts/2013/about-rest.
- Collections of posts will have URLs: /posts/ for all posts, and maybe /posts/2013/ or /posts/2013/05/ for some date-based subsets.
- Comments might also have their own URLs: /posts/2013/about-rest/comments/, /posts/2013/about-rest/comments/47.
- When a browser requests a post (GET /posts/2013/about-rest with HTTP header Accept: text/html,…), it will get the HTML representation it wants.
- If another tool requests it with different headers (like Accept: application/json), it can get a machine-readable version for processing.
- An author can create a new blog entry by POSTing to /posts/ with a JSON representation of the content (and HTTP request header Content-Type: application/json).
- An author can edit a post by PUTting a JSON representation to /posts/2013/about-rest.
- The site's mobile app could post a user's comment by POSTing its JSON content to /posts/2013/about-rest/comments/ (with Content-Type: application/json).
- The web front end could let users comment with a form:
<form action="/posts/2013/about-rest/comments/" method="post">
It will be sent with Content-Type: application/x-www-form-urlencoded or multipart/form-data.
- ⋮ and so on.
- [Of course, those should have authentication/authorization checks done as appropriate.]
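The comment URL above receives the same kind of data in different formats depending on the client. A minimal sketch, in Python, of reading the body according to its Content-Type; the function name and return convention are hypothetical, and multipart/form-data (which needs a full multipart parser) is deliberately left out and answered with 415.

```python
import json
from urllib.parse import parse_qs


def read_comment(content_type, body):
    """Return (fields, None) on success, or (None, error_status) on failure."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":  # e.g. from the mobile app
        try:
            data = json.loads(body.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return None, 400  # Bad Request
        return (data, None) if isinstance(data, dict) else (None, 422)
    if media_type == "application/x-www-form-urlencoded":  # e.g. from the HTML form
        fields = parse_qs(body.decode("ascii", errors="replace"), keep_blank_values=True)
        # parse_qs returns a list of values per field; keep the first of each
        return {name: values[0] for name, values in fields.items()}, None
    return None, 415  # Unsupported Media Type
```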
- REST APIs usually make richer use of HTTP status codes than a simple web site does. The codes can give very useful information about the results. For example:
- POST to create a new item:
HTTP/1.1 201 Created
Location: http://example.com/items/the-new-item
- PUT to update an item:
HTTP/1.1 204 No Content
- DELETE to remove something undeletable:
HTTP/1.1 405 Method Not Allowed
Allow: OPTIONS, GET, HEAD, POST
- PUT with XML data where JSON was expected:
HTTP/1.1 415 Unsupported Media Type
- PUT with malformed JSON:
HTTP/1.1 400 Bad Request
- POST to create a new item with a last-modified date in the future:
HTTP/1.1 422 Unprocessable Entity
- POST to do some big calculation in the background:
HTTP/1.1 202 Accepted
Location: http://example.com/jobs/12345
Content-type: application/json

{"job-id": 12345, "time-estimate": 100}
- Authenticating REST: an API can't have a login procedure like a user-facing website. There are various options to authenticate users.
- A fixed token in the request:
PUT /items/1234?token=62b3e2bb920dd8 HTTP/1.1
Content-type: application/json

{ … }
- Something in the message body:
PUT /items/1234 HTTP/1.1
Content-type: application/json

{ "API-token": "62b3e2bb920dd8", … }
- Best option: a key that must be calculated for each request (like Amazon S3), sent in the Authorization header. The value shown is only a placeholder: a real one is derived from the request and a secret, so it changes with every request.
PUT /items/1234 HTTP/1.1
Authorization: 62b3e2bb920dd8
Content-type: application/json

{ … }
- Of course, with all of these you must also check authorization: that the authenticated user is actually allowed to take this action.
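A sketch of the "calculated key" idea using HMAC from Python's standard library: client and server share a secret, the client signs the method, path, a date, and a hash of the body, and the server recomputes the signature and compares. This is a simplified, hypothetical scheme for illustration, not Amazon's actual signing algorithm. A real scheme would also send a key ID alongside the signature (so the server can find the right secret) and reject stale dates to limit replays.

```python
import hashlib
import hmac


def sign_request(secret, method, path, date, body):
    """Compute a per-request signature (simplified, hypothetical scheme)."""
    body_hash = hashlib.sha256(body).hexdigest()
    string_to_sign = "\n".join([method.upper(), path, date, body_hash])
    return hmac.new(secret, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(secret, method, path, date, body, signature):
    """Server side: recompute the signature and compare it with the one sent."""
    expected = sign_request(secret, method, path, date, body)
    # compare_digest avoids leaking, via timing, how much of the signature matched
    return hmac.compare_digest(expected, signature)
```

Because the body hash and path are part of the signed string, a signature captured from one request is useless for a request that changes either of them.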
- Put this together with good URL design:
- All of these:
/pim-module/show_person.php?id=123
/pim-module/get_vcard.php?id=123
/pim-module/update.php?id=123
/pim-module/update_json.php?id=123
- become /people/greg-baker with different methods/parameters.
- It's less clear how to do REST for non-CRUD operations.
- For example, transferring money between bank accounts. Some ways you might represent a $100 transfer from account #1234 to #5678:
- PUT /accounts/1234/transfer-to/5678/amount/100 HTTP/1.1
- PUT /accounts/1234 HTTP/1.1
Content-type: application/json; charset=utf-8

{"action": "transfer", "destination-account": 5678, "amount": 100}
- PUT /accounts/5678 HTTP/1.1
Content-type: application/json; charset=utf-8

{"action": "transfer", "source-account": 1234, "amount": 100}
- POST /accounts/1234/transfers/ HTTP/1.1
Content-type: application/json; charset=utf-8

{"destination-account": 5678, "amount": 100}
- The Masse book suggests a verb as the last component of the URL (if it's not one of the CRUD operations):
/accounts/1234/transfer
/alerts/1234/resend
- REST has replaced other HTTP API technologies (like SOAP) because it's simpler.
- Uses technologies we already understand.
- Simple to work with (since it uses HTTP, many existing tools work).
- Can sit alongside your regular frontend.
- There are REST helper libraries for most frameworks.
- These take care of some of the common tasks, so you don't have to.
- Be careful that the “helper” doesn't cause more work than it saves. Doing basic REST stuff might be as simple as:
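from django.http import HttpResponseNotAllowed
from django.shortcuts import get_object_or_404

def item(request, slug):
    item = get_object_or_404(Item, slug=slug)  # Item: this app's model, with a slug field
    if request.method == 'GET':
        ...  # return a representation of item
    elif request.method == 'POST':
        ...  # create new sub-info under item
    else:
        return HttpResponseNotAllowed(['GET', 'POST'])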
- But there are things like serializing objects and parsing the Accept header that a library can help with.
- Should you really create an API for your web app?
- Yes, I really think you should.
- Lets others interact with your data (in ways you can safely allow).
- e.g. you don't have a mobile app, but somebody really wants one. They can create it.
- e.g. Google+ doesn't have a full API, so can't be updated from HootSuite, etc. (It's read-only.) Limited ways it could be used.
- Allows users to form a community and interact with your site in ways you didn't expect, but they need.
- As long as your API is secure, that's a good thing.