Intro

When doing web development, you can never ignore security.
- Because it's on the web, your code is open to attack by anyone, good or bad.
- 99% secure is failure.
Even on an intranet site.
- Maybe you have some stupid employees.
- Maybe someone gets into the network through another hole and can see your system.
There are several specific mistakes you can make that are worth knowing…

Basics: Input and Output

As a basic rule: remember that you can't trust any input, but have to carefully control your output.
You have to think of all input as dangerous.
- Maybe less important for desktop apps: if user does something stupid, it probably only affects their computer.
- For a web app, it will affect your server/database.
- On every piece of input you ever get from the outside world, you must assume that it can be broken/missing/malformed/malicious in every possible way all the time.
- Assume all users are trying to attack/break your site.
- … even for intranet sites.
For example, suppose you have a dropdown select in your HTML:
```
<select name="foo">
<option value="1">Option One</option>
<option value="2">Option Two</option>
<option value="3">Option Three</option>
</select>
```
- So the input from the user (in the URL query string for a GET request, message body for POST) will be something like “foo=2”.
- It's easy to think there are only three possible values returned.
- But user could easily hand-modify the URL/request.
- … or the HTML with Firebug.
- A third-party site could submit different data to your URL.
- Actual value you get could be missing, “foo=4”, 100kB of garbage data, …
Make sure you check all of your input carefully.
- Is it even there?
- Is it in the format you expect/one of the legal values?
- Is it coming from the user you expect/allow to submit?
- Will the text fit into your DB field?
- ⋮
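A minimal sketch of those checks for the dropdown above, in Python. It assumes the framework has already decoded the request into a dict of strings called `params`; that name and the `parse_foo` helper are hypothetical.

```python
ALLOWED_FOO = {"1", "2", "3"}  # the three option values from the form


def parse_foo(params):
    """Return the validated value of 'foo' as an int, or raise ValueError."""
    value = params.get("foo")
    if value is None:
        raise ValueError("foo is missing")
    if value not in ALLOWED_FOO:
        # Catches "4", empty strings, 100kB of garbage, ...
        raise ValueError("foo is not one of the legal values")
    return int(value)
```

Checking against a set of legal values (rather than trying to list the bad ones) means anything unexpected is rejected.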
Remember where output is being sent and escape accordingly.
- What special characters can you not use, or only use if they are escaped somehow?
- e.g. to display “<” in HTML, you must output &lt;.
- Look for a library function that handles all of the escaping for your situation: HTML, SQL, URLs, ….
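As one illustration of such library functions, Python's standard library has `html.escape` for HTML text and `urllib.parse.quote` for URL components. The sample string is made up for the example.

```python
import html
from urllib.parse import quote

text = 'a < b & "c"'

# For HTML output: &, <, > and quotes become entities.
escaped = html.escape(text)
print(escaped)   # a &lt; b &amp; &quot;c&quot;

# For a URL path or query component: unsafe characters become %XX.
url_part = quote("a b&c")
print(url_part)  # a%20b%26c
```

The two functions are not interchangeable: which one to use depends on where the output is going.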

SQL Injection

General cause: building SQL queries from strings (like user input). (SQL注入攻击)
SQL injection is a special case of code injection (代碼注入).
e.g. your code does something like this:
id = request['id']
query = "SELECT * FROM posts WHERE id=" + id
- What if id came from the HTTP request and was sent as:
  0; DROP DATABASE foo
- You have allowed anybody to run arbitrary queries on your database.
Attack could be more subtle than a data-destroying attack, and just quoting doesn't fix it:
query = "SELECT * FROM users WHERE userid='"+user+"' AND password='"+pw+"'"
- User manages to get pw to be:
  ' or ''='
- Then they can log in as anyone, and you might never notice.
All input included in queries must be properly escaped.
- Check your DB library for a function: you'll probably miss something if you do it yourself.
  - But you'll probably only remember to wrap strings in the escaping function 99% of the time.
  - … and that's not enough.
- Better: use a function to automatically build queries and let it do the escaping.
  - Makes your code something like
    build_query(template="SELECT * FROM users WHERE userid=? AND password=?", args=[user, pw])
  - But check to make sure it properly escapes, and doesn't just substitute the strings.
- Best: use an ORM. They handle all of the SQL for you.
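A small sketch of the "let the library do it" approach, using placeholders in Python's built-in `sqlite3` module. The table mirrors the login query above; the in-memory database, the sample row, and the `find_user` helper are assumptions for the example (and the plain-text password column is only there to match the query in the notes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))


def find_user(conn, user, pw):
    """Look up a user with ? placeholders; the values never become SQL text."""
    cur = conn.execute(
        "SELECT userid FROM users WHERE userid=? AND password=?",
        (user, pw),
    )
    return cur.fetchone()


print(find_user(conn, "alice", "s3cret"))     # ('alice',)
print(find_user(conn, "alice", "' or ''='"))  # None
```

The attack string from above is treated as an (incorrect) password, not as part of the query.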

Cross-Site Scripting

Cross-Site Scripting, also known as XSS (跨網站指令碼).
Basic problem caused by displaying user-created content.
- e.g. user enters blog comment; your system outputs it in HTML for other users.
- But content could contain something other than the “plain text” you expected.
If you're lucky, just ugly formatting code you don't want.
- e.g. “I am <font size="100">very</font> happy!”
But if that's possible, so is the <script> tag.
- Then other users will be tricked into running that JS code when they view the comment.
- Could make requests back to the server like “delete account”.
- Very bad.
- Javascript can occur in many places in HTML, so be very cautious.
  - e.g. URLs like javascript:….
Solution 1: All input must be just text; escape the output.
- If you output “I am &lt;font…”, no problem.
- Every language/framework will have a function to escape for HTML output.
- But you have to remember to use it everywhere.
- Some templating systems escape by default: excellent. [My opinion: a templating system that makes you ask for escaping, instead of doing it by default, has a bug.]
Solution 2: Allow user-formatting, but not with HTML.
- HTML is too unrestricted: hard to allow some formatting, but not all and <script>.
- Use some other lightweight markup language: Textile, Markdown, Wikitext, BBCode.
- These are built to allow some formatting but are much more restrictive.
- But be careful: some libraries for those allow HTML injection because they don't escape “<” in the output. Test to be sure.
- Some alternative lightweight markup languages: Textile, Markdown, Creole, other Wikitext, BBCode, reStructuredText
- Textile home page: can you do a code injection on it?
- HTML Sanitizers: Bleach, HTML Purifier
Solution 3: Let user enter HTML, but check it to make sure it's okay.
- Very hard to do completely: don't try to do it yourself.
- Use an HTML sanitizing library.
- Often necessary if you want a WYSIWYG input (like TinyMCE).
But none of those solutions (necessarily) solve the “my home page is javascript:…” problem.
- Some of the tools do, but only within the text they process.
- If you provide a field where the user can enter a URL, you have to check for that.
- If the URL starts with “http://” or “https://” and is a legal URL, it should be okay.
- I think.
- See also clickjacking (点击劫持).
- Clickjacking [enwiki]
- 点击劫持 [zhwiki]
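The URL check described above might look something like this in Python. It is only a sketch: `is_safe_link` is a hypothetical helper, and it checks the scheme and the presence of a host, not everything that could make a URL "legal".

```python
from urllib.parse import urlsplit


def is_safe_link(url):
    """Accept only absolute http/https URLs that name a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False  # urlsplit rejects some malformed URLs outright
    # urlsplit lowercases the scheme, so "JavaScript:" is caught too.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_safe_link("https://example.com/me"))  # True
print(is_safe_link("javascript:alert(1)"))     # False
```

The returned value still has to be escaped when it is written into an `href` attribute.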

Insufficient Authorization

(授权不足)
It's easy to think that URLs are hard to find.
- e.g. the person accessing this URL must have gotten a link to it, so must be allowed to see it.
  http://…/item/12345
- But someone looking at item 12346 could easily guess that the above URL has some info.
- … or someone else could find the link in log files or on a shared computer.
Must check each request to make sure the requester is actually allowed to see the data you're about to send.
- Don't assume that because they have the link they are allowed.
- This is extremely common in student code I have seen.
Don't confuse authentication (认证, who is this person?) with authorization (授权, is this person allowed to see/do this?) checking.
- Just because they have logged in doesn't mean they can perform a particular action.
Often forgotten for secondary content.
- Watch for things you don't expect to be accessed directly.
- e.g. images, AJAX requests, popup “more info”, “hidden” admin interfaces, …
- These must be checked as carefully as any other request.
Easy to check: every controller/view should start with an authorization check.
Also be aware of smaller “information leaks”.
- e.g. confidential information should be displayed, but only to some users. Easy to forget about a small piece on some deeply linked page.
- Remember to think about parts of a page, not just whole URLs.
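A rough sketch of a view that starts with the check, in Python. The `ITEMS` dict stands in for a database, and `view_item`, `Forbidden`, and the ownership rule are all hypothetical; a real application would have its own idea of who may see what.

```python
ITEMS = {12345: {"owner": "alice", "body": "private notes"}}  # stand-in for a DB


class Forbidden(Exception):
    """Raised when the requester may not see the resource."""


def view_item(current_user, item_id):
    """Return the item body only if the requester is allowed to see it."""
    if current_user is None:
        raise Forbidden("not logged in")  # authentication
    item = ITEMS.get(item_id)
    # Same error for "missing" and "not yours": don't leak which IDs exist.
    if item is None or item["owner"] != current_user:
        raise Forbidden("not available")  # authorization
    return item["body"]
```

Being logged in as `bob` is not enough here: guessing the URL for item 12345 still fails.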

Insecure Uploads/Static Content

Problem: system allows file upload (e.g. avatar image).
- Files are stored in a directory and then served as static content.
- Problem 1: no authorization checks
- Problem 2: What if the user uploads a .php file? Server might execute it by default.
- This is often the default behaviour of a framework's file upload functionality: be careful.
Solution 1: don't serve uploaded content as static.
- Create a wrapper controller/view that sends the appropriate media type and file contents.
- Can check authorization as well.
- Requires processing overhead.
- Remember to not store it in the web root: your checks aren't much good if they can be bypassed by entering a URL like http://server/uploads/foo.jpg.
Solution 2: be very careful with the file and server config.
- Make sure there's no way to get code in and executed.
- Maybe easy for images: if it has a file extension you recognize and all of those types are served as static content, you should be okay. Maybe also run through an image processing library.
For example: Facebook, Renren content distribution networks do no authentication/authorization checks on images, but URLs are at least hard to guess.
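One piece of "being careful with the file" can be sketched in Python: only accept extensions you recognize, and store the file under a name the server chooses. The allowed list and the `stored_name_for` helper are assumptions for the example, and this does not check the file's contents.

```python
import os
import secrets

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # assumed image types


def stored_name_for(upload_filename):
    """Return a server-chosen filename, or raise ValueError for other types."""
    ext = os.path.splitext(upload_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed")
    # Random name: hard to guess, and nothing from the user's name survives
    # except the (checked) extension.
    return secrets.token_hex(16) + ext
```

With this, `evil.php` is rejected and `shell.php.jpg` is stored as something like `<32 hex digits>.jpg`.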

Session Hijacking/Prediction

Web apps look at some information (a session token, 会话令牌) to determine that a user is authenticated.
- Token is usually stored/transmitted in a cookie.
- But what if someone can fake the token?
Problem 1: Prediction. (会话预测)
- Happens if the token can be guessed.
- e.g. a very bad cookie: user=ggbaker
- Attacker can easily send a request with that cookie.
- The token should be arbitrary and hard to guess.
  - e.g. a randomly-generated value: sessionid=4f153ee1c3b3
  - … then associate that with a user in your database.
  - Don't confuse a hash function with a random value.
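A minimal sketch of a random session token, using Python's `secrets` module. The `SESSIONS` dict stands in for a sessions table, and the function names are hypothetical. A hash of the username would not do here: anyone can compute the same hash, so it is as guessable as the username itself.

```python
import secrets

SESSIONS = {}  # token -> username; stand-in for a sessions table


def start_session(username):
    """Create an unguessable token and associate it with the user."""
    token = secrets.token_hex(16)  # 16 random bytes from the OS, as 32 hex digits
    SESSIONS[token] = username
    return token


def user_for(token):
    """Return the username for a token, or None if it is unknown."""
    return SESSIONS.get(token)
```

The cookie then carries only the token; who it belongs to is known only on the server.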
Problem 2: Hijacking. (会话劫持)
- Happens if the token can be found somewhere and copied.
- e.g. in referrer logs, server logs, proxy logs, sniffed
- Hard to prevent entirely.
  - At least have sessions expire after some reasonable time.
- HTTPS is more secure, so why isn’t the Web using it?
- How much of a performance hit for HTTPS vs HTTP?
- The real answer is HTTPS: it encrypts the traffic (so the token can't be sniffed) and uses certificates to verify the server.

Cross-Site Request Forgery

or CSRF (跨站请求伪造).
The problem:
- A user is logged into your site and has a valid session cookie.
- The user visits another site that includes something like:
  <img src="http://yoursite/delete?id=12345">
- … or could be a POST request done with Javascript.
- User's browser makes the request to your site with their session cookie.
- Since they are properly authenticated, your site does the action on their behalf.
Attacker could be a malicious site, or code injected into another site with an XSS attack.
Very common problem, not well understood by developers.
Information stealing is also possible.
- … by sending response's contents back to the malicious server.
Prevention:
- Can check request's Referer header to make sure it came from your site.
  - … but that won't be enough if the request is generated by XSS-injected JS code.
- Require a secret token in each request.
  <input type="hidden" name="csrf_secret" value="4f153ee1c3b3">
  - … where the value is randomly generated for each page.
  - If submission doesn't have that secret, fail.
- Most frameworks have CSRF protection, but often off by default. Turn it on.
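If you did have to do the secret-token check by hand, it might be sketched like this in Python. Here `session` and `form` are plain dicts standing in for the framework's session storage and the submitted form data; both function names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a token, remember it in the session, return it for the form."""
    token = secrets.token_hex(16)
    session["csrf_secret"] = token
    return token


def check_csrf(session, form):
    """True only if the submitted token matches the one we issued."""
    expected = session.get("csrf_secret")
    submitted = form.get("csrf_secret")
    if not expected or not submitted:
        return False
    # Constant-time comparison, on bytes so odd input can't raise TypeError.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The other site can make the user's browser send the cookie, but it has no way to read the token out of your page, so its forged request fails the check.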

Insecure Data Storage

How is your data stored on the server? Who has access?
- OWASP: Insecure Cryptographic Storage
- Cache on delivery: mining memcached for private info
- You're Probably Storing Passwords Incorrectly
- How To Safely Store A Password
- How I became a password cracker
- Obviously minimize the people who have free access to the database, server, backups, ….
Can take steps to limit data loss, even if an intruder gets in.
- Critical data (credit card numbers, etc) can be stored separately on a second tier of secured storage.
- That way, if somebody gets access to your database server (through an SQL injection or something), they can get some data (usernames, emails) but not the really important stuff.
Important case: passwords.
- Never store passwords. You don't need them.
- Only store a hash of a password. Run the password through a hash function and store the result; when the user enters a password, hash that and see if the two hashes match.
- Cryptographic hash function: function that maps a bitstring to a hash value so that (1) collisions are rare, and (2) reversing it (calculating the bitstring from the hash) is very hard.
- That way, if somebody gets access to your DB, they can only see the hashes and don't know what to type to log in as a user.
But reversing a hash function might not be that hard. (dictionary attacks, rainbow tables)
- So don't store hash(password) since password might be in a precomputed dictionary of hash values.
- Pick a salt: an arbitrary string that you can reproduce, possibly different for each user.
- Then store hash(salt+password).
- Much more resistant to dictionary attacks and rainbow tables (if different for each user).
Bigger problem: most hash functions can be computed quickly: that's their job. (MD5, SHA1, SHA3)
- If they can be computed quickly, an attacker can just try millions of possibilities until they find some that work.
- Solution: use a hash function that's hard to compute.
- e.g. bcrypt.
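A sketch of salted, deliberately slow password hashing in Python. Note the swap: bcrypt is not in Python's standard library, so this uses PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in with the same idea. The iteration count and the function names are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower for attackers too


def hash_password(password):
    """Return (salt, digest) to store for this user; never the password."""
    salt = os.urandom(16)  # different random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Re-hash the entered password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Usage would be to call `hash_password` at registration, store both values, and call `verify_password` at login.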

Example Security Breaches

Electronic Arts “Gives Away” Thousands Of Free Games Due To No Server-Side Validation (insufficient authn/authz)
SQL Injection Attack happening at the moment
Passport applicant finds massive privacy breach (insufficient authorization)
How I hacked Digg (XSS)
Abusing HTTP Status Codes to Expose Private Information (data leak)
100 Million Usernames, Passwords Leaked (cleartext passwords stored)
LinkedIn Password Leak: Salt Their Hide (unsalted passwords stored)