Intro
- When doing web development, you can never ignore security.
- Because it's on the web, your code is open to attack by anyone, good or bad.
- 99% secure is failure.
- Even on an intranet site.
- Maybe you have some stupid employees.
- Maybe someone gets into the network through another hole and can see your system.
- There are several specific mistakes you can make that are worth knowing…
Basics: Input and Output
- As a basic rule: remember that you can't trust any input, but have to carefully control your output.
- You have to think of all input as dangerous.
- Maybe less important for desktop apps: if user does something stupid, it probably only affects their computer.
- For a web app, it will affect your server/database.
- On every piece of input you ever get from the outside world, you must assume that it can be broken/missing/malformed/malicious in every possible way all the time.
- Assume all users are trying to attack/break your site.
- … even for intranet sites.
- For example, suppose you have a dropdown select in your HTML:
<select name="foo">
  <option value="1">Option One</option>
  <option value="2">Option Two</option>
  <option value="3">Option Three</option>
</select>
- So the input from the user (in the URL for a GET request, message body for POST) will be something like “foo=2”.
- It's easy to think there are only three possible values returned.
- But user could easily hand-modify the URL/request.
- … or the HTML with Firebug.
- A third-party site could submit different data to your URL.
- Actual value you get could be missing, “foo=4”, 100kB of garbage data, …
- Make sure you check all of your input carefully.
- Is it even there?
- Is it in the format you expect/one of the legal values?
- Is it coming from the user you expect/allow to submit?
- Will the text fit into your DB field?
- ⋮
- Remember where output is being sent and escape accordingly.
- What special characters can you not use, or only use if they are escaped somehow?
- e.g. to display “<” in HTML, you must output “&lt;”.
- Look for a library function that handles all of the escaping for your situation: HTML, SQL, URLs, ….
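The two rules above (check every input, escape every output) can be sketched for the dropdown example roughly as follows. This is only an illustration: `ALLOWED_FOO`, `parse_foo`, and `render_choice` are made-up names, and `params` stands in for whatever dictionary of request parameters your framework provides. The escaping uses `html.escape` from the Python standard library.

```python
import html

# The only values the form is supposed to send (hypothetical whitelist).
ALLOWED_FOO = {"1": "Option One", "2": "Option Two", "3": "Option Three"}

def parse_foo(params):
    """Return the validated 'foo' value or raise ValueError."""
    value = params.get("foo")
    if value is None:
        raise ValueError("foo is missing")
    if value not in ALLOWED_FOO:
        # Catches "4", 100kB of garbage, and anything else unexpected.
        raise ValueError("foo is not one of the legal values")
    return value

def render_choice(value):
    """Build HTML output, escaping the text on the way out."""
    return "<p>You chose: " + html.escape(ALLOWED_FOO[value]) + "</p>"

print(render_choice(parse_foo({"foo": "2"})))
print(html.escape("<"))  # the escaped form of "<" for HTML output
```

Checking against a list of legal values handles the missing, out-of-range, and oversized cases with one test, which is usually easier to get right than trying to list everything that is forbidden.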
SQL Injection
- General cause: building SQL queries from strings (like user input). (SQL注入攻击)
- SQL injection is a special case of code injection (代碼注入).
- e.g. your code does something like this:
id = request['id']
query = "SELECT * FROM posts WHERE id=" + id
- What if id came from the HTTP request and was sent as: 0; DROP DATABASE foo
- You have allowed anybody to run arbitrary queries on your database.
- Attack could be more subtle than a data-destroying attack, and just quoting doesn't fix it:
query = "SELECT * FROM users WHERE userid='"+user+"' AND password='"+pw+"'"
- User manages to get pw to be: ' or ''='
- The WHERE clause then ends with password='' or ''='', which is true for every row.
- Then they can log in as anyone, and you might never notice.
- All input included in queries must be properly escaped.
- Check your DB library for a function: you'll probably miss something if you do it yourself.
- But you'll probably only remember to wrap strings in the escaping function 99% of the time.
- … and that's not enough.
- Better: use a function to automatically build queries and let it do the escaping.
- Makes your code something like
build_query(template="SELECT * FROM users WHERE userid=? AND password=?",
    args=[user, pw])
- But check to make sure it properly escapes, and doesn't just substitute the strings.
- Best: use an ORM. They handle all of the SQL for you.
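As one concrete instance of the “build the query from a template” approach, here is a small sketch using Python's standard `sqlite3` module, whose `execute` accepts `?` placeholders plus a tuple of values. The table layout and the function names `find_user_unsafe` and `find_user` are assumptions made for the demonstration. The plain `password` column only mirrors the query in these notes and is not a recommendation.

```python
import sqlite3

def find_user_unsafe(conn, user, pw):
    # Vulnerable: the values become part of the SQL text itself.
    query = ("SELECT userid FROM users WHERE userid='" + user +
             "' AND password='" + pw + "'")
    return conn.execute(query).fetchone()

def find_user(conn, user, pw):
    # Parameterized: the values are passed separately from the SQL text.
    query = "SELECT userid FROM users WHERE userid=? AND password=?"
    return conn.execute(query, (user, pw)).fetchone()

# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "secret"))

attack = "' or ''='"
print(find_user_unsafe(conn, "alice", attack))  # a row comes back: logged in
print(find_user(conn, "alice", attack))         # None: just a wrong password
```

Other database libraries use different placeholder styles (for example `%s` or named parameters), so check the documentation for the one you are using.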
Cross-Site Scripting
- Cross-Site Scripting, also known as XSS (跨網站指令碼).
- Basic problem caused by displaying user-created content.
- e.g. user enters blog comment; your system outputs it in HTML for other users.
- But content could contain something other than the “plain text” you expected.
- If you're lucky, just ugly formatting code you don't want.
- e.g. “I am <font size="100">very</font> happy!”
- But if that's possible, so is the <script> tag.
- Then other users will be tricked into running that JS code when they view the comment.
- Could make requests back to the server like “delete account”.
- Very bad.
- JavaScript can occur in many places in HTML, so be very cautious.
- e.g. URLs like “javascript:…”.
- Solution 1: All input must be just text; escape the output.
- If you output “I am &lt;font…”, no problem.
- Every language/framework will have a function to escape for HTML output.
- But you have to remember to use it everywhere.
- Some templating systems escape by default: excellent. [My opinion: making you escape manually by default is a bug.]
- Solution 2: Allow user-formatting, but not with HTML.
- HTML is too unrestricted: hard to allow some formatting without also allowing everything else, including <script>.
- Use some other lightweight markup language: Textile, Markdown, Wikitext, BBCode.
- These are built to allow some formatting but are much more restrictive.
- But be careful: some libraries for those allow HTML injection because they don't escape “<” in the output. Test to be sure.
- Solution 3: Let user enter HTML, but check it to make sure it's okay.
- Very hard to do completely: don't try to do it yourself.
- Use an HTML sanitizing library.
- Often necessary if you want a WYSIWYG input (like TinyMCE).
- But none of those solutions (necessarily) solve the “my home page is javascript:…” problem.
- Some of the tools do, but only within the text they process.
- If you provide a field where the user can enter a URL, you have to check for that.
- If the URL starts with “http://” or “https://” and is a legal URL, it should be okay.
- I think.
- See also clickjacking (点击劫持).
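A rough sketch of Solution 1 together with the URL check, using only the Python standard library (`html.escape` and `urllib.parse.urlsplit`). The function names `render_comment` and `is_safe_link` are invented for this example, and the URL test is deliberately simple: it accepts only http/https URLs that have a host part, which is an assumption about what “legal URL” should mean here.

```python
import html
from urllib.parse import urlsplit

def render_comment(text):
    """Treat the comment as plain text and escape it for HTML output."""
    return "<p>" + html.escape(text) + "</p>"

def is_safe_link(url):
    """Accept only http:// and https:// URLs that have a host part."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False  # malformed URL
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(render_comment('I am <font size="100">very</font> happy!'))
print(is_safe_link("javascript:alert(1)"))   # rejected
print(is_safe_link("https://example.com/"))  # accepted
```

Checking for an allowed scheme is safer than checking for forbidden ones, since there are more dangerous schemes than just `javascript:`.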
Insufficient Authorization
- (授权不足)
- It's easy to think that URLs are hard to find.
- e.g. the person accessing this URL must have gotten a link to it, so must be allowed to see it.
http://…/item/12345
- But someone looking at item 12346 could easily guess that the above URL has some info.
- … or someone else could find the link in log files or on a shared computer.
- Must check each request to make sure the requester is actually allowed to see the data you're about to send.
- Don't assume that because they have the link they are allowed.
- This is extremely common in student code I have seen.
- Don't confuse authentication (认证, who is this person?) with authorization (授权, is this person allowed to see/do this?) checking.
- Just because they have logged in doesn't mean they can perform a particular action.
- Often forgotten for secondary content.
- Watch for things you don't expect to be accessed directly.
- e.g. images, AJAX requests, popup “more info”, “hidden” admin interfaces, …
- These must be checked as carefully as any other request.
- Easy to check: every controller/view should start with an authorization check.
- Also be aware of smaller “information leaks”.
- e.g. confidential information should be displayed, but only to some users. Easy to forget about a small piece on some deeply linked page.
- Remember to think about parts of a page, not just whole URLs.
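The “every controller/view should start with an authorization check” rule might look something like the sketch below. Everything here is hypothetical: the `ITEMS` dictionary stands in for your database, `current_user` for whatever your framework's authentication gives you, and “only the owner may view” is just one possible policy.

```python
class Forbidden(Exception):
    """Raised when the requester may not see the item."""

# Hypothetical data store: item id -> record with an owner.
ITEMS = {
    12345: {"owner": "alice", "body": "Alice's private item"},
    12346: {"owner": "bob", "body": "Bob's private item"},
}

def view_item(current_user, item_id):
    """Return the item body, but only after checking who is asking."""
    item = ITEMS.get(item_id)
    if item is None:
        raise KeyError(item_id)
    # Authorization: being logged in is not enough; the user must own the item.
    if current_user is None or item["owner"] != current_user:
        raise Forbidden(item_id)
    return item["body"]

print(view_item("alice", 12345))
```

With this structure, a user who guesses the URL for item 12346 gets an error instead of the data, whether or not they are logged in.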
Insecure Uploads/Static Content
- Problem: system allows file upload (e.g. avatar image).
- Files are stored in a directory and then served as static content.
- Problem 1: no authorization checks
- Problem 2: What if the user uploads a .php file? Server might execute it by default.
- This is often default behaviour for framework's file upload functionality: be careful.
- Solution 1: don't serve uploaded content as static.
- Create a wrapper controller/view that sends the appropriate media type and file contents.
- Can check authorization as well.
- Requires processing overhead.
- Remember to not store it in the web root: your checks aren't much good if they can be bypassed by entering a URL like http://server/uploads/foo.jpg.
- Solution 2: be very careful with the file and server config.
- Make sure there's no way to get code in and executed.
- Maybe easy for images: if it has a file extension you recognize and all of those types are served as static content, you should be okay. Maybe also run through an image processing library.
- For example: Facebook, Renren content distribution networks do no authentication/authorization checks on images, but URLs are at least hard to guess.
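One way to approach the “recognized file extension” idea is sketched below, under several assumptions: the list of allowed image types, the function names, and the choice to rename every upload to a random name are all illustrative, and `upload_dir` is assumed to be a directory outside the web root. It does not inspect the file contents, so it is a starting point rather than a complete defence.

```python
import secrets
from pathlib import Path

# Extensions we are willing to store, with the media type to send back.
ALLOWED_TYPES = {".jpg": "image/jpeg", ".png": "image/png", ".gif": "image/gif"}

def save_upload(upload_dir, original_name, data):
    """Store an upload under a random name; reject unknown extensions."""
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_TYPES:
        raise ValueError("file type not allowed: " + repr(ext))
    # Never reuse the user's filename: it could contain paths or tricks.
    stored_name = secrets.token_hex(16) + ext
    Path(upload_dir, stored_name).write_bytes(data)
    return stored_name

def media_type_for(stored_name):
    """Media type a wrapper view would send when serving the file."""
    return ALLOWED_TYPES[Path(stored_name).suffix.lower()]
```

A wrapper view (Solution 1) could then look up the stored name, run its authorization check, and send the bytes with the media type from `media_type_for`.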
Session Hijacking/Prediction
- Web apps look at some information (a session token, 会话令牌) to determine that a user is authenticated.
- Token is usually stored/transmitted in a cookie.
- But what if someone can fake the token?
- Problem 1: Prediction. (会话预测)
- Happens if the token can be guessed.
- e.g. a very bad cookie:
user=ggbaker
- Attacker can easily send a request with that cookie.
- The token should be arbitrary and hard to guess.
- e.g. a randomly-generated value:
sessionid=4f153ee1c3b3
- … then associate that with a user in your database.
- Don't confuse a hash function with a random value: a hash of the username is just as guessable as the username.
- Problem 2: Hijacking. (会话劫持)
- Happens if the token can be found somewhere and copied.
- e.g. in referrer logs, server logs, proxy logs, sniffed network traffic
- Hard to prevent entirely.
- At least have sessions expire after some reasonable time.
- Real answer is HTTPS for encryption and certification.
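A minimal sketch of the “random token, looked up on the server, with an expiry” idea, using Python's standard `secrets` module. The in-memory `_sessions` dictionary stands in for a database table, and the 30-minute lifetime and the function names are arbitrary choices for the example.

```python
import secrets
import time

SESSION_LIFETIME = 30 * 60  # seconds; an arbitrary "reasonable time"

# token -> (userid, expiry time); stands in for a database table.
_sessions = {}

def create_session(userid, now=None):
    """Create an unguessable token and associate it with the user."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _sessions[token] = (userid, now + SESSION_LIFETIME)
    return token

def lookup_session(token, now=None):
    """Return the userid for a valid token, or None if unknown/expired."""
    now = time.time() if now is None else now
    entry = _sessions.get(token)
    if entry is None:
        return None
    userid, expires = entry
    if now >= expires:
        del _sessions[token]  # expired: forget it
        return None
    return userid
```

The cookie then carries only the token. The username never appears in it, so there is nothing for an attacker to guess or edit.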
Cross-Site Request Forgery
- or CSRF (跨站请求伪造).
- The problem:
- A user is logged into your site and has a valid session cookie.
- The user visits another site that includes something like:
<img src="http://yoursite/delete?id=12345">
- … or could be a POST request done with Javascript.
- User's browser makes the request to your site with their session cookie.
- Since they are properly authenticated, your site does the action on their behalf.
- Attacker could be a malicious site, or code injected into another site with an XSS attack.
- Very common problem, not well understood by developers.
- Information stealing is also possible.
- … by sending response's contents back to the malicious server.
- Prevention:
- Can check request's Referer header to make sure it came from your site.
- … but that won't be enough if the request is generated by XSS-injected JS code.
- Require a secret token in each request.
<input type="hidden" name="csrf_secret" value="4f153ee1c3b3">
- … where the value is randomly generated for each page.
- If submission doesn't have that secret, fail.
- Most frameworks have CSRF protection, but often off by default. Turn it on.
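If your framework does not do it for you, the secret-token check could be sketched like this. The `session` and `form` arguments are assumed to be plain dictionaries (the user's server-side session data and the submitted form fields), and the function names are invented. The field name `csrf_secret` is taken from the example above.

```python
import hmac
import secrets

def issue_csrf_token(session):
    """Generate a fresh secret for a page and remember it in the session."""
    token = secrets.token_hex(16)
    session["csrf_secret"] = token
    return token

def hidden_field(token):
    """The hidden input to include in every form on the page."""
    return '<input type="hidden" name="csrf_secret" value="' + token + '">'

def check_csrf(session, form):
    """True only if the submitted secret matches the one we issued."""
    expected = session.get("csrf_secret")
    submitted = form.get("csrf_secret")
    if not expected or not submitted:
        return False
    # Constant-time comparison of the two values.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A forged request from another site arrives with the session cookie but without the secret, so `check_csrf` returns False and the action should be refused.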
Insecure Data Storage
- How is your data stored on the server? Who has access?
- Obviously minimize the people who have free access to the database, server, backups, ….
- Can take steps to limit data loss, even if an intruder gets in.
- Critical data (credit card numbers, etc) can be stored separately on a second tier of secured storage.
- That way, if somebody gets access to your database server (through an SQL injection or something), they can get some data (usernames, emails) but not the really important stuff.
- Important case: passwords.
- Never store passwords. You don't need them.
- Only store a hash of a password. Run the password through a hash function and store the result; when the user enters a password, hash that and see if the two hashes match.
- Cryptographic hash function: function that maps a bitstring to a hash value so that (1) collisions are rare, and (2) reversing it (calculating the bitstring from the hash) is very hard.
- That way, if somebody gets access to your DB, they can only see the hashes and don't know what to type to log in as a user.
- But reversing a hash function might not be that hard. (dictionary attacks, rainbow tables)
- So don't store hash(password) since password might be in a precomputed dictionary of hash values.
- Pick a salt: an arbitrary string that you can reproduce, possibly different for each user.
- Then store hash(salt+password).
- Much more resistant to dictionary attacks and rainbow tables (if different for each user).
- Bigger problem: most hash functions can be computed quickly: that's their job. (MD5, SHA1, SHA3)
- If they can be computed quickly, an attacker can just try millions of possibilities until they find some that work.
- Solution: use a hash function that's hard to compute.
- e.g. bcrypt.
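The salted, deliberately slow hashing described above can be sketched with the Python standard library. Note that this is not bcrypt: bcrypt needs a third-party package, so the sketch swaps in PBKDF2 (`hashlib.pbkdf2_hmac`), which follows the same idea of a per-user salt plus a hash that is expensive to compute. The iteration count and the function names are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # higher means slower for you and for the attacker

def hash_password(password, salt=None):
    """Return (salt, digest); generates a new random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # different for each user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, stored_digest):
    """Hash the attempt with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

# Store salt and digest in the user's row; never the password itself.
salt, digest = hash_password("correct horse")
print(check_password("correct horse", salt, digest))
print(check_password("wrong guess", salt, digest))
```

Because each user gets their own salt, two users with the same password end up with different stored values, and a precomputed table of hashes is of no use to an attacker.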
Example Security Breaches
- Electronic Arts “Gives Away” Thousands Of Free Games Due To No Server-Side Validation (insufficient authn/authz)
- SQL Injection Attack happening at the moment
- Passport applicant finds massive privacy breach (insufficient authorization)
- How I hacked Digg (XSS)
- Abusing HTTP Status Codes to Expose Private Information (data leak)
- 100 Million Usernames, Passwords Leaked (cleartext passwords stored)
- LinkedIn Password Leak: Salt Their Hide (unsalted passwords stored)