The Web and HTTP

As we said in the last topic, HTTP is the protocol used for the World Wide Web. Whenever your browser requests a page, it does so by contacting a web server and making the request with HTTP. The server then responds by sending the page, again with HTTP.

We aren't going to cover the details of an HTTP conversation, but some of the ideas are important for anyone creating web pages (or even using the web).

Requesting Pages

When you click on a link in your web browser, you are making a request for a web page. How does this request get fulfilled?

Your web browser on your computer is acting as the client for this request. That is, it is the one initiating the request. When creating web pages, we usually think of the client (or user agent) being a web browser, but it is important to remember that other tools can act as clients as well. Search engines like Google request pages so they can be indexed; other tools (like the HTML validator that we'll see later) request pages so they can be examined on the user's behalf; archivers and other tools download pages so they can be stored.

The client contacts the server to make its request. The server is another computer on the Internet: it is running server software that can answer HTTP requests. When it gets the request for a page, it finds or generates the content and sends it back.

“HTTP” defines the way that this conversation takes place between the two computers. It is used to request HTML pages, images, and any other content that is on the web. It also defines various errors that can be handled by the user agent or shown to the user.
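To make the request and response exchange described above concrete, here is a minimal sketch in Python that runs a tiny server and a client on the same computer. Everything specific in it is an assumption made for illustration: the page name /hello.html, its content, and the helper names DemoHandler and fetch are all hypothetical, and a real web server does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content that our toy server knows how to serve.
PAGES = {"/hello.html": b"<p>Hello, web!</p>"}


class DemoHandler(BaseHTTPRequestHandler):
    """Server side: answer GET requests for the paths in PAGES."""

    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # The server can't find this path, so it reports an error.
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


def fetch(port, path):
    """Client side: make one HTTP request and return what came back."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, response.reason, body


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

found = fetch(port, "/hello.html")
missing = fetch(port, "/nope.html")

server.shutdown()
server.server_close()

print(found[0], found[1], found[2])
print(missing[0], missing[1])
```

The first request succeeds and returns the page content; the second asks for a path the server doesn't have, so the client gets an error status back instead. A browser does the same thing as fetch here, then displays the result.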

URLs

A URL (or Uniform Resource Locator) is used by a web browser (or other tool) to find (locate) a page or other content (a resource) on the web. You may also see these called URIs or Uniform Resource Identifiers (which isn't quite synonymous, but close enough for us).

URLs look like “http://www.w3.org/html/”. Here are the basic parts:

The scheme (“http” in this example) indicates the protocol that will be used to fetch the resource (it is also called the URL protocol). This URL will be fetched using HTTP.

You will also see URLs that start with “https://”, indicating the HTTPS (HTTP Secure) protocol. It works the same as HTTP, but all information is encrypted when being sent between the server and client. That ensures that none of the computers in between can read the messages going back and forth.

Next is the server (or host name), “www.w3.org” in this example. This is the name of the computer that the browser needs to contact to retrieve this information. The server software on it will answer the request.

Finally, the path (“/html/” in this example) indicates which page on this server we're interested in. The server uses it to decide which piece of content the user has requested so it can be sent back (or it can decide that it can't find that path and return a 404 Not Found error message, or some other error if appropriate).
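The three parts described above can be pulled out of a URL with Python's standard urllib.parse module. This is only a small sketch for illustration; the variable names are arbitrary, and browsers do their own URL parsing internally.

```python
from urllib.parse import urlsplit

# Split the example URL into its basic parts.
parts = urlsplit("http://www.w3.org/html/")

print(parts.scheme)    # the protocol used to fetch the resource
print(parts.hostname)  # the server to contact
print(parts.path)      # which page on that server we want

# The same URL with the secure scheme differs only in its first part.
secure = urlsplit("https://www.w3.org/html/")
print(secure.scheme)
```

Only the scheme changes between the two URLs: the server and path are the same, but the second would be fetched over an encrypted connection.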