Semantic Markup

Whenever HTML tags have been described here, the focus has been on what they mean or their purpose on the page, not on how they look:

The <p>…</p> tag is used to enclose a paragraph on the page… The <a> tag is used to create links… The horizontal rule element is used to indicate a break in the content.

None of these descriptions have been about what these elements look like in a web browser, because that's not what HTML describes. HTML is a semantic markup language. That is, its job is to describe the semantics of content: what that content means, what kind of content it is, or what its purpose or role is on the page.

When creating HTML pages, our goal should be to describe the content as accurately as possible with the tags we have.

The alternative is visual markup or presentational markup where the author specifies the appearance of content directly.
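As a rough illustration of the difference, here is the same heading marked up both ways. The heading text is made up for this sketch, and the visual version uses the old <font> and <center> elements, which are obsolete in current HTML.

```html
<!-- Visual (presentational) markup: says how the text looks, not what it is. -->
<center><font size="6"><b>Course Notes</b></font></center>

<!-- Semantic markup: says that this text is the main heading of the page. -->
<h1>Course Notes</h1>
```

Only the second version tells a browser, search engine, or other tool that the text is a heading.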

You may be familiar with visual and semantic formatting from word processors like Microsoft Word or Google Docs, which allow both. Note that Word and Docs are not markup languages (since there is no markup to write), but their formatting style can still be described as visual or semantic.

MS Word tools for visual formatting (left) and semantic formatting (right)

Google Docs tools for semantic formatting (left, under “Normal text”) and visual formatting (right, on the toolbar)

In those screenshots, you can see visual formatting tools where you directly choose the font, style, colour, and so on. Both tools also have “style” tools where you can specify the type of content in a semantic way.

Why Use Semantic Markup?

The best reason for doing semantic markup is probably that it will let us use CSS effectively, as we will see in the unit on Stylesheets. CSS will let us make changes to the page like “all paragraphs should look like …”. If we used visual markup (to express something like “this text is on a new line, left justified, in a 12pt font” on each paragraph in our page) then we wouldn't be able to easily re-style everything: we would have to change each “paragraph” separately.
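As a preview only, a “all paragraphs should look like …” change might be sketched as below. The particular font and size are arbitrary choices for illustration, and the details of CSS are left to the Stylesheets unit.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>One rule for every paragraph</title>
<style>
/* A single rule that applies to every <p> element on the page. */
p {
  font-family: sans-serif;
  font-size: 12pt;
  text-align: left;
}
</style>
</head>
<body>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</body>
</html>
```

Changing the one rule restyles both paragraphs, and would do the same for a thousand of them.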

You may have had an experience like this using the tools described above for essays or other documents. If you format every section heading by selecting a font, then a size, then bold, they might all look fine. But if you then decide to change the style of your document, you have to change each one manually. This can be very time-consuming for a long document.

The word processor style tools let you change the way every “Heading 1” or “Subtitle” looks at one time by changing the defaults for that type of content. This takes the same amount of time for one or a thousand pages.

Using semantic markup in HTML also helps your page be used in situations you might not have expected. For example, search engines (Google, DuckDuckGo, Bing, etc.) must read pages on the Web to figure out what they are about (to decide what search terms they should be returned for). These are entirely automated tools that must process pages algorithmically.

If we have done a good job with our markup, the search engines can extract lots of meaning (semantics) just from our markup. For example, just from examining the headings in the “complete page” example from the last topic, we get this outline for the document:

HTML Basics
1. More to Learn
2. Not In This Course

That is a decent summary of the document, and it can be easily discovered automatically from the markup. More can be discovered by examining the other tags, if they have been used correctly, in ways that match their meaning.
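The “complete page” example is not reproduced in this section, so the following is only a guess at the kind of heading markup that would yield that outline. The heading levels (one <h1> and two <h2>) and the placeholder comments are assumptions.

```html
<h1>HTML Basics</h1>
<!-- introductory content here -->

<h2>More to Learn</h2>
<!-- content of the first section -->

<h2>Not In This Course</h2>
<!-- content of the second section -->
```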

How to Choose Tags

In the unit on CSS, we will see ways to control the appearance of pages and to give each type of content its own look. For now, we will have to accept the browsers' default styles for each element.

Since HTML is a semantic markup language, we should be keeping the meaning of our content in mind as we're creating the pages. When marking up new types of content, have a look at the HTML reference to see if there is a tag that makes sense for that content.

Examples

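Consider these sentences which we want to mark up appropriately in HTML:

Why do you think that is interesting?
You must never let that happen.
I used the <li> tag for each item.
Oh well, c'est la vie.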

Each sentence has one word/phrase that is formatted differently. When choosing markup for content, we have to include markup for the meaning of the content. That means we can't just settle for a tag that looks right.

In the first sentence, the word “you” is highlighted as an important word in the sentence. The emphasis is placed on that word and that is important to the meaning here.

Looking in the HTML reference, the most appropriate element I see is <em> which is described as indicating “text that has stress emphasis”, which sounds like what is happening here.

For the second sentence, the word “never” could also be wrapped in <em>, but again looking at the reference we can find the <strong> element which “gives text strong importance”. It works much like <em> but signals importance or seriousness rather than just stress: that seems appropriate to a warning like this.

The text “<li>” in the third sentence is an example of HTML code. In the reference, we might look at <kbd>, which “represents user input” and could be appropriate if we're asking the user to type something, but that doesn't sound like what is happening here. A better fit would be <code> which “represents a fragment of computer code”.
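To make the contrast concrete, here is a sketch with one sentence where <kbd> fits and one where <code> fits. Both sentences are invented for this illustration.

```html
<!-- <kbd>: text the reader is expected to type -->
<p>To leave the program, type <kbd>quit</kbd> and press Enter.</p>

<!-- <code>: a fragment of computer code being discussed -->
<p>The counter is updated with <code>count = count + 1</code>.</p>
```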

After some looking at the available HTML tags and what they mean, we arrive at this markup for the first three sentences:

<p>Why do <em>you</em> think that is interesting?</p>
<p>You must <strong>never</strong> let that happen.</p>
<p>I used the <code>&lt;li&gt;</code> tag for each item.</p>

That will look something like this in the web browser:

Why do you think that is interesting?

You must never let that happen.

I used the <li> tag for each item.

That doesn't look exactly like the original sample, but that's not important here. The meaning is right and that's what we care about. We can worry about appearance later when we start working with Stylesheets.

We haven't suggested any markup for the last sentence, which has some text formatted differently because it's in a different language. By convention, non-English phrases are formatted in italics when printed.

If we look in the reference for a tag with a meaning like “phrase in another language”, we won't find one. What we need is a tag that we can use when nothing else fits…

The `class` and `id` Attributes

HTML has many useful semantic tags that we can use to describe our content, but sometimes they can be limiting.

The <p> tag is nice to enclose paragraphs, but there can be times you have more than one type of paragraph, for example normal paragraphs containing prose and a paragraph of copyright information at the bottom of the page (“This page is copyright © 2015…”).

The class and id attributes can be used to give extra semantic information about elements. We can continue to use <p> for most paragraphs, but <p id="copyright"> for the one that is semantically different from the others.

The value of class and id can be any word, but should be meaningful (i.e. something about the meaning, not the appearance).

The difference between the two is that a particular id value can be used only once on a page. So we can have at most one element with id="copyright". That seems appropriate in this case since we only have one copyright notice per page.
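A short sketch of the two attributes side by side may help. The class name "summary" and the paragraph text are hypothetical. Only the copyright paragraph comes from the example above.

```html
<!-- class: the same value may appear on many elements -->
<p class="summary">A short summary of the first section.</p>
<p>An ordinary paragraph of prose.</p>
<p class="summary">A short summary of the second section.</p>

<!-- id: this value may appear only once on the page -->
<p id="copyright">This page is copyright © 2015…</p>
```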

You might also use <footer> for the copyright information, depending on the context, but that's not a good class/id example.

The most likely reason to want to distinguish an element is to make it look different with CSS. We'll get to that later. For now we will use class and id just for semantic reasons.

As an example of this, in this guide there are many examples of computer code marked up as <code> elements, but I wanted some to look different. HTML code looks like “<p>” but filenames look like “filename.html”. In order to distinguish the two types of code, the markup used was <code class="html"> and <code class="file">. It is then possible to grab these things separately in CSS and change their appearance. We need to use class here, since there might be many HTML code examples on one page.
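A minimal sketch of how that could fit together is below. The two CSS rules are illustrative guesses, not the guide's real stylesheet, and the sentence in the body is made up.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Two kinds of code</title>
<style>
/* Illustrative styles only: each class is selected separately. */
code.html { color: darkgreen; }
code.file { font-style: italic; }
</style>
</head>
<body>
<p>Put a <code class="html">&lt;p&gt;</code> element in
<code class="file">filename.html</code>.</p>
</body>
</html>
```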

Generic Tags

In the above example, we came to a piece of content that we need to wrap in a tag because it is a different type of content, but find nothing matching its meaning in the HTML reference. This is inevitable: there is no way we could add HTML tags to match every kind of content that everyone ever uses.

There are two generic tags to handle these cases: <div> and <span>.

The difference between the two is where they can be placed in the HTML.

The <div> element holds block-level content (sometimes called flow content). That is, it can go directly inside the <body> and is displayed below the previous block. Other block-level elements we have seen include <p>, <h1>, <h2>, <ul>, and <li>.

The <span> element holds inline content (sometimes called phrasing content). Inline content goes inside a block, so it is part of a paragraph, heading, list item, or other block. Other inline elements include <em>, <a>, and <abbr>.

Because they don't have any meaning on their own, <div> and <span> should always be given a meaningful class or id value that indicates their purpose on the page.
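As a sketch of where each generic tag can go, consider the fragment below. The class names "sidebar" and "unitname" and the content are hypothetical, chosen only to show a meaningful name on each element.

```html
<!-- <div>: a block-level container, placed directly in the <body> -->
<div class="sidebar">
  <h2>Related Topics</h2>
  <!-- <span>: an inline container, placed inside a block such as <p> -->
  <p>See the <span class="unitname">Stylesheets</span> unit for more.</p>
</div>
```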

We can now give markup for the last example above. We found no existing tag for a phrase in another language, so we must choose a generic tag. The phrase is part of a paragraph, so we choose the inline tag <span> and give it a meaningful class name. (We shouldn't use id since there may be many other-language phrases on a single page.)

<p>Oh well, <span class="otherlang">c'est la vie</span>.</p>

When displayed on the page, the <span> looks the same as other text. We will need to use CSS to change its appearance once we start working with it.

Oh well, c'est la vie.

We can actually improve this markup a little more by using the lang attribute we saw earlier:

<p>Oh well, <span class="otherlang" lang="fr">c'est la vie</span>.</p>

This way, the text can be indexed more accurately by Google and your web browser may offer an automatic translation to English.
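As a preview of what the class makes possible, here is a minimal sketch that italicizes every other-language phrase, following the print convention mentioned earlier. The single CSS rule is an assumption about how one might style it. The details belong to the Stylesheets unit.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Other-language phrases</title>
<style>
/* Italicize every phrase marked as being in another language. */
span.otherlang { font-style: italic; }
</style>
</head>
<body>
<p>Oh well, <span class="otherlang" lang="fr">c'est la vie</span>.</p>
</body>
</html>
```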
