The Web Design Group

What's New in XHTML 1.0

XHTML 1.0 is the first HTML version based on XML instead of SGML. XML is more restrictive in its syntax rules, and these restrictions carry over into XHTML. There are no changes in the elements or attributes from HTML 4.01. Only the syntax has changed, not the language or its semantics.

Lower-case elements and attributes

In XML, both element and attribute names are case-sensitive. The XHTML specification requires that they all be written in lower-case.

For example, this would be valid in HTML 4.01:

<A HREF="New.html">What's new?</A>

In XHTML 1.0, this must be changed to:

<a href="New.html">What's new?</a>

Note that attribute values may still be in upper-case and can be case-sensitive.

Pre-defined attribute values, such as those of the input element's type attribute, must now be specified in lower-case.
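The case-sensitivity of XML names can be seen with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration. It only tests XML well-formedness, so it shows that a parser treats A and a as different names, but it does not enforce the lower-case rule itself, which comes from the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Lower-case start and end tags match, so this parses cleanly.
lower = '<a href="New.html">What\'s new?</a>'

# <A> and </a> are different names to an XML parser: mismatched tags.
mixed = '<A HREF="New.html">What\'s new?</a>'

# All upper-case is still well-formed XML, although it is not valid XHTML,
# because the XHTML DTD only defines the lower-case names.
upper = '<A HREF="New.html">What\'s new?</A>'

print(is_well_formed(lower))
print(is_well_formed(mixed))
print(is_well_formed(upper))
```

Catching the all upper-case form would take a validating parser and the XHTML DTD, which this sketch does not attempt.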

All elements must be closed

In earlier HTML versions, the end tags of some elements, like td and option, could be omitted and the element closed implicitly. In XHTML, every element must be closed explicitly. This is another syntax restriction inherited from XML.

For example, while this was valid HTML,

<p>Grocery list:
<ul><li>milk<li>butter<li>eggs</ul>

in XHTML, it must be changed to

<p>Grocery list:</p>
<ul><li>milk</li><li>butter</li><li>eggs</li></ul>

Empty elements must also be closed. This can be done in two ways. For example, the br element can be written as either

<br></br>

or

<br />

The second form is preferred, since some browsers are confused by an explicit end tag on an empty element.
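As a rough check of the closing rules above, the following Python sketch feeds the grocery list to the standard xml.etree.ElementTree parser. The helper parse_or_none and the sample paragraph are illustrative assumptions. It also shows that the two ways of closing br produce the same parsed tree, so the preference for the short form is about browser behaviour, not about XML.

```python
import xml.etree.ElementTree as ET


def parse_or_none(markup):
    """Return the parsed root element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


implicit = "<ul><li>milk<li>butter<li>eggs</ul>"
explicit = "<ul><li>milk</li><li>butter</li><li>eggs</li></ul>"

# The list with implied end tags is rejected by an XML parser.
rejected = parse_or_none(implicit) is None

# The explicitly closed list parses, and each item can be read back.
items = [li.text for li in parse_or_none(explicit)]

# Both ways of closing an empty element yield the same tree,
# so they serialize identically.
long_form = ET.tostring(ET.fromstring("<p>one<br></br>two</p>"))
short_form = ET.tostring(ET.fromstring("<p>one<br />two</p>"))
same_tree = long_form == short_form

print(rejected, items, same_tree)
```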

Attribute values must be quoted

In earlier versions of HTML, quotes were optional for attribute values in some cases. In XML (and thus XHTML), they must be quoted. So, while this was legal HTML:

<input type=text value="default text">

...in XHTML, it must be written like this:

<input type="text" value="default text" />

Attribute minimization not allowed

There are several boolean attributes in HTML and XHTML that are either on or off. In HTML, the name of the attribute could be given alone to turn it on:

<input type="checkbox" checked>

In XHTML, the name must be repeated as the value:

<input type="checkbox" checked="checked" />
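The attribute rules (lower-case names, quoted values, no minimization) are mechanical enough to apply with a small script. The following is a minimal sketch, not a complete converter, built on Python's standard html.parser module. The class StartTagRewriter, the function rewrite_start_tags and the EMPTY_ELEMENTS set are names invented for this example, and the set lists only some of the HTML 4.01 empty elements. It rewrites start tags only and leaves attribute values as written, so an upper-case pre-defined value would still need lower-casing by hand.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a subset of the HTML 4.01 empty elements, enough for the sketch.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                  "input", "link", "meta", "param"}


class StartTagRewriter(HTMLParser):
    """Collect start tags rewritten with XHTML attribute syntax."""

    def __init__(self):
        super().__init__()
        self.rewritten = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Minimized attribute: repeat the name as the value.
                value = name
            parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        closing = " />" if tag in EMPTY_ELEMENTS else ">"
        self.rewritten.append("<" + " ".join(parts) + closing)


def rewrite_start_tags(html_source):
    """Return the start tags of html_source in XHTML form (hypothetical helper)."""
    parser = StartTagRewriter()
    parser.feed(html_source)
    parser.close()
    return parser.rewritten


print(rewrite_start_tags("<INPUT TYPE=checkbox CHECKED>"))
```

For the input shown, the single rewritten tag matches the XHTML form given above for the checkbox example.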

Script and style content interpreted differently

The characters < and & are now interpreted as if they start markup in embedded script and style content. They must be escaped as the entities &lt; and &amp;.

It is also possible to embed unescaped content in script and style by wrapping it in a CDATA section, like this:

<script type="text/javascript">
<![CDATA[
  document.write("<<<");
]]>
</script>

In this case, your script or style sheet cannot contain ]]>, since that sequence marks the end of the CDATA section. The sequence -- should also be avoided, since it has special meaning in XML comments.

This can be avoided altogether by using external scripts and style sheets.

"Hiding" scripts and style sheets from older browsers with HTML comments, <!-- ... -->, is no longer recommended. Use external scripts and style sheets instead.
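For pages that do keep scripts inline, the two embedding options described in this section can be sketched in Python with the standard library. The function names escaped_script and cdata_script are assumptions made for this example. The checks describe what an XML parser sees; a browser that treats the page as text/html may handle script content differently.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escaped_script(code):
    """Embed code in a script element with < and & written as entities."""
    return '<script type="text/javascript">' + escape(code) + "</script>"


def cdata_script(code):
    """Embed code unescaped inside a CDATA section."""
    if "]]>" in code:
        # The sequence would end the CDATA section early.
        raise ValueError("code must not contain ]]>")
    return ('<script type="text/javascript"><![CDATA['
            + code + "]]></script>")


code = 'document.write("<<<");'

# Either form hands the original script text back to an XML parser.
from_entities = ET.fromstring(escaped_script(code)).text
from_cdata = ET.fromstring(cdata_script(code)).text

print(escaped_script(code))
print(from_entities == code, from_cdata == code)
```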

The id and name attributes

In HTML, both the name and id attributes can be used to identify a fragment. In XHTML, the id attribute must be used. Authors can also keep name alongside it for backwards compatibility.

So, this HTML:

<h2><a name="example">Examples</a></h2>

should become:

<h2><a name="example" id="example">Examples</a></h2>

or simply:

<h2 id="example">Examples</h2>

MIME Type

An XHTML document is typically served with the MIME type application/xhtml+xml. Since many browsers don't recognize this new type, it can also be sent with the text/html type. If this is done, some effort should be made to make sure the code is compatible with older browsers.
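One way a server might choose between the two types is to look at the Accept header the browser sends. The sketch below illustrates that idea under stated assumptions and is not a recommendation from this article: the function name choose_content_type is hypothetical, and the header parsing is deliberately simplified, since it ignores wildcards and relative preferences.

```python
def choose_content_type(accept_header):
    """Pick a MIME type for an XHTML page from an HTTP Accept header value."""
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        # A quality value of 0 means the type is explicitly not acceptable.
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return "application/xhtml+xml"
    # The browser did not list the XHTML type: fall back to text/html.
    return "text/html"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(choose_content_type("text/html, */*"))
```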

Top-level tags required

In HTML, the html, head, and body tags could be omitted and their positions implied. In XHTML, they must all be specified explicitly.
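A minimal document with all three top-level elements written out might look like the string in the sketch below. The DOCTYPE, the xmlns namespace and the title element are taken from the XHTML 1.0 specification, not from this article, and the helper missing_top_level is a made-up name. The check only looks for the three elements; it does not validate the document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

MINIMAL_DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""


def missing_top_level(document):
    """List which of html, head and body are absent (hypothetical helper)."""
    root = ET.fromstring(document)
    # ElementTree reports namespaced names as "{namespace}local-name".
    present = {root.tag} | {child.tag for child in root}
    return [name for name in ("html", "head", "body")
            if f"{{{XHTML_NS}}}{name}" not in present]


print(missing_top_level(MINIMAL_DOCUMENT))
```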