XHTML 1.0 is the first version of HTML based on XML instead of SGML. XML has stricter syntax rules, and XHTML inherits them. The tags and attributes are unchanged from HTML 4.01. Only the syntax has changed, not the language or its semantics.
In XML, both element and attribute names are case-sensitive. The XHTML specification requires all of them to be written in lower-case.
For example, this would be valid in HTML 4.01:
<A HREF="New.html">What's new?</A>
In XHTML 1.0, this must be changed to:
<a href="New.html">What's new?</a>
Note that attribute values may still be in upper-case and can be case-sensitive.
Pre-defined attribute values, such as those of the input element's type attribute, must now be specified in lower-case.
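The lower-case rule for names can be checked mechanically. The following is a minimal sketch using Python's standard XML parser; the helper name non_lowercase_names is made up for this illustration and is not part of any XHTML tool.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(markup):
    """Return element and attribute names that are not entirely lower-case.

    Hypothetical helper for illustration; it assumes the markup is already
    well-formed XML with a single root element.
    """
    offenders = []
    for element in ET.fromstring(markup).iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


print(non_lowercase_names('<A HREF="New.html">What\'s new?</A>'))  # ['A', 'HREF']
print(non_lowercase_names('<a href="New.html">What\'s new?</a>'))  # []
```

The attribute value New.html is not reported, since only names are checked, not values.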
In earlier HTML versions, some elements like td and option could be closed implicitly. In XHTML, all tags must be closed. This is another syntax restriction inherited from XML.
For example, while this was valid HTML,
<p>Grocery list:
<ul><li>milk<li>butter<li>eggs</ul>
in XHTML, it must be changed to
<p>Grocery list:</p>
<ul><li>milk</li><li>butter</li><li>eggs</li></ul>
Empty elements must also be closed. This can be done in two ways. For example, the br tag can appear as either
<br></br>
or
<br />
The second version is preferred, since some browsers are confused by an explicit closing tag on an empty element.
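Because the closing rule comes from XML, any XML parser can be used to see it in action. The sketch below uses Python's xml.etree.ElementTree. The is_well_formed helper and the div wrapper around each fragment are assumptions made for the example: the wrapper only gives the parser a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The <div> wrapper is only there to give each fragment a single root element.
html_style = "<div><p>Grocery list:<ul><li>milk<li>butter<li>eggs</ul></div>"
xhtml_style = (
    "<div><p>Grocery list:</p>"
    "<ul><li>milk</li><li>butter</li><li>eggs</li></ul></div>"
)

print(is_well_formed(html_style))               # False: p and li are never closed
print(is_well_formed(xhtml_style))              # True
print(is_well_formed("<div><br></div>"))        # False: br is never closed
print(is_well_formed("<div><br /></div>"))      # True
print(is_well_formed("<div><br></br></div>"))   # True
```

This checks well-formedness only. It does not validate the markup against the XHTML DTD.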
In earlier versions of HTML, quotes were optional for attribute values in some cases. In XML (and thus XHTML), they must be quoted. So, while this was legal HTML:
<input type=text value="default text">
...in XHTML, it must be written like this:
<input type="text" value="default text" />
There are several boolean attributes in HTML and XHTML that are either on or off. In HTML, the name of the attribute could be given alone to turn it on:
<input type="checkbox" checked>
In XHTML, the name must be repeated as the value:
<input type="checkbox" checked="checked" />
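Both the quoting rule and the rule against minimized attributes are enforced by XML parsers. As a rough illustration, the sketch below feeds the three input examples to Python's standard parser. The parse_attributes helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET


def parse_attributes(markup):
    """Return the element's attributes as a dict, or None if not well-formed."""
    try:
        return dict(ET.fromstring(markup).attrib)
    except ET.ParseError:
        return None


# Unquoted attribute value: rejected by the XML parser.
print(parse_attributes('<input type=text value="default text" />'))  # None

# Minimized boolean attribute: rejected as well.
print(parse_attributes('<input type="checkbox" checked />'))  # None

# XHTML form: every attribute has a quoted value.
print(parse_attributes('<input type="checkbox" checked="checked" />'))
# {'type': 'checkbox', 'checked': 'checked'}
```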
Within embedded script and style content, the characters < and & are now interpreted as the start of markup. They must be escaped as entities: &lt; and &amp;
It is also possible to embed unescaped content in script and style like this:
<script type="text/javascript"> <![CDATA[ document.write("<<<"); ]]> </script>
You cannot use ]]> in your script or style sheet in this case, since it marks the end of the CDATA section. The sequence -- should also be avoided, since it is used to delimit XML comments.
This can be avoided altogether by using external scripts and style sheets.
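To see that the two embedding approaches are equivalent from an XML parser's point of view, the following sketch escapes a small script with xml.sax.saxutils.escape and, separately, wraps it in a CDATA section. The script text is a made-up sample. This only shows what an XML parser sees and says nothing about how browsers that treat the page as text/html will read it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up sample script containing both < and &.
script_source = 'if (a < b && c) { document.write("<<<"); }'

# Option 1: escape < and & as entities.
escaped = escape(script_source)
print(escaped)
# if (a &lt; b &amp;&amp; c) { document.write("&lt;&lt;&lt;"); }
entity_form = '<script type="text/javascript">' + escaped + '</script>'

# Option 2: leave the script untouched inside a CDATA section.
cdata_form = (
    '<script type="text/javascript"><![CDATA['
    + script_source
    + ']]></script>'
)

# An XML parser recovers the same script text from both forms.
print(ET.fromstring(entity_form).text == script_source)  # True
print(ET.fromstring(cdata_form).text == script_source)   # True
```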
"Hiding" scripts and style sheets from older browsers with HTML comments, <!-- ... -->, is no longer recommended. Use external scripts and style sheets instead.
In HTML, both the name and id attributes can be used to identify a fragment. In XHTML, the id attribute must be used. Authors can also keep name alongside it for backwards compatibility.
So, this HTML:
<h2><a name="example">Examples</a></h2>
should become:
<h2><a name="example" id="example">Examples</a></h2>
or simply:
<h2 id="example">Examples</h2>
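When converting existing pages, it can help to list the anchors that still rely on name alone. The sketch below is one possible way to do that with Python's standard XML parser. The function name anchors_missing_id is hypothetical, and the markup is assumed to be well-formed and free of XML namespaces.

```python
import xml.etree.ElementTree as ET


def anchors_missing_id(markup):
    """Return the name values of <a> elements that have a name but no id."""
    root = ET.fromstring(markup)
    return [
        anchor.get("name")
        for anchor in root.iter("a")
        if anchor.get("name") is not None and anchor.get("id") is None
    ]


print(anchors_missing_id('<h2><a name="example">Examples</a></h2>'))
# ['example']
print(anchors_missing_id('<h2><a name="example" id="example">Examples</a></h2>'))
# []
```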
An XHTML document is typically served with the MIME type application/xhtml+xml. Since many browsers don't recognize this new type, it can also be sent with the text/html type. If this is done, some effort should be made to make sure the code is compatible with older browsers.
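One common way to handle both kinds of browsers is to choose the type per request from the Accept header. The following is a simplified sketch of that idea, not a complete content negotiation implementation: the function name choose_content_type is invented here, and q-values in the header are deliberately ignored.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_content_type(accept_header):
    """Pick application/xhtml+xml only if the client lists it explicitly.

    Simplification (an assumption of this sketch): parameters such as
    q-values are stripped and not taken into account.
    """
    accepted = [
        item.split(";")[0].strip().lower()
        for item in accept_header.split(",")
    ]
    if XHTML_TYPE in accepted:
        return XHTML_TYPE
    return HTML_TYPE


print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# application/xhtml+xml
print(choose_content_type("text/html, */*"))
# text/html
```

Falling back to text/html whenever the client does not name the XHTML type keeps older browsers working, at the cost of losing strict XML parsing for them.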
In HTML, the html, head, and body tags could be omitted and their positions implied. In XHTML, they must all be specified explicitly.
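As an illustration of the explicit structure, the sketch below parses a small document and checks that html, head, and body are all written out. Two things here are not taken from the text above: the xmlns value is the standard XHTML namespace, and the sample document omits the DOCTYPE declaration for brevity. The helper name has_explicit_structure is made up.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace (not mentioned in the surrounding text).
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample document; the DOCTYPE declaration is omitted for brevity.
document = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>"""


def has_explicit_structure(markup):
    """Return True if the root is html and its children are head, then body."""
    root = ET.fromstring(markup)
    if root.tag != f"{{{XHTML_NS}}}html":
        return False
    child_tags = [child.tag for child in root]
    return child_tags == [f"{{{XHTML_NS}}}head", f"{{{XHTML_NS}}}body"]


print(has_explicit_structure(document))  # True

# A document that leaves out head and body, as HTML allowed.
implied = '<html xmlns="http://www.w3.org/1999/xhtml"><p>Hello</p></html>'
print(has_explicit_structure(implied))   # False
```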