Markup and Style: History and Philosophy

Robert D. Cameron
January 13, 2003

Markup Languages

A markup language is a notation system for describing documents in which the text of documents is intermixed with markup: annotations that describe aspects of the structure or format of a document. Markup languages are typically used to define the source documents for document processors; the document processor transforms the markup into a suitable output representation for printing or display.
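To make the definition concrete, here is a minimal sketch of a document processor in Python. The `@tag{text}` notation and the two annotation names are hypothetical, invented for this illustration; they are not the syntax of any of the languages discussed below.

```python
import re

# Hypothetical inline markup of the form @tag{text} (an assumption for this sketch).
MARKUP = re.compile(r"@(\w+)\{([^{}]*)\}")

# Assumed mapping from annotation name to an output transformation.
RENDERERS = {
    "emph": lambda text: text.upper(),
    "title": lambda text: text + "\n" + "=" * len(text),
}

def process(source):
    """Replace each annotation with its rendered form; plain text passes through."""
    def render(match):
        tag, text = match.group(1), match.group(2)
        # Unknown annotations fall back to their bare text.
        return RENDERERS.get(tag, lambda t: t)(text)
    return MARKUP.sub(render, source)

print(process("Markup is @emph{not} the text itself."))
```

The source string mixes ordinary text with one annotation; the processor leaves the text alone and transforms only the annotated span into its output representation.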

Although there have been other markup languages in use over the years, the following five language families are of particular importance.

1. Runoff, nroff, troff

This family began with runoff, a simple system to "run off" a document on computing systems of the 1960s. Its culmination is troff (pronounced tee-roff, for typesetter roff), and the family as a whole has been widely used with Unix.

2. Scribe

The first widely used markup system based on the logical document model: markup should describe the logical structure of a document independent of its physical representation (a.k.a. declarative markup).

B. Reid, "Scribe: A Document Specification Language and its Compiler," Ph.D. Dissertation, Carnegie Mellon University, Pittsburgh, PA (October, 1980).

3. TeX/LaTeX

TeX is the pioneering effort of Donald E. Knuth in considering the application of mathematics to typography and typography to mathematics.

4. GML/SGML

Drawing on ideas from Scribe, Charles Goldfarb developed Generalized Markup Language (GML) as a declarative markup notation and subsequently led the effort to develop SGML as a standard widely used in the publishing industry.

5. HTML/XML

HTML, the HyperText Markup Language, is based on the core syntax of SGML but uses a fixed set of markup elements oriented toward hypertext presentation.

Early versions of HTML mixed logical and physical markup together.

HTML has evolved to emphasize declarative markup.

XML has evolved as an extensible markup language that may replace HTML.

The Logic of Declarative Markup

Declarative markup separates the physical formatting of a document from its logical structure. Why?
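One practical consequence can be sketched in code. In the minimal Python illustration below, the document records only logical roles, and two interchangeable style tables decide the physical appearance. The role names and both style tables are hypothetical, invented for this sketch.

```python
# A document described only by logical roles (hypothetical names), with no formatting.
DOCUMENT = [
    ("heading", "Markup Languages"),
    ("paragraph", "Text is intermixed with markup."),
]

# Two assumed physical styles for the same logical structure.
PLAIN_STYLE = {
    "heading": lambda t: t.upper(),
    "paragraph": lambda t: "  " + t,
}
HTML_STYLE = {
    "heading": lambda t: f"<h1>{t}</h1>",
    "paragraph": lambda t: f"<p>{t}</p>",
}

def render(document, style):
    """Apply a style table to each (role, text) pair, one output line per element."""
    return "\n".join(style[role](text) for role, text in document)

print(render(DOCUMENT, PLAIN_STYLE))
print(render(DOCUMENT, HTML_STYLE))
```

Changing the appearance of every heading means editing one entry in a style table; the document itself is untouched.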