Grammatical Description of Syntax: The BNF Metalanguage

Robert D. Cameron
January 7, 2002

The BNF (Backus Normal Form or Backus-Naur Form) metalanguage has been in widespread use for defining the syntax of programming languages and other computing notations since it was introduced in the late 1950s, during the development of Algol 60. Technically, BNF is a notation for defining context-free grammars, formal sets of rules for defining and structuring the sets of strings and phrases that are associated with a language.

Using BNF, the syntax of a language is defined through a set of production rules (or simply, productions), each of which defines one category of syntactic construct in the language. Categories are identified by symbols called nonterminal symbols, (or simply, nonterminals), consisting of a name for the category enclosed in angle brackets, as in <statement> or <expression>. Each production describes the structure of strings that may produced as valid instances of the category.

Syntactically, a BNF production consists of the nonterminal for the category being defined, followed by the metasymbol "::=" and the body of the definition. The body consists of a number of alternative forms for the construct separated by the metasymbol "|". Each such alternative form is a sequence of terminal and/or nonterminal symbols. A terminal symbol is a character string such as else or += which stands for itself as a part of the language being defined. A nonterminal symbol appearing in this context stands for any (and all) strings that can be generated by the corresponding production.

For example, the following production rule uses the terminal symbols if, then and else and the nonterminal symbols <expression> and <statement> to describe two possible forms for if-statements.

<if-statement> ::= if <expression> then <statement> |
  if <expression> then <statement> else <statement>

In the first alternative, an if-statement consists of the string if followed by any string produced by the rule for <expression>, the string then and any <statement> string. The second alternative shows that it is possible to extend the syntax defined by the first alternative by further adding the string else and another <statement> string.

An essential characteristic of BNF is that it allows grammars to be defined recursively. For example, the production above defines <if-statement> in terms of <statement>, which may in turn use <if-statement> in its definition.

<statement> ::= <assignment> | <if-statement> | <while-loop> | ...
This describes the notion that one if-statement can contain other if-statements nested arbitrarily deeply.

In fact, a given production may even refer to itself as in the following rule.

<statement-list> ::= <statement> | <statement> ; <statement-list>

This specifies that a statement-list is either a single statement or a statement followed by a ; followed by a statement-list. This recursive specification thus describes a statement-list as arbitrarily many statements separated by semicolons. This specification is said to be right-recursive because the recursively invoked nonterminal is the rightmost nonterminal in an alternative form. A statement-list having the same syntax could equally well be defined using the following left-recursive rule.

<statement-list> ::= <statement> | <statement-list> ; <statement>

EBNF: Extended BNF

In addition to the above facilities, which define pure BNF, two additional metasyntactic extensions are usually used for more concise descriptions, giving the metalanguage EBNF. The first of these is the use of square brackets to denote a phrase which is optional. For example, the rule given previously for if-statements could be recast as follows.

<if-statement> ::= if <expression> then <statement> [ else <statement> ]

The second device is to use braces to indicate a syntactic phrase that is to be repeated zero or more times. This device can often be used to eliminate simple forms of recursion as in the following definition for statement-lists.

<statement-list> ::= <statement> { ; <statement> }

These extensions and others that are sometimes encountered are really used for convenience and conciseness of notation, only. Their addition does not in fact change the set of grammars describable using BNF.

A final requirement of the BNF syntax is to deal with the case that the programming language being defined actually uses one of the metasybols (e.g., "|") in its syntax. Often, distinguished fonts are used for this purpose, typically with a monospace font chosen for terminal symbols. Alternatively, quotation marks may be used to indicate a character used for its syntactic rather than metasynactic meaning.

There are a variety of notational variations of BNF that often give syntax descriptions a slightly different appearance. Sometimes, for example, the angle brackets around nonterminals are eliminated and the type font is used to distinguish terminal and nonterminal symbols.