The BNF (Backus Normal Form or Backus-Naur Form) metalanguage has been in widespread use for defining the syntax of programming languages and other computing notations since it was introduced in the late 1950s, during the development of Algol 60. Technically, BNF is a notation for defining context-free grammars, formal sets of rules for defining and structuring the sets of strings and phrases that are associated with a language.
Using BNF, the syntax of a language is defined through a set of production rules (or simply, productions), each of which defines one category of syntactic construct in the language. Categories are identified by symbols called nonterminal symbols, (or simply, nonterminals), consisting of a name for the category enclosed in angle brackets, as in <statement> or <expression>. Each production describes the structure of strings that may produced as valid instances of the category.
Syntactically, a BNF production consists of the nonterminal
for the category being defined, followed by the metasymbol "::=" and
the body of the definition.
The body consists of a
number of alternative
forms for the construct separated by the metasymbol "|".
Each such alternative form is a sequence
of terminal and/or nonterminal
symbols.
A terminal symbol is a character string such as else
or +=
which
stands for itself as a part of the language being defined.
A nonterminal symbol appearing in this context stands for any
(and all) strings that can be generated by the corresponding production.
For example, the following production rule
uses the terminal symbols if
, then
and else
and the
nonterminal symbols <expression> and <statement> to
describe two possible forms for if-statements.
<if-statement> | ::= |
if <expression> then <statement> | |
if <expression> then <statement> else <statement>
|
In the first alternative, an if-statement consists of
the string if
followed by
any string produced by the rule for <expression>, the string
then
and any <statement> string.
The second alternative shows that it is possible to extend the
syntax defined by the first alternative by further adding
the string else
and another <statement> string.
An essential characteristic of BNF is that it allows grammars to be defined recursively. For example, the production above defines <if-statement> in terms of <statement>, which may in turn use <if-statement> in its definition.
<statement> | ::= | <assignment> | <if-statement> | <while-loop> | ... |
In fact, a given production may even refer to itself as in the following rule.
<statement-list> | ::= |
<statement> | <statement> ; <statement-list> |
This specifies that a statement-list is either a single statement
or a statement followed by a ;
followed by a statement-list.
This recursive specification thus describes a statement-list as
arbitrarily many statements separated by semicolons.
This specification is said to be right-recursive because the
recursively invoked nonterminal is the rightmost nonterminal
in an alternative form. A statement-list having the same syntax
could equally well be defined using the following left-recursive
rule.
<statement-list> | ::= |
<statement> | <statement-list> ; <statement> |
In addition to the above facilities, which define pure BNF, two additional metasyntactic extensions are usually used for more concise descriptions, giving the metalanguage EBNF. The first of these is the use of square brackets to denote a phrase which is optional. For example, the rule given previously for if-statements could be recast as follows.
<if-statement> | ::= |
if <expression> then <statement> [ else <statement> ] |
The second device is to use braces to indicate a syntactic phrase that is to be repeated zero or more times. This device can often be used to eliminate simple forms of recursion as in the following definition for statement-lists.
<statement-list> | ::= | <statement> { ; <statement> } |
These extensions and others that are sometimes encountered are really used for convenience and conciseness of notation, only. Their addition does not in fact change the set of grammars describable using BNF.
A final requirement of the BNF syntax is to deal with the case that the programming language being defined actually uses one of the metasybols (e.g., "|") in its syntax. Often, distinguished fonts are used for this purpose, typically with a monospace font chosen for terminal symbols. Alternatively, quotation marks may be used to indicate a character used for its syntactic rather than metasynactic meaning.
There are a variety of notational variations of BNF that often give syntax descriptions a slightly different appearance. Sometimes, for example, the angle brackets around nonterminals are eliminated and the type font is used to distinguish terminal and nonterminal symbols.