BNF grammars are finite descriptions of infinite languages, i.e., languages with infinitely many possible strings. These grammars are often said to be generative grammars because they generate all strings of the language. The way in which a string of the language is generated by the grammar is called its derivation and can be illustrated by its derivation tree (also called parse tree or syntax tree).
A derivation tree is formed in an upside-down fashion, starting with a single nonterminal symbol at the top. One of the alternatives in the production rule for this nonterminal is chosen and placed below the initial nonterminal symbol; branches are then drawn connecting the initial nonterminal symbol to each terminal and nonterminal symbol in the alternative chosen. Each nonterminal symbol in the alternative is then expanded in a similar fashion. The process completes when only terminal symbols remain at the tips of all the branches.
Consider, for example, the statement list S1; S2; S3
.
Using the right-recursive production for statement-lists, the
derivation tree is as follows.
<statement-list> | ::= |
<statement> | <statement> ; <statement-list> |
<statement-list> | |||
/ | | | \ | |
<statement> | ; | <statement-list> | |
| | / | | | \ |
S1 | <statement> | ; | <statement-list> |
| | | | ||
S2 | <statement> | ||
| | |||
S3 |
Here S1
, S2
and S3
stand for statements
which would normally be
further expanded according to the productions describing their structure.
Alternatively, if the left-recursive rule for statement-lists were used, the derivation tree would be as follows.
<statement-list> | ::= |
<statement> | <statement-list> ; <statement> |
<statement-list> | |||
/ | | | \ | |
<statement-list> | ; | <statement> | |
/ | | | \ | | |
<statement-list> | ; | <statement> | S3 |
| | | | ||
<statement> | S2 | ||
| | |||
S1 |
A grammar is said to be ambiguous if there is more than one possible derivation for a string of the language. In itself, grammatical ambiguity may be harmless as the grammar can still be perfectly accurate in describing the language syntax. Because semantics is assigned to language constructs based on their derivation from the grammar, however, grammatical ambiguity may lead to an ambiguity in semantics, i.e., language constructs for which two conflicting meanings are defined. Ambiguity is a serious problem in programming language definition.
Unfortunately, there is no way of detecting ambiguity in general. There are, however, certain well-known types of ambiguity which can be watched for. An important case that often arises in the syntax of expressions is that of productions which are both left- and right-recursive; such productions are always ambiguous. Consider, for example, the following grammar.
<expression> | ::= |
<variable> |
( <expression> ) |
<expression> + <expression> | |
<expression> * <expression> | ||
<variable> | ::= | x | y | z |
x * y + z
, for which the following two derivation trees are possible.
<expression> | |||
/ | | | \ | |
<expression> | * | <expression> | |
| | / | | | \ |
<variable> | <expression> | + | <expression> |
| | | | | | |
x | <variable> | <variable> | |
| | | | ||
y | z |
<expression> | |||
/ | | | \ | |
<expression> | + | <expression> | |
/ | | | \ | | |
<expression> | * | <expression> | <variable> |
| | | | | | |
<variable> | <variable> | z | |
| | | | ||
x | y |
In the first of these derivations, the variables y
and z
are grouped
together in a subtree with the operator +
; the implied semantics is
equivalent to x * (y + z).
The second of these derivations has a different interpretation, however,
with the subtree for x * y
implying the interpretation (x * y) + z.
In general, ambiguities can be eliminated by adding new productions and re-arranging the grammar. The ambiguity, above, for example can be eliminated using the following equivalent grammar.
<expression> | ::= |
<element> |
<expression> + <element> | |
<expression> * <element> | ||
<element> | ::= | <variable> | ( <expression> )
|
<variable> | ::= | x | y | z |
x * y + z
has the following
unique derivation.
<expression> | |||
/ | | | \ | |
<expression> | + | <element> | |
/ | | | \ | | |
<expression> | * | <element> | <variable> |
| | | | | | |
<element> | <variable> | z | |
| | | | ||
<variable> | y | ||
| | |||
x |
Even when ambiguity
is removed from the grammar, it may not provide an appropriate basis for
language semantics.
Consider, for example,
the expression x + y * z
as described according
to our revised grammar.
The derivation for this expression groups x
and y
together for addition
before z
is used for multiplication, as follows.
<expression> | |||
/ | | | \ | |
<expression> | * | <element> | |
/ | | | \ | | |
<expression> | + | <element> | <variable> |
| | | | | | |
<element> | <variable> | z | |
| | | | ||
<variable> | y | ||
| | |||
x |
The implied semantics, then, is equivalent to (x + y) * z when the standard interpretation desired might be x + (y * z).
Re-arrangement of the grammar is again the basis for solving such problems. In this case, the grammar must be arranged to indicated the higher precedence of multiplication over addition. This may be achieved as follows.
<expression> | ::= | <term> | <expression> + <term> |
<term> | ::= | <element> | <term> * <element> |
<element> | ::= | <variable> | (
<expression> ) |
<variable> | ::= | x | y | z |
Using this grammar, the derivation tree for x + y * z
implies
the desired result.
<expression> | |||
/ | | | \ | |
<expression> | + | <term> | |
| | / | | | \ |
<term> | <term> | * | <element> |
| | | | | | |
<element> | <element> | <variable> | |
| | | | | | |
<variable> | <variable> | z | |
| | | | ||
x | y |
In general, then, language grammars must be designed carefully not only to describe the syntax of the language, but also to ensure that the structure of derivation trees allows proper semantic interpretations to be defined. Reading a grammar also requires care in order to understand its semantic implications.