Definite Clause Grammars (Continued)

CMPT 383 Lecture Notes
Robert D. Cameron
April 2002

General Syntax of DCG Rules

The following syntax describes the structure of DCG rules (clauses).

<definite-clause>::= <LHS> --> <RHS> .
<LHS> ::= <non-terminal>
<nonterminal> ::= <identifier> ( <arg> { , <arg> } )
<RHS> ::= <RHS-item> { , <RHS-item> }
<RHS-item> ::= <nonterminal> | <terminal-list> | <logic>
<terminal-list> ::= [ <terminal> { , <terminal> } ]
<logic> ::= { <goal> }

The LHS of each definite clause is a nonterminal symbol, possibly parameterized. On the RHS, three kinds of item may be found: nonterminal symbols (possibly parameterized), lists of terminal symbols inside square brackets, or arbitrary logic goals within braces.

General Translation of DCG Rules to Parsing Predicates

To understand what parsing predicates are generated for particular DCG rules, we can describe the process of translating DCG rules to parsing predicates as follows. However, note that you do not have to do this translation at any point; Prolog automatically does it on recognition of the --> functor.

For each DCG rule, transformation to an equivalent parsing predicate may be performed as follows.

In such a transformations, L0 represents the input list being passed in for parsing, while Lj represents the input that remains unconsumed after the first j of the RHS terminals and nonterminals have been successfully parsed.

For example, consider the following DCG rule.

expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.
Using the procedures above, we can see that this translates to the following clause for expr/3.
expr(Z, L0, L3) :- 
  num(X, L0, L1), 'C'(L1, +, L2), expr(Y, L2, L3), {Z is X+Y}.

Parsing With Strings

Prolog interprets string constants within double quotes as list of character codes (integers).

For example, we can rework our numeric expression grammar to use strings. That is, instead of parsing a list of symbolic and numeric constants such as [100, +, 20, +, 3, +, 6], we want to parse the string expression "100+20+3+6".

expr(Z) --> num(Z).
expr(Z) --> num(X), "+", expr(Y), {Z is X+Y}.
expr(Z) --> num(X), "-", expr(Y), {Z is X-Y}.
num(Z) --> num(Z, 0).
num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}.
num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2).
digit(Z) --> [D],   {"0" =< D, D =< "9", Z is D - "0"}.

expr_value(L, V) :- expr(V, L, []).
| ?- expr_value("301+20+3+5", S).

S = 329 ? 

| ?- expr_value("324-24+1", V).

V = 299 ? 
Prolog parses the strings and computes the values of the expressions as specified. But the last example illustrates once again the unusual right-associative interpretation of substraction that results from our grammar. How can this be changed to give the standard left-associative rule for subtraction, so that "324-24+1" evaluates to 301?

Left Recursion

Left-associativity of expressions is obtained by grammar rules that are left-recursive.

For example, if we want the interpretation of "324-24+1" to be (324-24)+1=301, we might try changing two of our DCG rules as follows.

expr(Z) --> num(Z).
expr(Z) --> expr(X), "+", num(Y), {Z is X+Y}.
expr(Z) --> expr(X), "-", num(Y), {Z is X-Y}.

Left-recursive grammars can be transformed to an acceptable DCG form for parsing, as shown by the following example.

expr(Z) --> num(X), etail(X, Z).
etail(Accum, Z) --> {Z is Accum}.
etail(Accum, Z) --> "+", num(Y), {W is Accum+Y}, etail(W, Z).
etail(Accum, Z) --> "-", num(Y), {W is Accum-Y}, etail(W, Z).
num(Z) --> num(Z, 0).
num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}.
num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2).
digit(Z) --> [D],   {"0" =< D, D =< "9", Z is D - "0"}.

expr_value(L, V) :- expr(V, L, []).

In general, left recursion is eliminated by identifying the terminating case of the recursion (num(X), in this case) and "factoring out" this terminating case as illustrated.

Semantics: Constructing Symbolic Representations

The examples above have shown how parsing operations to recognize instances of the numeric expression grammar can be combined with semantic operations to evaluate the numeric expressions.

In fact, it is possible to build many kinds of symbolic computing applications by attaching appropriate semantic actions to DCG rules. In general, these applications are ones that can be characterized using single-pass, top-down processing.

A more general approach, however, is to use the parsing operations to construct symbolic data structures representing the parsed phrase. Often these data objects are called abstract syntax trees (ASTs).

For example, consider the following grammar for numeric expressions.

<expr>::= <constant> | <sum> | <difference>
<sum>::=+( <expr> , <expr> )
<difference>::=-( <expr> , <expr> )

Modifying our parser to build numeric expression objects represented in this way is straightforward. Note that numeric processing doesn't require any changes for the num DCG rules, because those rules already construct the correct numeric object. We just need to transform the expr and etail rules to build the data structures instead of evaluating the expression.

expr(Z) --> num(X), etail(X, Z).
etail(Accum, Accum) --> {true}.
etail(Accum, Z) --> "+", num(Y), etail(plus(Accum, Y), Z).
etail(Accum, Z) --> "-", num(Y), etail(minus(Accum, Y), Z).

expr_structure(L, V) :- expr(V, L, []).

| ?- expr_structure("324-24+1", V).

V = plus(minus(324,24),1) ?