Definite Clause Grammars (Continued)

CMPT 383 Lecture Notes
Robert D. Cameron
April 2002

General Syntax of DCG Rules

The following syntax describes the structure of DCG rules (clauses).

<definite-clause>	::=	<LHS> `-->` <RHS> `.`
<LHS>	::=	<non-terminal>
<nonterminal>	::=	<identifier> `(` <arg> { `,` <arg> } `)`
<RHS>	::=	<RHS-item> { `,` <RHS-item> }
<RHS-item>	::=	<nonterminal> \| <terminal-list> \| <logic>
<terminal-list>	::=	`[` <terminal> { `,` <terminal> } `]`
<logic>	::=	`{` <goal> `}`

The LHS of each definite clause is a nonterminal symbol, possibly parameterized. On the RHS, three kinds of item may be found: nonterminal symbols (possibly parameterized), lists of terminal symbols inside square brackets, or arbitrary logic goals within braces.

General Translation of DCG Rules to Parsing Predicates

To understand what parsing predicates are generated for particular DCG rules, we can describe the process of translating DCG rules to parsing predicates as follows. However, note that you do not have to do this translation at any point; Prolog automatically does it on recognition of the --> functor.

For each DCG rule, transformation to an equivalent parsing predicate may be performed as follows.

Replace all lists of terminal symbols within a single set of square brackets with lists of individually bracketted symbols. For example, [and, then] becomes [and], [then].
Working with the result of this last step, number all the RHS items in sequence, skipping the logic clauses in braces.
Let n be the number of RHS items numbered in the last step.
Add the parameters L₀ and L_n to the end of the argument list of the LHS nonterminal of the clause.
For each RHS nonterminal, let i be the number assigned to it in step 2. Add the parameters L_i-1, L_i to the nonterminal.
For each RHS terminal, let i be the number assigned to it in step 2 and let t be the terminal itself. Replace the terminal with the goal 'C'(L_i-1, t, L_i).
Make no changes to those RHS items enclosed in braces (goals).
Replace the --> symbol with :-

In such a transformations, L₀ represents the input list being passed in for parsing, while L_j represents the input that remains unconsumed after the first j of the RHS terminals and nonterminals have been successfully parsed.

For example, consider the following DCG rule.

expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.

Using the procedures above, we can see that this translates to the following clause for expr/3.

expr(Z, L0, L3) :- 
  num(X, L0, L1), 'C'(L1, +, L2), expr(Y, L2, L3), {Z is X+Y}.

Parsing With Strings

Prolog interprets string constants within double quotes as list of character codes (integers).

"CMPT384" = [67,77,80,84,51,56,52]

For example, we can rework our numeric expression grammar to use strings. That is, instead of parsing a list of symbolic and numeric constants such as [100, +, 20, +, 3, +, 6], we want to parse the string expression "100+20+3+6".

The easy part is to replace the [+] operator symbol with "+".
The trickier part is to parse a string of digits (e.g., "100" as a num, rather than relying on Prolog's built-in numeric constants. We can compute the value of a string using an accumulating value for the number determined so far. For example, in parsing the string "301", if 30 is the value accumulated for the string "30", then 30*10+1 is the new value computed when the next digit ("1") is encountered.

expr(Z) --> num(Z).
expr(Z) --> num(X), "+", expr(Y), {Z is X+Y}.
expr(Z) --> num(X), "-", expr(Y), {Z is X-Y}.
num(Z) --> num(Z, 0).
num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}.
num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2).
digit(Z) --> [D],   {"0" =< D, D =< "9", Z is D - "0"}.

expr_value(L, V) :- expr(V, L, []).

| ?- expr_value("301+20+3+5", S).

S = 329 ? 

yes
| ?- expr_value("324-24+1", V).

V = 299 ?

Prolog parses the strings and computes the values of the expressions as specified. But the last example illustrates once again the unusual right-associative interpretation of substraction that results from our grammar. How can this be changed to give the standard left-associative rule for subtraction, so that "324-24+1" evaluates to 301?

Left Recursion

Left-associativity of expressions is obtained by grammar rules that are left-recursive.

For example, if we want the interpretation of "324-24+1" to be (324-24)+1=301, we might try changing two of our DCG rules as follows.

expr(Z) --> num(Z).
expr(Z) --> expr(X), "+", num(Y), {Z is X+Y}.
expr(Z) --> expr(X), "-", num(Y), {Z is X-Y}.

This expresses the correct grammatical structure.
But if we try to parse a subtraction expression such as "4-3", we'll never get beyond the second clause (infinite recursion)!

Left-recursive grammars can be transformed to an acceptable DCG form for parsing, as shown by the following example.

expr(Z) --> num(X), etail(X, Z).
etail(Accum, Z) --> {Z is Accum}.
etail(Accum, Z) --> "+", num(Y), {W is Accum+Y}, etail(W, Z).
etail(Accum, Z) --> "-", num(Y), {W is Accum-Y}, etail(W, Z).
num(Z) --> num(Z, 0).
num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}.
num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2).
digit(Z) --> [D],   {"0" =< D, D =< "9", Z is D - "0"}.

expr_value(L, V) :- expr(V, L, []).

In general, left recursion is eliminated by identifying the terminating case of the recursion (num(X), in this case) and "factoring out" this terminating case as illustrated.

Semantics: Constructing Symbolic Representations

The examples above have shown how parsing operations to recognize instances of the numeric expression grammar can be combined with semantic operations to evaluate the numeric expressions.

In fact, it is possible to build many kinds of symbolic computing applications by attaching appropriate semantic actions to DCG rules. In general, these applications are ones that can be characterized using single-pass, top-down processing.

A more general approach, however, is to use the parsing operations to construct symbolic data structures representing the parsed phrase. Often these data objects are called abstract syntax trees (ASTs).

For example, consider the following grammar for numeric expressions.

<expr>	::=	<constant> \| <sum> \| <difference>
<sum>	::=	`+(` <expr> `,` <expr> `)`
<difference>	::=	`-(` <expr> `,` <expr> `)`
<constant>	::=	<numeric-atom>

Modifying our parser to build numeric expression objects represented in this way is straightforward. Note that numeric processing doesn't require any changes for the num DCG rules, because those rules already construct the correct numeric object. We just need to transform the expr and etail rules to build the data structures instead of evaluating the expression.

expr(Z) --> num(X), etail(X, Z).
etail(Accum, Accum) --> {true}.
etail(Accum, Z) --> "+", num(Y), etail(plus(Accum, Y), Z).
etail(Accum, Z) --> "-", num(Y), etail(minus(Accum, Y), Z).

expr_structure(L, V) :- expr(V, L, []).

| ?- expr_structure("324-24+1", V).

V = plus(minus(324,24),1) ? 

yes