The following syntax describes the structure of DCG rules (clauses).
<definite-clause> | ::= | <LHS> -->
<RHS> . |
<LHS> | ::= | <non-terminal> |
<nonterminal> | ::= | <identifier>
( <arg> { , <arg> }
) |
<RHS> | ::= | <RHS-item> { , <RHS-item> } |
<RHS-item> | ::= | <nonterminal> | <terminal-list> | <logic> |
<terminal-list> | ::= |
[ <terminal> { , <terminal> }
] |
<logic> | ::= |
{ <goal>
} |
The LHS of each definite clause is a nonterminal symbol, possibly parameterized. On the RHS, three kinds of item may be found: nonterminal symbols (possibly parameterized), lists of terminal symbols inside square brackets, or arbitrary logic goals within braces.
To understand what parsing predicates are generated for
particular DCG rules, we can describe the process of
translating DCG rules to parsing predicates as follows.
However, note that you do not have to do this translation
at any point; Prolog automatically does it on recognition
of the -->
functor.
For each DCG rule, transformation to an equivalent parsing predicate may be performed as follows.
[and, then]
becomes
[and], [then]
.
L0
and
L
n
to the end of the argument list of the LHS nonterminal of the clause.
Li-1
,
Li
to the nonterminal.
'C'(Li-1, t,
Li)
.
-->
symbol with :-
In such a transformations, L0
represents the input list being passed in for parsing, while
Lj
represents the
input that remains unconsumed after the first j
of the RHS terminals and nonterminals have been successfully
parsed.
For example, consider the following DCG rule.
expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.Using the procedures above, we can see that this translates to the following clause for
expr/3
.
expr(Z, L0, L3) :- num(X, L0, L1), 'C'(L1, +, L2), expr(Y, L2, L3), {Z is X+Y}.
Prolog interprets string constants within double quotes as list of character codes (integers).
"CMPT384"
= [67,77,80,84,51,56,52]
For example, we can rework our numeric expression grammar to
use strings. That is, instead of parsing a
list of symbolic and numeric constants
such as [100, +, 20, +, 3, +, 6]
,
we want to parse the string expression "100+20+3+6"
.
[+]
operator
symbol with "+"
.
"100"
as a num
, rather than relying on Prolog's built-in
numeric constants. We can compute the value of
a string using an accumulating value for the number
determined so far. For example, in parsing the string
"301"
, if 30 is the value accumulated for
the string "30"
, then 30*10+1 is the new value computed
when the next digit ("1"
) is encountered.
expr(Z) --> num(Z). expr(Z) --> num(X), "+", expr(Y), {Z is X+Y}. expr(Z) --> num(X), "-", expr(Y), {Z is X-Y}. num(Z) --> num(Z, 0). num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}. num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2). digit(Z) --> [D], {"0" =< D, D =< "9", Z is D - "0"}. expr_value(L, V) :- expr(V, L, []).
| ?- expr_value("301+20+3+5", S). S = 329 ? yes | ?- expr_value("324-24+1", V). V = 299 ?Prolog parses the strings and computes the values of the expressions as specified. But the last example illustrates once again the unusual right-associative interpretation of substraction that results from our grammar. How can this be changed to give the standard left-associative rule for subtraction, so that
"324-24+1"
evaluates to
301?
Left-associativity of expressions is obtained by grammar rules that are left-recursive.
For example, if we want the interpretation of
"324-24+1"
to be (324-24)+1=301, we might try changing two of
our DCG rules as follows.
expr(Z) --> num(Z). expr(Z) --> expr(X), "+", num(Y), {Z is X+Y}. expr(Z) --> expr(X), "-", num(Y), {Z is X-Y}.
"4-3"
,
we'll never get beyond the second clause (infinite recursion)!
Left-recursive grammars can be transformed to an acceptable DCG form for parsing, as shown by the following example.
expr(Z) --> num(X), etail(X, Z). etail(Accum, Z) --> {Z is Accum}. etail(Accum, Z) --> "+", num(Y), {W is Accum+Y}, etail(W, Z). etail(Accum, Z) --> "-", num(Y), {W is Accum-Y}, etail(W, Z). num(Z) --> num(Z, 0). num(Z, Accum) --> digit(D1), {Z is 10*Accum+D1}. num(Z, Accum) --> digit(D1), {A2 is 10*Accum+D1}, num(Z, A2). digit(Z) --> [D], {"0" =< D, D =< "9", Z is D - "0"}. expr_value(L, V) :- expr(V, L, []).
In general, left recursion is eliminated by identifying
the terminating case of the recursion (num(X)
, in this
case) and "factoring out" this terminating case as illustrated.
The examples above have shown how parsing operations to recognize instances of the numeric expression grammar can be combined with semantic operations to evaluate the numeric expressions.
In fact, it is possible to build many kinds of symbolic computing applications by attaching appropriate semantic actions to DCG rules. In general, these applications are ones that can be characterized using single-pass, top-down processing.
A more general approach, however, is to use the parsing operations to construct symbolic data structures representing the parsed phrase. Often these data objects are called abstract syntax trees (ASTs).
For example, consider the following grammar for numeric expressions.
<expr> | ::= | <constant> | <sum> | <difference> |
<sum> | ::= | +( <expr>
, <expr> ) |
<difference> | ::= | -( <expr>
, <expr> ) |
<constant> | ::= | <numeric-atom> |
Modifying our parser to build numeric expression
objects represented in this way is straightforward.
Note that numeric processing doesn't require any changes
for the num
DCG rules, because those rules already
construct the correct numeric object. We just need to
transform the expr
and etail
rules to build the data structures instead of evaluating the
expression.
expr(Z) --> num(X), etail(X, Z). etail(Accum, Accum) --> {true}. etail(Accum, Z) --> "+", num(Y), etail(plus(Accum, Y), Z). etail(Accum, Z) --> "-", num(Y), etail(minus(Accum, Y), Z). expr_structure(L, V) :- expr(V, L, []). | ?- expr_structure("324-24+1", V). V = plus(minus(324,24),1) ? yes