More About EBNFΒΆ
Lets use Go-style EBNF to formally specify a few small examples. For instance, here is a definition of a bit string:
bit = "0" | "1" // a bit is a 0 or a 1 bit-string = bit { bit } // a bit string is a bit followed by 0 or more bits
SFU uses a code system for semesters that, while clever, is not is not very user friendly. For example, the code 1047 means the Fall 2004 semester.
Here is an EBNF definition of the SFU semester codes:
digit = "0" | ... | "9" century = digit // 0=1900, 1=2000, 2=2100, ... decade = digit // 0=00, 1=10, 2=20, ... year = digit semester = "1" | "4" | "7" semester-code = century decade year semester
From this it is not hard to write corresponding recognizer functions:
package main import "fmt" func is_digit(s string) bool { return s == "0" || s == "1" || s == "2" || s == "3" || s == "4" || s == "5" || s == "6" || s == "7" || s == "8" || s == "9" } func is_century(s string) bool { return is_digit(s) } func is_decade(s string) bool { return is_digit(s) } func is_year(s string) bool { return is_digit(s) } func is_semester(s string) bool { return s == "1" || s == "4" || s == "7" } func is_semester_code(s string) bool { return len(s) == 4 && is_century(s[0:1]) && // s[a:a+1] is slice notation that is_decade(s[1:2]) && // returns character s[a] as a string is_year(s[2:3]) && is_semester(s[3:4]) } func main() { fmt.Println(is_semester_code("1047")) fmt.Println(is_semester_code("1141")) fmt.Println(is_semester_code("1142")) }
In practice, string matching of this sort is often done using regular expressions. For example, the (Python) regular expression
r"\d\d\d[147]"
matches just valid semester codes.Here is the simple grammar of example 3.2 (p. 121) written in our EBNF notation:
assign = id "=" expr id = "A" | "B" | "C" expr = id "+" expr | id "*" expr | "(" expr ")" | id
For instance, consider this expression:
A = B * (A + C)
We can show how it is derived from the grammar step by step:
assign = id "=" expr = "A" "=" expr = "A" "=" id "*" expr = "A" "=" "B" "*" expr = "A" "=" "B" "*" "(" expr ")" = "A" "=" "B" "*" "(" id "+" expr ")" = "A" "=" "B" "*" "(" "A" "+" expr ")" = "A" "=" "B" "*" "(" "A" "+" id ")" = "A" "=" "B" "*" "(" "A" "+" "C" ")"
Grammars naturally describe parse trees. The rules of a grammar form a hierarchy that we can view as a tree. Parse trees are typically generated by compilers when the evaluate a program, and they contain a lot of useful information.
See figure 3.1 (p. 122) of Sebesta for an example parse tree of
A = B * (A + C)
.Some grammars are ambiguous, i.e. they generate more than one parse tree for a given expression. Here’s the ambiguous grammar of example 3.3 (p. 122) in EBNF:
assign = id "=" expr id = "A" | "B" | "C" expr = expr "+" expr // the above grammar rule was id "+" expr | expr "*" expr // the above grammar rule was id "*" expr | "(" expr ")" | id
Figure 3.2 (p. 123) shows two different parse trees for the expression
A = B + C * A
corresponding toA = B + (C * A)
andA = (B + C) * A
. Clearly, this bad: the first parse tree is the one we want because the rules of arithmetic say that multiplication is done before addition.Unambiguous grammars can be written for arithmetic expressions always apply operators in the correct order. It requires a bit more work. Here is example 3.4 (p. 125) in our EBNF notation:
assign = id "=" expr id = "A" | "B" | "C" expr = expr "+" term | term term = term "*" factor | factor factor = "(" expr ")" | id
This is clearly not as simple as the previous grammars, but it is unambiguous and evaluates operators in the correct order.
Another problem that grammars can suffer from is left recursion. This occurs in a grammar rule when the first thing on the right side of the rule is the same as the name of the rule, e.g.:
expr = expr "+" term // left recursive
Some grammar processing algorithms don’t work with left-recursive rules, and so they would need to be re-written in a way the avoids left recursion.
Our examples of parse trees use arithmetic expressions for simplicity. But keep in mind that an entire program can be thought of as one large parse tree.