More About EBNF =============== - Lets use Go-style EBNF to formally specify a few small examples. For instance, here is a definition of a bit string:: bit = "0" | "1" // a bit is a 0 or a 1 bit-string = bit { bit } // a bit string is a bit followed by 0 or more bits - SFU uses a `code system for semesters `_ that, while clever, is not is not very user friendly. For example, the code 1047 means the Fall 2004 semester. Here is an EBNF definition of the SFU semester codes:: digit = "0" | ... | "9" century = digit // 0=1900, 1=2000, 2=2100, ... decade = digit // 0=00, 1=10, 2=20, ... year = digit semester = "1" | "4" | "7" semester-code = century decade year semester From this it is not hard to write corresponding recognizer functions:: package main import "fmt" func is_digit(s string) bool { return s == "0" || s == "1" || s == "2" || s == "3" || s == "4" || s == "5" || s == "6" || s == "7" || s == "8" || s == "9" } func is_century(s string) bool { return is_digit(s) } func is_decade(s string) bool { return is_digit(s) } func is_year(s string) bool { return is_digit(s) } func is_semester(s string) bool { return s == "1" || s == "4" || s == "7" } func is_semester_code(s string) bool { return len(s) == 4 && is_century(s[0:1]) && // s[a:a+1] is slice notation that is_decade(s[1:2]) && // returns character s[a] as a string is_year(s[2:3]) && is_semester(s[3:4]) } func main() { fmt.Println(is_semester_code("1047")) fmt.Println(is_semester_code("1141")) fmt.Println(is_semester_code("1142")) } In practice, string matching of this sort is often done using **regular expressions**. For example, the (Python) regular expression ``r"\d\d\d[147]"`` matches just valid semester codes. - Here is the simple grammar of example 3.2 (p. 121) written in our EBNF notation:: assign = id "=" expr id = "A" | "B" | "C" expr = id "+" expr | id "*" expr | "(" expr ")" | id For instance, consider this expression:: A = B * (A + C) We can show how it is derived from the grammar step by step:: assign = id "=" expr = "A" "=" expr = "A" "=" id "*" expr = "A" "=" "B" "*" expr = "A" "=" "B" "*" "(" expr ")" = "A" "=" "B" "*" "(" id "+" expr ")" = "A" "=" "B" "*" "(" "A" "+" expr ")" = "A" "=" "B" "*" "(" "A" "+" id ")" = "A" "=" "B" "*" "(" "A" "+" "C" ")" - Grammars naturally describe parse trees. The rules of a grammar form a hierarchy that we can view as a tree. Parse trees are typically generated by compilers when the evaluate a program, and they contain a lot of useful information. See figure 3.1 (p. 122) of Sebesta for an example parse tree of ``A = B * (A + C)``. - Some grammars are *ambiguous*, i.e. they generate more than one parse tree for a given expression. Here's the ambiguous grammar of example 3.3 (p. 122) in EBNF:: assign = id "=" expr id = "A" | "B" | "C" expr = expr "+" expr // the above grammar rule was id "+" expr | expr "*" expr // the above grammar rule was id "*" expr | "(" expr ")" | id Figure 3.2 (p. 123) shows two different parse trees for the expression ``A = B + C * A`` corresponding to ``A = B + (C * A)`` and ``A = (B + C) * A``. Clearly, this bad: the first parse tree is the one we want because the rules of arithmetic say that multiplication is done before addition. - Unambiguous grammars can be written for arithmetic expressions always apply operators in the correct order. It requires a bit more work. Here is example 3.4 (p. 125) in our EBNF notation:: assign = id "=" expr id = "A" | "B" | "C" expr = expr "+" term | term term = term "*" factor | factor factor = "(" expr ")" | id This is clearly not as simple as the previous grammars, but it is unambiguous and evaluates operators in the correct order. - Another problem that grammars can suffer from is *left recursion*. This occurs in a grammar rule when the first thing on the right side of the rule is the same as the name of the rule, e.g.:: expr = expr "+" term // left recursive Some grammar processing algorithms don't work with left-recursive rules, and so they would need to be re-written in a way the avoids left recursion. - Our examples of parse trees use arithmetic expressions for simplicity. But keep in mind that an entire program can be thought of as one large parse tree.