CMPT 383

Summer 2009

Programming Assignment 3

Due: Monday, July 6, at 5:00 PM. Submit your solution using the CSIL Submit Server .

The syntax for regular expressions is given on page 10 of the course lecture notes on Syntax. Devise a way to represent regular expresions for defining languages over the ASCII character set, using the ASCII chatacter set.

That is, you must use the ASCII character set to represent regular expressions. The regular expressions, in turn, will define languages over the ASCII character set. This means that you must decide on a suitable way to represent the empty language and the empty string. In addition, you must decide on a means to distinguish metacharacters, such as parentheses, from corresponding characters in a language being described. Consider any problems that may arise because of white space, line endings, etc.

Give a clear, concise and complete description of the notation you use for defining regular expression. You should use a context-free grammar to describe the syntax of regular expressions. Your grammar should clearly indicate the relative binding power of the various operators for combining regular langauges and allow parentheses to be omitted when possible. For example, ab|cd* should be interpreted as equivalent to ((ab)|(c(d*))).

Do not extend the regular expression syntax, by adding operators or predefined character sets, beyond that given on page 10 of the lecture notes.

In addition to giving the context-free grammar for regular expressions, you will need to clearly explain your notation. You may want to look at the formal description of the syntax for some existing programming languages to see how this may be done. You may also consider notation for so called "regular expressions" in various programming contexts. Be sure to include a complete reference for any books or reports that you use.

Be careful. You are working with three different levels of notation.

At the top level, you have the notation for describing the language of regular expressions. For example, you may use a vertical bar, "|", in your context-free grammar.
At the next level you have the notation of regular expressions, for describing regular langauges. For example, you may allow a vertical bar, "|", to be used in a regular expression.
At the bottom level you have the regular languages themselves. For example, a language defined by a regular expression may contain some ASCII character strings that contain the vertical bar, "|", as a character in the string.

Your solution to this assignment will form the basis for one or more programming assignments later in the course.