CMPT 383

Summer 2009

Programming Assignment 4

Due: Friday, July 31, at 5:00 PM. Submit your solution using the CSIL Submit Server .

In the previous assignment, you wrote a specification for the ASCII representation of regular expressions defining regular languages over the ASCII character set. In Haskell, a regular expression may be represented abstractly by a value of type RE, where RE is defined as follows.

data RE = EmptySet | EmptyString | Char Char | Concat RE RE |
             Choice RE RE | Star RE
             deriving (Show)

In this assignment you will use your solution to Programming Assignment 3 (which wasn't a programming assignment). You should include your solution to this previous assignment with the documentation for this assignment. You may make correction or other changes to your solution to Programming Assignment 3, for example if, while doing this assignment, you find a better solution to the last assignment. In any case, the version of Programming Assignment 3 that you submit with this assignment should be in agreement with the behaviour of your implementation of the readRegularExpression function.

You will also use a slightly modified version of the library of parsing primitives described in Chapter 8 of Hutton. The modifications include changes by Hutton to support monads. In addition, I have made changes to allow backtracking in the parser, which you will need for your recognize function to work correctly. I have also removed all mention of MonadPlus.

For this assignment you should implement three functions.

readRegularExpression :: String -> Maybe RE
generateParser :: RE -> Parser ()
recognize :: RE -> String -> Bool

The function readRegularExpression should take a regular expression expressed as a string as specified in your solution to Programming Assignment 3 and return Just re where re is the RE corresponding to the input string. If the input string is not a valid regular expression, readRegularExpression should return the value Nothing.

The function generateParser should produce a parser for an RE. The parser will be of type Parser (), so no information about the structure of the parsed input is returned. However, we can tell from the output of this parser whether the string is in the regular language, among other things. See the end of my example test results for examples of the result of parsing input that can be generated by the regular expression in more than one way, and the result of parsing input that is not in the regular language. The parser should be generated from the RE in as straightforward a manner as possible.

The function recognize should be a short function that calls generateParser. The value of recognize re string should be a simple Bool indicating whether string is in the regular language represented by re.

An example of some test results may help clarify the assignment specification.

Your code should be as clear and straightforward as possible and should reflect a functional style of programming. If the code for your solution (excluding comments, blank lines and code for testing and demonstrating the correctness of your solution) is over 100 line long, then you probably need to re-examine your programming style and organization.

Note: The two functions, readRegularExpression and recognize could be combined making the intermediate RE unnecessary.

Note: For any regular expression, it is possible to produce a linear time algorithm to determine if a string is in the language defined by the regular expression. The straightforward generation of a parser from a regular expression, as should be done in this assignment, will, in some cases, generate an exponential time algorithm.

Note: In a practical program, it would be highly desirable for the function readRegularExpression to produced useful error messages when trying to parse an invalid regular expression.

Note: Haskell is defined to use Unicode characters. You may assume that only ASCII characters are used in strings passed to the functions you write. In other words, if you don't worry about Unicode and character sets when you write your solution to this assignment, we will ignore all issues related to character sets when we grade this question.

Carefully read the subsection of the course web page on Format of Submitted Work.
Notice that you are responsible for choosing appropriate test data.
Carefully read the subsection of the course web page on Getting Help from Others.
Look at the final sentence of the course web page.
Do your assignment.

In this and other programming assignments, at least half your grade will depend on the clarity, simplicity and organization of your code, including the documentation (usually in the form of comments).