Papers for CMPT-755

Most papers can be downloaded from journal websites that are linked from the Library website. Others can be found on the web. Otherwise you can take a trek down to the Library. If you can't track down a paper, then contact me as I might have a hardcopy. You can also propose a topic in compiler design or related topics based on a group of papers along the lines of the topics below.

You are expected to write a survey paper that explains the ideas from the papers and also how they are connected to topics studied in class. A exhaustive guide to writing the survey has been provided.

Finite-state machine determinization

Gertjan van Noord, The Treatment of Epsilon Moves in Subset Construction, Computational Linguistics, 26(1), pp. 61-76, April 2000.
Ted Leslie, Efficient Approaches to Subset Construction, masters thesis, Computer Science, University of Waterloo, 1995.
J. Howard Johnson and Derick Wood, Instruction Computation in Subset Construction, in Automata Implementation, Darrell Raymond and Derick Wood and Sheng Yu (eds.), Lecture Notes in Computer Science 1260, pp. 64-71, Springer Verlag, 1997.

Finite-state machine minimization

The fastest (O(|Q|log|Q|), where |Q| is the number of states) minimization algorithm:

John E. Hopcroft, An n log n algorithm for minimizing the states in a finite automaton, in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196, Academic Press, 1971.

A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1974.
David Gries, Describing an Algorithm by Hopcroft, Acta Informatica, vol. 2, pp. 97-109, 1973.

George Anton Kiraz and Edmund Grimley-Evans, Multi-tape Automata for Speech and Language Systems: A Prolog Implementation, Automata Implementation. Second Internation Workshop on Implementing Automata, WIA '97, Derick Wood and Sheng Yu (eds.), Springer Lecture Notes in Computer Science 1436, pp. 87-103, 1997.

Brzozowski's algorithm minimizes an automaton by performing reversal and determinization twice. The subset construction (determinization) is O(2^|Q|), where Q is a set of states:

J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events, in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series, pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y., 1962.

Ted Leslie, Efficient Approaches to Subset Construction, masters thesis, Computer Science, University of Waterloo, 1995.

The Hopcroft-Ullman algorithm. Complexity is O(|Q|²), and it also requires O(|Q|²) memory:

John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.

The Aho-Sethi-Ullman algorithm has the same complexity as the Hopcroft-Ullman algorithm, i.e. O(|Q|²), but it needs only O(|Q|) memory:

Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman, Compilers. Principles, Techniques and Tools, Addison Wesley, 1986.

Bruce W. Watson, A Taxonomy of finite automata minimization algorithms, Eindhoven University of Technology, The Netherlands, Computing Science Note, 93/44, 1993

Derivatives of Regular Expressions

Janusz A. Brzozowski. Derivatives of Regular Expressions. JACM 11(4), Oct 1964, pp. 481-494
G. Berry and R. Sethi. From regular expressions to deterministic automata. Theoretical Computer Science, 48:117-126, 1986. (INRIA Technical Report, RR-0649, Mar 1987) [ps, pdf]

Comparison between Thompson and Glushkov-McNaughton-Yamada automata construction

S. C. Kleene, Representations of Events in Nerve Sets and Finite Automata, In: C. E. Shannon and J. McCarthy, editors, Automata Studies, pages 3-42, Princeton, NJ, 1956. Princeton University Press.
K. Thompson. Regular expression search algorithm. Communications of the ACM, 11(6):410-422, 1968.
V. Glushkov. The abstract theory of automata. Russian Math. Surveys, 16:1-53, 1961.
R. McNaughton and H. Yamada. Regular expressions and state graphs for automata. IEEE Transactions on Electronic Computers, 9:39-47, 1960.
A. Brüggemann-Klein, Regular expressions into finite automata, Theoret. Comput. Sci. 120:197-213, 1993.

Antimirov automata

V. Antimirov, Partial derivatives of regular expressions and finite automaton constructions, Theoret. Comput. Sci. 155:291-319, 1996.
J.-M. Champarnaud and D. Ziadi, New finite automaton constructions based on canonical derivatives, Proc. of CIAA 2000, LNCS 2088, Springer 2001, 94-104.
J.-M. Champarnaud and D. Ziadi, Computing the equation automaton of a regular expression in O(s²) space and time, Proc. of 12th Comb. Pattern Matching Conference: CPM 2001, LNCS 2089, Springer 2001, 157-168.

Efficient ε-free NFA construction

C. Hagenah and A. Muscholl, Computing ε-free NFA from regular expresssions in O(n log²(n)) time, Theor. Inform. Appl. 34(4):257-277, 2000.
J. Hromkovic and S. Seibert and T. Wilke, Translating regular expressions into small ε-free nondeterministic finite automata, J. Comput. Sys. Sci. 62(4):565-588, 2001.
L. Ilie and S. Yu, Constructing NFAs by optimal use of positions in regular expressions, In Proc. of 13th Comb. Pattern Matching Conference: CPM 2002.
C. Allauzen and M. Mohri, A Unified Construction of the Glushkov, Follow and Antimirov Automata, NYU Technical Report (TR 2006-880). [pdf]

Incremental DFA Minimization

Bruce W. Watson, An incremental DFA minimization algorithm, Finite State Methods in Natural Language Processing 2001, ESSLLI Workshop, August 20-24, Helsinki, Finland, 2001.
Bruce B. Watson, Jan Daciuk, An efficient incremental DFA minimization algorithm, Natural Language Engineering, Volume 9, Issue 01, March 2003.

Compression of Automata

Minimization of binary finite automata. C. L. Lucchesi, T. Kowaltowski, and J. Stolfi. Journal of the Brazilian Computer Society, vol. 1 no. 3, 5--11. April 1995. [PS]
Tomasz Kowaltowski, Cláudio L. Lucchesi, and Jorge Stolfi, Minimization of Binary Automata, First South American String Processing Workshop, Belo Horizonte, Brasil, 1993.
Franklin Mark Liang, Word Hy-phen-a-tion by Computer, Ph.D. thesis, Stanford, 1983.
Robert Endre Tarjan and Andrew Chi-Chih Yao, Storing a Sparse Table, Communications of the ACM, 22(11), pp. 606-611, November, 1979.
André Kempe, Reduction of Intermediate Alphabets in Finite-State Transducer Cascades, in: Proc. TALN 2000, Lausanne, Switzerland, pp. 207-215, 2000.

Transforming LR(k) Grammars to LR(1)

Mickunas M.D. One the complete covering problem for LR(k) grammars. JACM 23: 17-30. 1976.
Mickunas, Lancaster and Schneider. Transforming LR(k) Grammars to LR(1), SLR(1) and (1,1) Bounded Right-Context Grammars. JACM 23(3), 1976.
Nijholt, A. One the covering of parsable grammars. J. Compu. Sys. Sci. 15: 99-110.
Gray, J.N, Harrison M.A. On the covering and reduction problems for context-free grammars. JACM 19: 675-698. 1972

Approximations of Context-Free Languages

Fernando C. N. Pereira and Rebecca N. Wright, Finite-State Approximation of Phrase-Structure Grammars, in Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 149-173, MIT Press, Cambridge, 1997.
Mark-Jan Nederhof, Regular approximation of CFLs: a grammatical view, In H. Bunt and A. Nijholt, editors, Advances in Probabilistic and other Parsing Technologies, chapter 12, pages 221-241. Kluwer Academic Publishers, 2000.
Mark-Jan Nederhof, Context-free Parsing through regular approximation, Finite-state Methods in Natural Language Processing, pp. 13-24, Bilkent University, Ankara, Turkey, 1998.
Mark-Jan Nederhof, Practical Experiments with Regular Approximation of Context-Free Languages, Computational Linguistics, 26(1), April, 2000.
Mehryar Mohri, Mark-Jan Nederhof, Regular Approximation of Context-Free Grammars through Transformation. Robustness in Language and Speech Technology, J.-C. Junqua and G. van Noord (eds.), Kluwer Academic Publishers, pp. 153-163, 2001.
Edmund Grimley Evans, Approximating Context-Free Grammars with a Finite-State Calculus, 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 452-459, Madrid, 1997.

Containment Problems for deterministic pda's

R.E. Stearns, A Regularity Test for Pushdown Machines, Information and Control, vol. 11, pp. 323-340, 1967.
J.S. Ullian, Partial Algorithm Problems for Context Free Languages, Information and Control, vol. 11, pp. 80-101, 1967.
L.G. Valiant, Regularity and Related Problems for Deterministic Pushdown Automata, Journal of the ACM, 22(1), pp. 1-10, 1975.
Ibarra, O., and Rosier, L. (1981). On the decidability of equivalence for deterministic pushdown transducers, Information Processing Letters 13, 89-93.

Regular Closure of Deterministic Languages

E. Bertsch and M.-J. Nederhof. Regular closure of deterministic languages. SIAM Journal on Computing, 29(1):81-102, 1999. [acm]
M.-J. Nederhof and E. Bertsch. An innovative finite state concept for recognition and parsing of context free languages. In A. Kornai, editor, Extended finite state models of language, chapter 18, pages 226-243. Cambridge University Press, 1999. [pdf]

Size/Lookahead Tradeoff for LR(k) and LL(k) Grammars

H. Leung, D. Wotschke, On the size of parsers and LR(k)-grammars. Theoretical Computer Science 242 (2000) 59-69.
E. Bertsch and M.-J. Nederhof. Size/lookahead tradeoff for LL(k)-grammars. Information Processing Letters, 80:125-129, 2001. [pdf]
Terence Parr and Russell Quong. ANTLR: A Predicated-LL(k) Parser Generator. Journal of Software Practice and Experience, Vol. 25(7), 789-810 (July 1995).
Terence Parr and Russell Quong. LL and LR Translators Need k. SIGPLAN Notices, Volume 31 #2 Februrary 1996.
Terence Parr. Obtaining Practical Variants of LL(k) and LR(k) for k>1 by splitting the atomic k-Tuple. PhD Dissertation, Purdue University, 1993.

Error Correction for CFGs and Regular Languages

Robert A. Wagner, Order-n correction for regular languages. Communications of the ACM. Volume 17, Issue 5 (May 1974) pp. 265 - 268
Aho, A. V., and Peterson, T. G. A minimum-distance error-correcting parser for context-free languages. SIAM J. Comput 1(4), 305-312. 1972.
Lyon, G. 1974. Syntax-directed least-errors analysis for context-free languages: a practical approach. Commun. ACM 17, 1, 3-14.
Irons, E.T. An error-correcting parse algorithm. CACM 6(11) Nov. 1963, pp. 669-673.
R. Teitelbaum. Context-free error analysis by evaluation of algebraic power series. In 5th STOC, pp. 196-199, 1973.
Robert A. Wagner, Joel I. Seiferas. Correcting Counter-Automaton-Recognizable Languages. SIAM J. on Computing. Vol 7. No 3, pp. 357-375. Aug 1978.

Substring Parsing

M.-J. Nederhof and E. Bertsch. Linear-time suffix parsing for deterministic languages. Journal of the ACM, 43(3):524-554, 1996.
Bates, J. and Lavie, A. 1994. Recognizing substring of LR(k) languages in linear time. ACM Trans. Prog. Lang. Syst. 16, 3 (May), pp. 1051-1077.
Cormack, G.V. 1989. An LR substring parser for non-correcting syntax error recovery. SIGPLAN Notices 24, 7. 161-169.
Lang, B. 1991. Parsing Incomplete Sentences. In Proc. of the 12th Intl. Conf. on Computational Linguistics, vol 1 (Budapest, Hungary, Aug) pp. 365-371.

Non-recursive CFGs

M.-J. Nederhof and G. Satta. The language intersection problem for non-recursive context-free grammars, Information and Computation, 192(2):172-184, 2004. [ps]
M.-J. Nederhof and G. Satta. Parsing non-recursive context-free grammars. In 40th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pages 112-119, Philadelphia, Pennsylvania, USA, July 2002. [pdf]
M.-J. Nederhof and G. Satta. IDL-expressions: A formalism for representing and parsing finite languages in natural language processing. Journal of Artificial Intelligence Research, 21:287-317, 2004. [link]
M.-J. Nederhof and G. Satta. IDL-expressions: a compact representation for finite languages in generation systems. In Proceedings of FG Trento, The 7th Conference on Formal Grammar, pages 125-136, Trento, Italy, 2002. [ps]

General Error Recovery in Parsers

Burke, M. G., and Fisher, G. A. 1987. A practical method for LR and LL syntactic error diagnosis and recovery. ACM Trans. Program. Lang. Syst. 9, 2, 164-197.
Graham, S. L., and Rhodes, S. P. 1975. Practical syntactic error recovery. Commun. ACM 18, 11, 639-649.
Feycock, S., and Lazarus, P. 1976. Syntax-directed correction of syntax errors. Softw. Pract. Exper. 6, 2, 207-219.
Hammond, K., and Rayward-Smith, V. J. 1984. A survey on syntactic error recovery and repair. Comput. Lang. (Elmsford, NY) 9, 1, 51-67.
Mauney, J., and Fischer, C. N. 1988. Determining the extent of lookahead in syntactic error repair. ACM Trans. Program. Lang. Syst. 10, 3, 456-469.
Richter, H. 1985. Noncorrecting syntax error recovery. ACM Trans. Prog. Lang. Syst. 7, 3 (July) 478-489.

Handling Syntax Errors in LR Parsers

Mickunas, M. D., and Modry, J. A. 1978. Automatic error recovery for LR parsers. Commun. ACM 21, 6, 459-465.
Sippu, S., and Soisalon-Soininen, E. 1982. Practical error recovery in LR parsing. Conf. Record Ninth Annual ACM Symposium on Principles of Programming Languages, ACM, New York, NY, 177-184.
Sippu, S., and Soisalon-Soininen, E. 1983. A syntax-error handling technique and its experimental analysis. ACM Trans. Program. Lang. Syst. 5, 4, 656-679.
Sippu, S., and Soisalon-Soininen, E. 1980. A scheme for LR(k) parsing with error recovery, Part 1: LR(k) parsing, Part 2: Error Recovery, Part 3: Error Correction. Intern. J. Computer Math. 8: 27-42 (Part 1), 107-119 (Part 2), 189-206 (Part 3).
McKenzie, B., Yeatman, C., and De Vere, L. Error repair in shift-reduce parsers. ACM Transactions on Programming Languages and Systems, 17(4):672-689, July 1995.
E. Bertsch and M.-J. Nederhof. On failure of the pruning technique in ``Error Repair in Shift-Reduce Parsers''. ACM Transactions on Programming Languages and Systems, 21(1):1-10, January 1999. [pdf]
Graham, S. L., Haley, C. B., and Joy, W. N. 1979. Practical LR error recovery. SIGPLAN Notices 14, 8, 168-175.

Handling Syntax Errors in LL Parsers

Fischer, C.N. and Mauney, J. 1992. A simple, fast, and effective LL(1) error repair algorithm. Acta Inf. 29, 109-120.
Fischer, C. N., Milton, D. R., and Quiring, S. B. 1980. Efficient LL(1) error correction and recovery using only insertions. Acta Inf. 13, 141-154.
Fischer, C. N., Tai, K. C., and Milton, D. R. 1979. Immediate error detection in strong LL(1) parsers. Inf. Process. Lett. 8, 261-266.
Hammond, K., and Rayward-Smith, V. J. 1984. A survey on syntactic error recovery and repair. Comput. Lang. (Elmsford, NY) 9, 1, 51-67.
Mauney, J., and Fischer, C. N. 1988. Determining the extent of lookahead in syntactic error repair. ACM Trans. Program. Lang. Syst. 10, 3, 456-469.

Deductive Parsing and Semirings

Principles and implementation of deductive parsing. Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. Journal of Logic Programming, 24(1-2):3-36, July-August 1995. Also available as cmp-lg/9404008
On The Complexity Analysis of Static Analyses. David McAllester. JACM 49(4): 512-537, 2002.
Semiring Parsing. Joshua Goodman. Computational Linguistics, Vol. 25, No. 4, pp. 573--605, December 1999.

Bounded-context parsing

R. W. Floyd. Bounded context syntactic analysis. CACM(7) 2, pp. 62-67. 1964.
J. H. Williams. Bounded context parsable grammars. Inf. Control(28) 4, pp. 314-334. 1975.
David A. Workman. SR(s,k) parsers: A class of shift-reduce bounded-context parsers. Journal of Computer and System Sciences. Vol 22, Issue 2, pp. 178-197.

Space Efficient LR Parsing

F. DeRemer, Simple LR(k) grammars. CACM 14, 7 (1971), 453-460.
F. DeRemer and T. Pennello, Efficient Computation of LALR(1) Look-Ahead Sets. ACM TOPLAS Vol 4, No 4, (1982), 615-649
J. C.-H. Park, K. M. Choe and C.-H. Chang, A New Analysis of LALR Formalisms. ACM TOPLAS Vol 7, No 1 (Jan. 1985), 159-175

Unbounded Lookahead in LR Parsing

LR-Regular (LRR): K. Culik and R. Cohen, LR-regular grammars -- an extension of LR(k) grammars. J. Comp. Sys. Sci. 7, (1973), 66-96.
Extended LR (XLR): T. P. Baker, Extending Lookahead for LR Parsers. J. Comp. Sys. Sci. 22 (1981), 243-259.
LAR(m): M. E. Bermudez and K. M. Schimpf, A Practical Arbitrary Look-ahead LR Parsing Technique. ACM SIGPLAN Notices (1986), 136-144.

Non-canonical extensions to LR Parsing

LR(k,t) grammars: (included in) D. Knuth, On the translation of languages from left to right. Inf. Control 8 (1965), 607-639.
T. G. Syzmanski and J. H. Williams, Noncanonical extensions of bottom-up parsing techniques. SIAM J. Comp. 5, 2 (June 1976), 321-350.
K.-C. Tai, Noncanonical SLR(1) Grammars. ACM TOPLAS, Vol 1 No 2 (Oct 1979), 295-320.
Survey paper: M. D. Hutton, Noncanonical Extensions of LR Parsing Methods. unpublished manuscript (available online).

Generalized LR Parsing

B. Lang. 1974. Deterministic techniques for efficient non-deterministic parsers. In Jacques Loeckx, editor, Automata, Languages and Programming, 2nd Colloquium, University of Saarbrücken. Lecture Notes in Computer Science, Springer Verlag.
M. Tomita. 1987. An efficient augmented context-free parsing algorithm. Computational Linguistics, 13:31-46.
M. Tomita. 1985. Efficient Parsing for Natural Language, A Fast Algorithm for Practical Systems. Kluwer Academic Publishers.
Y. Schabes. 1991. Polynomial Time and Space Shift-Reduce Parsing of Arbitrary Context-free Grammars. Proc. of ACL-1991, 29th Annual Meeting of the Association of Computational Linguistics.
D. Grune and C.J.H. Jacob. Parsing Techniques: a practical guide. Ellis Horwood, Chichester, 1990.

Earley Parsing for Programming Language Grammars

J. Earley, An Efficient Context-free Parsing Algorithm. CACM (13) 2, Feb 1970, 94-102.
P. MacLean and R. N. Horspool, A Faster Earley Parser. Proceedings of Intl. Conference on Compiler Construction (CC'96), Linkoping, Sweden, April 1996, LNCS vol. 1060, Springer-Verlag, pp. 281-293.
J. Aycock and R. N. Horspool. Practical Earley Parsing. Computer Journal, vol. 45, 6 (2002), pp. 620-630.
J. Aycock and R. N. Horspool. Directly-Executable Earley Parsing. Proceedings of CC 2001, Compiler Construction Conference, Genoa, Italy, April 2001, LNCS 2027, Springer-Verlag, pp. 229-243.

GHR optimizations for context-free parsing

S .L. Graham, M. A. Harrison, and W. L. Ruzzo. 1980. An improved context-free recognizer. ACM Transactions on Programming Languages and Systems, 2(3):415-462, July.
H. Leiss. 1990. On Kilbury's modification of Earley's algorithm. ACM Transactions on Programming Languages and Systems, 12(4):610-640, October.

Code Generation using bottom-up tree parsing

C. W. Fraser, D. R. Hanson, and T. A. Proebsting. Engineering a simple, efficient code-generator generator. ACM Letters on Programming Languages and Systems, 1(3):213--226, September 1992.
C. W. Fraser, R. R. Henry, and T. A. Proebsting. BURG---fast optimal instruction selection and tree parsing. ACM SIGPLAN Notices, 27(4):68--76, April 1992.
T. A. Proebsting. BURS automata generation. ACM Transactions on Programming Languages and Systems, 17(3):461--486, May 1995.

Syntax-directed Translation

P. M. Lewis and R. E. Stearns, Syntax-Directed Transduction. JACM vol 15, no 3, July 1968.
A. V. Aho and J. D. Ullman, Properties of Syntax Directed Translations. Journal of Comp. and Sys. Sci. 3, 319-334, 1969.
A. V. Aho and J. D. Ullman, Syntax Directed Translations and the Pushdown Assembler. Journal of Comp. and Sys. Sci. 3, 37-56, 1969.
A. V. Aho and J. D. Ullman, Translations on a Context Free Grammar. Inf. and Control 19, 439-475, 1971.
S. Vere, Translation Equations. CACM, vol 13 no 2, Feb 1970

Tree Transducers for Code Generation

Comon et al. Chapter 6, Tree Automata Techniques and Applications. unpublished book.
J.W. Thatcher, Tree automata: an informal survey, Currents in the Theory of Computing (Alfred V.Aho, ed.), chapter 4, pp. 143--172, Prentice-Hall, 1973.
Gesecg and Steinby, Tree automata, book