From Compiler writing
Strictly speaking parsing is the process of deciding yes/no as to whether a given sequence of tokens is a valid sentence of some grammar.
Parsing C++ is extremely difficult.
The parser phase takes as input the sequence of tokens output by the Lexical analysis phase and outputs an abstract syntax tree (and/or an error messages if the input does not conform to the language syntax).
Minimal to do list
- Locate the specification of the language syntax for your chosen language.
- Create one or more files that when combined contain at least one instance of source code corresponding to every production in the syntax (i.e., test programs).
- Select a parser generator. Bison is most recommended (Btyacc for those with troublesome syntaxes and the adventurous) and doing it manually is least recommended
- Create a file containing this syntax in a form suitable for the parser generator being used (e.g., bison).
- If necessary remove those parts of the syntax that create parsing difficulties (this may be an iterative process with new problem areas being discovered as work progresses).
- Get the interface to your lexer working
- Create a very cut down version of the syntax (e.g., handle a single expression) and add some actions that print useful messages
- Create a parser, link it with your lexer, parse some code and check the output
- lexer parser interface
- Resolve any shift/reduce or reduce/reduce conflicts in your syntax
- Add the appropriate actions to each syntax production to build a tree node for that part of the syntax.
- Write a program that takes tokens output by lexical analysis, builds an abstract syntax tree and prints out the AST in human readable form.
- parser program details
- You have successfully completed this phase when:
- Feeding the file containing every syntax production to the parser program generates the correct output.
- Feeding the executable file of your parser as input to itself does not crash or go into an infinite loop; it should generate syntax errors.
Useful to do list
- Provide a debug option that supports the printing of the last X tokens read.
Extra to do list
- Provide some form of syntax error recovery.
A practical parser generator will take a grammar as input and output a set of tables and some code that can be used to create a parser. This generated parser will not only make a yes/no decision but can be configured to generate an abstract syntax tree. It is this abstract syntax tree that is feed into the next phase of a compiler.
- ANTLR. Popular parser generator in parts of academia.
- Bison. The parser generator of choice for many development groups.
- btyacc a backtracking version of yacc (LR(k) + can execute some action when performing a trial parse).
- Elkhound a C++ related glr parser generator.
- Enju English language parser.
- Kelbt a backtracking LR(1) parser generator.
- SDF The Syntax Definition Formalism.
- Spirit - An object-oriented recursive descent parser for C++, implemented using template meta-programming.
Many compiler books discuss the theory behind parser generators (the Dragon book over 100 pages) and the various properties of grammars and algorithms that can parse them, they rarely discuss the practical side of using a parser generator.
- Readable introduction to grammars
- Bison manual
- flex & bison by John Levine published by O'Reilly ISBN: 978-0-596-15597-1. An earlier book from O'Reilly was: lex & yacc by Doug Brown, John Levine, Tony Mason, 2 edition (Oct 1992), ISBN-10: 1565920007.
- Parsing Techniques - A Practical Guide first edition available in pdf as a free download. A good place to start for an in depth introduction to parsing. The second edition covers a lot more material.
- Comparison of Syntactic Error Handling in LR Parsers
- Monadic Parser Combinators, an influential paper on parsing using functional concepts.
More advanced reading for those interested in the issues associated with parsing languages that support classes: