CSc 446 Compiler
Construction
Creating a recursive
descent parser
The basic idea is simple, one creates a procedure that mimics the behavior of a Push Down Automata for each grammar variable of the language. This means that procedure A will recognize grammar variable A. The procedure will either match terminals or call procedures to recognize other grammar variables.
As a simple example we will use the following simple grammar:
Expr → Expr addop Term | Term
Addop → + | -
Term → Term mulop Factor | Factor
Mulop → *
Factor → ( Expr ) | numt
Now using the rule for Factor we can write our procedure as:
procedure Factor;
begin
case Token of
lparen : match(lparen);
Expr;
match(rparen);
numt : match(numt);
else error;
end case;
end factor;
This code assumes the existence of the variable Token that holds the current token returned by the lexical analyzer. We will include one case for each of the possible production rules and a final error case since the rule is not nullable. We also use the procedure match to advance to the next input when the correct token is found:
procedure match(desiredToken);
begin
if token = desiredToken then
GetNextToken;
else
error;
end if;
end match;
The call to procedure error will print and appropriate error message and halt the parser.
Procedures for addop and mulop can easily be written and follow the same pattern as the procedure just written for Factor. Where we run into problem is when we try to write the parsing procedure for grammar variable Expr. Our first attempt will be:
procedure Expr;
begin
if Token = ?
and at this point we realize that we have a problem. We cannot decide which grammar rule to use. Both right hand sides are grammar variables and this makes the decision hard. We want to make sure that we never have to back up in the parsing process. Through clever rewriting of the grammar rule we can create a suitable grammar rule. We will change the rule to:
Expr → Term ExprTail
ExprTail → addop Term ExprTail | ε
This rule can also be written as the single EBNF rule:
Expr → Term { addop Term }
We will use the previous version in this class. Notice now that the procedure for Expr can be written as:
procedure Expr;
begin
Term;
ExprTail;
end;
Since we have only one right hand side we have no choices to make and the body of the procedure becomes trivial to write. The grammar rule for ExprTail now introduces a nullable production. This will be handled in the parsing procedure as follows:
procedure ExprTail;
begin
if Token = addop then
match(addop);
Term;
ExprTail;
end if;
end ExprTail;
Here since the rule is nullable we can stop without an error if the desired token is not found.
Finally the rule for Term can be rewritten as:
Term → Factor TermTail
TermTail → mulop Factor TermTail | ε
This can now be written in the same format as Expr and ExprTail.
procedure Term;
begin
Factor;
TermTail;
end Term;
procedure TermTail;
begin
if Token = mulop then
match(mulop);
Factor;
TermTail;
end if;
end TermTail;
We now have code that will recognize correct arithmetic expressions that consist of numbers and the operators +, -, and *.