CSc 446 Compiler Construction

 

Creating a recursive descent parser

 

The basic idea is simple, one creates a procedure that mimics the behavior of a Push Down Automata for each grammar variable of the language. This means that procedure A will recognize grammar variable A. The procedure will either match terminals or call procedures to recognize other grammar variables.

 

As a simple example we will use the following simple grammar:

 

            Expr             Expr addop Term | Term

            Addop         + | -

            Term            Term mulop Factor | Factor

            Mulop          *

            Factor          ( Expr ) | numt

 

Now using the rule for Factor we can write our procedure as:

 

            procedure Factor;

            begin

                        case Token of

                        lparen : match(lparen);

                                      Expr;

                                      match(rparen);

                        numt     : match(numt);

                        else error;

                        end case;

            end factor;

 

This code assumes the existence of the variable Token that holds the current token returned by the lexical analyzer. We will include one case for each of the possible production rules and a final error case since the rule is not nullable. We also use the procedure match to advance to the next input when the correct token is found:

 

            procedure match(desiredToken);

            begin

                        if token = desiredToken then

                                    GetNextToken;

                        else

                                    error;

                        end if;

            end match;

 

The call to procedure error will print and appropriate error message and halt the parser.

 

Procedures for addop and mulop can easily be written and follow the same pattern as the procedure just written for Factor. Where we run into problem is when we try to write the parsing procedure for grammar variable Expr. Our first attempt will be:

 

            procedure  Expr;

            begin

                        if Token = ?

 

and at this point we realize that we have a problem. We cannot decide which grammar rule to use. Both right hand sides are grammar variables and this makes the decision hard. We want to make sure that we never have to back up in the parsing process. Through clever rewriting of the grammar rule we can create a suitable grammar rule. We will change the rule to:

 

            Expr                        Term ExprTail

            ExprTail                  addop Term ExprTail | ε

 

This rule can also be written as the single EBNF rule:

 

            Expr                         Term { addop Term }

 

We will use the previous version in this class. Notice now that the procedure for Expr can be written as:

 

            procedure Expr;

            begin

                        Term;

                        ExprTail;

            end;

 

Since we have only one right hand side we have no choices to make and the body of the procedure becomes trivial to write. The grammar rule for ExprTail now introduces a nullable production. This will be handled in the parsing procedure as follows:

 

            procedure ExprTail;

            begin

                        if Token = addop then

                                    match(addop);

                                    Term;

                                    ExprTail;

                        end if;

            end ExprTail;

 

 

Here since the rule is nullable we can stop without an error if the desired token is not found.

 

Finally the rule for Term can be rewritten as:

 

            Term                       Factor TermTail

            TermTail                   mulop Factor TermTail | ε

 

This can now be written in the same format as Expr and ExprTail.

 

            procedure Term;

            begin

                        Factor;

                        TermTail;

            end Term;

 

            procedure TermTail;

            begin

                        if Token = mulop then

                                    match(mulop);

                                    Factor;

                                    TermTail;

                        end if;

            end TermTail;

 

We now have code that will recognize correct arithmetic expressions that consist of numbers and the operators +, -, and *.