this blog considered harmful: March 2009

Now that we've looked at FIRST and FOLLOW sets, and the grammar restrictions imposed by an LL(1) parser, we're ready to look at SPTL statement syntax. My first draft of the grammar featured ambiguities that a recursive descent parser simply cannot handle.

For example, consider these definitions:

statement = addstatement | assignstatement | breakstatement | callstatement
divstatement | foreachstatement | forstatement | ifstatement
multstatement | popstatement | postdecstatement | postincstatement
predecstatement | preincstatement | printstatement |
pushstatement | readstatement | repeatestatement |
shiftstatement | substatement | whilestatement

addstatement = varname '+=' expr ';'

substatement = varname '-=' expr ';'

multstatement = varname '*=' expr ';'

divstatement = varname '/=' expr ';'

An LL(1) parser should be able to determine, as soon as it sees a token, what kind of element it is parsing. Now suppose the parser determines that this element is a statement beginning with a varname. What kind of statement is it? There are four possibilities, and that's three more than are allowed.

But all is not lost. A BNF (or BNF-like) grammar is merely a possible description of a language. Most languages can be described using a different BNF grammar specification, without changing the structure of the language.

Recall the grammar restrictions from my previous post:

A grammar G is LL(1) if and only if:

∀ A, A ≠ ∈,
∀ pairs A → u and A → v,
FIRST(u) ∩ FIRST(v) = ∅

If ∃ A, A → ∈,
FIRST(A) ∩ FOLLOW(A) = ∅

The FIRST and FOLLOW rules will help us determine where the grammar needs to be rewritten for an LL(1) parser.

In the example above, we have:

FIRST(addstatement) = varname
FIRST(substatement) = varname
FIRST(multstatement) = varname
FIRST(divstatement) = varname

Thus:

FIRST(addstatement) ∩ FIRST(substatement) = varname

The same is true for any pair taken from these four kinds of statement.

The solution is to create a special kind of statement:

assignstatement = varname ( addassign | divassign | multassign | subassign ) ';'

Now the parser sees only one kind of statement that begins with a varname: an assignstatement. When the parser finds a varname at the beginning of a statement, it recognizes it as an assignstatement and looks at the next token to determine what kind of assignstatement it is. Now we have a grammar the LL(1) parser can handle.

I've put both the full SPTL grammar and the first and follow rules (.csv file) online.

Labels: LL(1), SPTL, SPTL grammar