ThinkGeek - Cool Stuff for Geeks and Technophiles

Wednesday, March 18, 2009

SPTL grammar: statements

Now that we've looked at FIRST and FOLLOW sets, and the grammar restrictions imposed by an LL(1) parser, we're ready to look at SPTL statement syntax. My first draft of the grammar featured ambiguities that a recursive descent parser simply cannot handle.

For example, consider these definitions:


statement = addstatement | assignstatement | breakstatement | callstatement
divstatement | foreachstatement | forstatement | ifstatement
multstatement | popstatement | postdecstatement | postincstatement
predecstatement | preincstatement | printstatement |
pushstatement | readstatement | repeatestatement |
shiftstatement | substatement | whilestatement

addstatement = varname '+=' expr ';'

substatement = varname '-=' expr ';'

multstatement = varname '*=' expr ';'

divstatement = varname '/=' expr ';'


An LL(1) parser should be able to determine, as soon as it sees a token, what kind of element it is parsing. Now suppose the parser determines that this element is a statement beginning with a varname. What kind of statement is it? There are four possibilities, and that's three more than are allowed.

But all is not lost. A BNF (or BNF-like) grammar is merely a possible description of a language. Most languages can be described using a different BNF grammar specification, without changing the structure of the language.

Recall the grammar restrictions from my previous post:


A grammar G is LL(1) if and only if:


  • ∀ A, A ≠ ∈,
    ∀ pairs A → u and A → v,
    FIRST(u) ∩ FIRST(v) = ∅

  • If ∃ A, A → ∈,
    FIRST(A) ∩ FOLLOW(A) = ∅




The FIRST and FOLLOW rules will help us determine where the grammar needs to be rewritten for an LL(1) parser.

In the example above, we have:

FIRST(addstatement) = varname
FIRST(substatement) = varname
FIRST(multstatement) = varname
FIRST(divstatement) = varname


Thus:

FIRST(addstatement) ∩ FIRST(substatement) = varname


The same is true for any pair taken from these four kinds of statement.

The solution is to create a special kind of statement:


assignstatement = varname ( addassign | divassign | multassign | subassign ) ';'


Now the parser sees only one kind of statement that begins with a varname: an assignstatement. When the parser finds a varname at the beginning of a statement, it recognizes it as an assignstatement and looks at the next token to determine what kind of assignstatement it is. Now we have a grammar the LL(1) parser can handle.

I've put both the full SPTL grammar and the first and follow rules (.csv file) online.

Labels: , ,

Thursday, January 29, 2009

LL(1) grammar restrictions

Before I discuss the statement syntax for SPTL, I want to look at grammar restrictions. As I mentioned in a previous post, I'm planning to build a recursive descent (top down) parser for SPTL. Specifically, it will be an LL(1) parser. That is, parsing from Left to right, with a Lookahead of 1 token.

This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:


statement : incstatement | decstatement
incstatement : var '++'
decstatement : var '--'


then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.

For a grammar to qualify as LL(1), it must satisfy a set of rules.

Define:

A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence


The following symbols mean:

For all
There exists
Empty set
Empty sentence
Intersection (the elements common to both sets)


A grammar G is LL(1) if and only if:


  • ∀ A, A ≠ ∈,
    ∀ pairs A → u and A → v,
    FIRST(u) ∩ FIRST(v) = ∅

  • If ∃ A, A → ∈,
    FIRST(A) ∩ FOLLOW(A) = ∅



A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).

So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}

FIRST(incstatement)FIRST(decstatement) = {var}
{var} ≠ ∅

Therefore, the grammar above is not LL(1).

How can we change it into an LL(1) grammar? Simple:


statement = var (incop | decop)
incop = '++'
decop = '--'


Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.

FIRST(incop) is {'++'}
FIRST(decop) is {'--'}

FIRST(incop)FIRST(decop) = ∅

Now the grammar (or at least this portion of it) is LL(1).

This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.

Labels: , ,