ThinkGeek - Cool Stuff for Geeks and Technophiles

Thursday, January 29, 2009

LL(1) grammar restrictions

Before I discuss the statement syntax for SPTL, I want to look at grammar restrictions. As I mentioned in a previous post, I'm planning to build a recursive descent (top down) parser for SPTL. Specifically, it will be an LL(1) parser. That is, parsing from Left to right, with a Lookahead of 1 token.

This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:


statement : incstatement | decstatement
incstatement : var '++'
decstatement : var '--'


then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.

For a grammar to qualify as LL(1), it must satisfy a set of rules.

Define:

A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence


The following symbols mean:

For all
There exists
Empty set
Empty sentence
Intersection (the elements common to both sets)


A grammar G is LL(1) if and only if:


  • ∀ A, A ≠ ∈,
    ∀ pairs A → u and A → v,
    FIRST(u) ∩ FIRST(v) = ∅

  • If ∃ A, A → ∈,
    FIRST(A) ∩ FOLLOW(A) = ∅



A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).

So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}

FIRST(incstatement)FIRST(decstatement) = {var}
{var} ≠ ∅

Therefore, the grammar above is not LL(1).

How can we change it into an LL(1) grammar? Simple:


statement = var (incop | decop)
incop = '++'
decop = '--'


Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.

FIRST(incop) is {'++'}
FIRST(decop) is {'--'}

FIRST(incop)FIRST(decop) = ∅

Now the grammar (or at least this portion of it) is LL(1).

This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.

Labels: , ,

0 Comments:

Post a Comment

Links to this post:

Create a Link

<< Home