ThinkGeek - Cool Stuff for Geeks and Technophiles

Thursday, January 29, 2009

LL(1) grammar restrictions

Before I discuss the statement syntax for SPTL, I want to look at grammar restrictions. As I mentioned in a previous post, I'm planning to build a recursive descent (top down) parser for SPTL. Specifically, it will be an LL(1) parser. That is, parsing from Left to right, with a Lookahead of 1 token.

This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:

statement : incstatement | decstatement
incstatement : var '++'
decstatement : var '--'

then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.

For a grammar to qualify as LL(1), it must satisfy a set of rules.


A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence

The following symbols mean:

For all
There exists
Empty set
Empty sentence
Intersection (the elements common to both sets)

A grammar G is LL(1) if and only if:

  • ∀ A, A ≠ ∈,
    ∀ pairs A → u and A → v,
    FIRST(u) ∩ FIRST(v) = ∅

  • If ∃ A, A → ∈,
    FIRST(A) ∩ FOLLOW(A) = ∅

A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).

So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}

FIRST(incstatement)FIRST(decstatement) = {var}
{var} ≠ ∅

Therefore, the grammar above is not LL(1).

How can we change it into an LL(1) grammar? Simple:

statement = var (incop | decop)
incop = '++'
decop = '--'

Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.

FIRST(incop) is {'++'}
FIRST(decop) is {'--'}

FIRST(incop)FIRST(decop) = ∅

Now the grammar (or at least this portion of it) is LL(1).

This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.

Labels: , ,

Tuesday, January 13, 2009

so long, Dr. Dobb's

After 33 years — many, many generations in the computing world — Dr. Dobb's Journal is ending its print edition. The magazine will be folded into a monthly section in InformationWeek. The online Dr. Dobb's Portal will continue to publish new articles.

I remember subscribing to Dr. Dobb's Journal in the late 1980s/early 1990s while I was in college. Most of the articles were way above my head, but it gave me something to strive for. These days I only pick up an occasional issue at the bookstore, and spend far more time looking at the website. So, practically speaking, this change won't affect me much. But I'm sad to see the demise of the print edition nonetheless.


Thursday, January 8, 2009

SPTL grammar: expressions

Now that we've seen how names work in SPTL, we'll take a look at how to build expressions.

We start with a simple declaration: an expr consists of a boolexpr, followed by zero or more sets consisting of a boolop and another boolexpr. But what's a boolexpr? Simple, it's a relexpr followed optionally by a relop and another relexpr. And so on.

Here's the BNF, building all the way to factor, which has several options including (expr), which takes us back to the beginning.

The purpose of this hierarchy is to set precedence rules for the order of evaluation. Due to the nature of a recursive descent parser, items that appear later in this list will be evaluated first.

expr = boolexpr { boolop boolexpr }

boolexpr = relexpr [ relop relexpr ]

boolop = '&&' | '||'

relexpr = [ '-' ] term { addop term }

relop = '<' | '>' | '==' | '<=' | '>=' | '!='

term = factor { multop factor }

addop = '+' | '-'

factor = constant | varname | funcname | '(' expr ')' | negate

multop = '*' | '/' | '%'

negate = '!' factor

constant = boolconstant | numconstant | stringconstant

For example, the expression x + 3 < y * 2 can be parsed like this:

-> boolexpr
-> relexpr relop relexpr
-> term addop term "<" term
-> factor "+" factor "<" factor mulop factor
-> varname "+" constant "<" varname "*" constant
-> "x" "+" 3 "<" "y" "*" "2"

But to really see how this works, we'll need to simulate a tree structure:

/ \
+ \
--- *
/ \ ---
x 3 / \
y 2

Atrocious ascii drawing, I know. I need to get a graphical graphing tool. If you have any suggestions, let me know in the comments.

Anyway, the y * 2 will be evaluated first, since it is at the lowest level of the tree. Next to be evaluated will be x + 3. In this example, since these are on opposite sides of the < sign, the relative order between them does not matter. In other expressions, it will. Once the values of x + 3 and y * 2 are known, they can be compared with each other. If x + 3 is less than y * 2 the value of the entire expression is true, otherwise the value is false.

What I don't know at this point is how it will work to have boolean, string, and numeric constants in the same expression hierarchy. Somehow, the parser will have to be able to evaluate, for example, true + 3 >= 7, or 10 + "this", and return a sensible answer.

As I build the compiler I'll have to figure out how that will work. Probably the solution is to assign all string and numeric constants a boolean value for boolean expressions, and all string and boolean constants a numeric value for arithmetic expressions. Any variables, since SPTL is a dynamically typed language, must be able to be evaluated as a boolean, number, or string, depending on the context.

We're getting closer to having a working language.

Labels: ,