LL(1) Parser

1. LL(1) Parser

LL(1) parsers are top-down parsers that read input from Left-to-right, perform Leftmost derivation, and use 1 symbol of lookahead. This makes them deterministic and efficient for real-time parsing with minimal memory.

1.1 What is LL(1) Parser?

An LL(1) parser predicts which production to use by looking only at the current input symbol and the current nonterminal.

This allows the parser to make parsing decisions without backtracking.

1.1.1 Formal Definition

A grammar is LL(1) if and only if for each pair of productions A → α | β:

FIRST(α) ∩ FIRST(β) = ∅
If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅ (and vice versa)

graph 
subgraph "LL(1)" Parser Architecture
    subgraph Input_Buffer ["Input Buffer (Tape)"]
        direction LR
        A(["a"]) --> B(["b"]) --> D(["d"]) --> End(["$"])
    end
    PT["Parse Table"]
    PA["LL(1) Parser Algorithm"]
    ST["Stack ($)"]

    End --> PA
    PA --> ST
    PA <--> PT
end

1.1.2 Real-world Analogy

Think of a ticket counter where you decide which line to stand in just by reading the signboard — no trial and error. LL(1) parsers work the same way: no backtracking.

1.2 Why is LL(1) Parser Needed?

LL(1) parsers are required when:

Efficiency is critical — they allow fast parsing with minimal overhead.
Manual parser implementation is preferred — the parser logic is easy to implement with recursive functions or a table.
Real-time applications — such as interactive code editors where immediate feedback is expected.

1.2.1 In Compiler Design

In early phases of compilation, like syntax checking for if, while, etc., LL(1) is sufficient and simple to use.

1.3 Advantages of LL(1) Parser

Simple Implementation: Uses recursive-descent or table-driven parsers, easy to write by hand.
No Backtracking: Decisions are made with only one token lookahead.
Faster Error Detection: Syntax errors can be identified early.
Suitable for Real-Time Parsing: Used in syntax-highlighting editors.

1.4 Disadvantages of LL(1) Parser

Restrictive Grammar: Cannot handle left recursion or common prefixes (must be eliminated).
Not Powerful Enough: Fails for many useful context-free grammars.
Grammar Rewriting Required: Needs left-factoring and transformation for compatibility.

1.4.1 Real-world Analogy

Imagine a GPS that only works if you provide a very simple and specific address format. Any deviation causes errors — LL(1) is that strict.

1.5 Book Example of LL(1) Grammar


E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Predictive parser can be built for this because the FIRST and FOLLOW sets are disjoint as required by LL(1) rules.

2. Architecture of LL(1) Parser

The architecture of an LL(1) parser is designed for predictive parsing using a parsing table and a stack. It operates top-down and left-to-right, performing leftmost derivations with one lookahead symbol. This structure is simple, efficient, and suitable for hand-written parsers or applications where fast response is crucial.

2.1 Components of LL(1) Parser Architecture

Input Buffer: Contains the string to be parsed, ending with $ (end-of-input marker).
Stack: Used to track grammar symbols, initialized with $ and the start symbol on top.
Parsing Table: A 2D table indexed by nonterminals and input symbols; tells which production to apply.
Driver Program: Executes the parsing algorithm using the stack and table.
Output (Optional): Derivation steps, parse tree construction, or translated code.

2.2 Flowchart of LL(1) Parsing Algorithm

flowchart TD
    A[Start] --> B["Initialize Stack with $ and Start Symbol"]
    B --> C["Read Next Input Symbol (lookahead)"]
    C --> D{"Top of Stack = Input Symbol?"}
    
    D -- Yes --> E[Pop Stack and Advance Input]
    D -- No --> F{Top of Stack is Nonterminal?}
    
    F -- Yes --> G["Use Parsing Table M[X, a]"]
    G --> H{"M[X, a] is Production?"}
    
    H -- Yes --> I[Pop Stack and Push Production RHS]
    H -- No --> Z[Error: Invalid Table Entry]

    F -- No --> J{"Top of Stack = $ and Input = $?"}
    J -- Yes --> K[Accept Input]
    J -- No --> L[Error: Mismatch]
    
    E --> C
    I --> C

This flow mimics a leftmost derivation by matching terminals and expanding nonterminals using the parsing table entries.

3. FIRST() Function in LL(1) Parsing

The FIRST set of a grammar symbol or string tells us which terminals can appear as the first symbol in some string derived from it. It is essential for constructing LL(1) parsing tables and predicting which production to apply.

3.1 Formal Definition

Let α be any string of grammar symbols (terminals and/or nonterminals).

FIRST(α) is the set of terminals that begin strings derived from α.

If α ⇒* ε, then ε ∈ FIRST(α).

3.2 Computation Rules for FIRST

Apply the following rules until no more symbols can be added to any FIRST set:

If X is a terminal, then FIRST(X) = {X}.
If X → Y₁Y₂...Yk is a production and for some i:
- All of Y₁, ..., Y_i-1 can derive ε, and
- a ∈ FIRST(Y_i)
- Then a ∈ FIRST(X).
- If ε ∈ FIRST(Y_j) for all j = 1...k, then ε ∈ FIRST(X).
If X → ε is a production, then ε ∈ FIRST(X).

3.3 Computing FIRST(α) where α = X₁X₂...Xₙ

To compute FIRST(α):

Add all non-ε symbols from FIRST(X₁).
If ε ∈ FIRST(X₁), also add non-ε symbols of FIRST(X₂).
Continue this process...
Add ε if ε ∈ FIRST(X_i) for all i = 1 to n.

3.4 Real-world Analogy

Imagine you want to decide what to wear based on the first thing you pull out of your wardrobe. If you pull a coat, you wear it. If it’s an empty hanger (ε), you keep checking the next item. That’s how FIRST works — it predicts what’s coming by looking at the beginning.

3.5 Book Example for FIRST()


Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Then:
FIRST(F)  = {(, id}
FIRST(T)  = FIRST(F) = {(, id}
FIRST(E)  = FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}

This is used to populate the parsing table by checking the current input token against the FIRST set.

3.6 Predictive Use

In an LL(1) parser, if FIRST(α) and FIRST(β) are disjoint for alternatives of a nonterminal A → α | β, then the parser can choose the correct rule just by looking at one input symbol.

4. FOLLOW() Function in LL(1) Parsing

The FOLLOW set of a nonterminal tells us what terminals can appear immediately after it in a valid derivation. This is crucial for placing productions in the parsing table, especially when the production body can derive ε (empty string).

4.1 Formal Definition

Let A be a nonterminal in grammar G.

FOLLOW(A) = set of terminals a such that there exists a derivation S ⇒ αAaβ, where S is the start symbol and a is the next terminal to appear after A.

If A can be the rightmost symbol, then include $ (end of input marker) in FOLLOW(A).

4.2 Rules to Compute FOLLOW Sets

Apply the rules iteratively until no more symbols can be added:

Start with $ ∈ FOLLOW(S) where S is the start symbol.
If A → αBβ, then FIRST(β) - {ε} ⊆ FOLLOW(B).
If A → αB or A → αBβ where FIRST(β) contains ε, then FOLLOW(A) ⊆ FOLLOW(B).

These rules account for all contexts where a nonterminal can be followed by another symbol or reach the end of a string.

4.3 Real-world Analogy

If FIRST() tells you what you might see first, FOLLOW() tells you what you're allowed to see after something is over. Like predicting what's allowed on the road just after a green signal — FOLLOW tells you what's coming next, not what started.

4.4 Book Example for FOLLOW()


Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

FOLLOW(E)  = FOLLOW(E') = { ), $ }
FOLLOW(T)  = FOLLOW(T') = { +, ), $ }
FOLLOW(F)  = { *, +, ), $ }

This information is used to decide which production to apply when the FIRST set contains ε and we must check FOLLOW.

4.5 Practical Use in LL(1) Table

FOLLOW helps in filling the parsing table entries when FIRST(α) contains ε in a production A → α. In this case, the production is added to all M[A, a] for a ∈ FOLLOW(A).

5. Construction of LL(1) Parse Table

The LL(1) parse table is a two-dimensional matrix M[A, a] where A is a nonterminal and a is a terminal or $ (end-of-input). The table directs which production rule to apply during parsing.

5.1 Rules for Filling the LL(1) Table

Given a grammar G with productions A → α, follow these steps for every production:

For each terminal a ∈ FIRST(α): Add the production A → α to M[A, a].
If ε ∈ FIRST(α):
- For each b ∈ FOLLOW(A), add A → α to M[A, b].
- If $ ∈ FOLLOW(A), then also add A → α to M[A, $].
If no rule is applicable: Leave M[A, a] empty (treated as error).

This ensures a deterministic choice with one lookahead symbol — the hallmark of LL(1) parsing.

5.2 Real-world Analogy

Think of a smart vending machine. You press a button (input symbol), and based on the product label (nonterminal), it knows exactly which mechanism (production rule) to trigger — unless it’s broken (error).

5.3 Example Parse Table (From Book)


Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Terminals: id, +, *, (, ), $

Table M:

        id     +     *     (     )     $
E     E→TE'           E→TE'
E'          E'→+TE'        E'→ε  E'→ε
T     T→FT'           T→FT'
T'          T'→ε  T'→*FT' T'→ε  T'→ε
F     F→id         F→(E)

Notice how FIRST and FOLLOW guide exact entries, ensuring no ambiguity arises during parsing.

6. Stack Operation of LL(1) Parser

LL(1) parsing uses a stack to simulate leftmost derivations. This is called a non-recursive predictive parser. The parser pushes and pops grammar symbols based on the input token and entries in the parse table.

6.1 Key Stack Components

Stack: Initialized with $ (bottom marker) and Start Symbol on top.
Input Buffer: Contains the input string followed by $.
Parse Table: Guides which production to apply based on the top of stack and input.

6.2 Stack Operation Rules

If Top of Stack = terminal and it matches input symbol: Pop it and advance input.
If Top of Stack = terminal and it doesn’t match: Error.
If Top of Stack = nonterminal A: Consult M[A, a] in parse table.
- If M[A, a] = A → α: Pop A and push α (right-to-left).
- If M[A, a] = error: Error.
If Top of Stack = $ and input = $: Accept.

6.3 Real-World Analogy

Think of the stack as a "to-do list" for the parser. Each time you get an instruction (symbol), you check the current situation (input symbol) and expand or execute that instruction accordingly. Once it’s done, you cross it off (pop), and move to the next.

6.4 Trace Example


Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Input: id + id * id $

Step     Stack          Input          Action
---------------------------------------------------
1        $ E            id + id * id $  Push E
2        $ T E'         id + id * id $  E → T E'
3        $ F T' E'      id + id * id $  T → F T'
4        $ id T' E'     id + id * id $  F → id
5        $ T' E'        + id * id $     Match id
6        $ E'           + id * id $     T' → ε
7        $ + T E'       + id * id $     E' → + T E'
8        $ T E'         id * id $       Match +
...      ...            ...             Continues

Each row simulates one action: either matching a terminal or applying a production. This continues until both input and stack reduce to $ — i.e., parsing is successful.

7. LL(1) Parse Tree

A parse tree for LL(1) parsing represents a leftmost derivation of a string from the grammar's start symbol. It is built top-down — starting from the root and applying grammar rules to grow the tree until all leaves are terminals.

7.1 Properties of LL(1) Parse Trees

Root: Start symbol of the grammar.
Internal Nodes: Nonterminals, expanded using production rules.
Leaf Nodes: Terminals or ε (empty string).
Yield: Left-to-right sequence of leaf nodes gives the parsed string.

7.2 LL(1) Parse Tree Construction Rules

Start with the start symbol at the root.
Apply productions as per the parse table entries using the lookahead token.
For each nonterminal, expand it using the RHS of its chosen production.
Repeat until all leaves are terminals or ε.

7.3 Real-World Analogy

Think of constructing a sentence by choosing the right parts of speech: subject → verb → object. You grow the sentence step-by-step, top-down, like growing branches from the trunk of a tree. Every rule you apply adds branches until the sentence is complete.

7.4 Book Example: Parse Tree for `id + id * id`

Using the grammar:


E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

The parse tree for id + id * id:

graph TD
    E --> T
    T --> F
    F --> id1[id]
    T --> Tdash1[T']
    Tdash1 --> ε1[ε]
    E --> Ep1[E']
    Ep1 --> plus[+]
    Ep1 --> T2[T]
    T2 --> F2[F]
    F2 --> id2[id]
    T2 --> Tdash2[T']
    Tdash2 --> star[*]
    Tdash2 --> F3[F]
    F3 --> id3[id]
    Tdash2 --> Tdash3[T']
    Tdash3 --> ε2[ε]
    Ep1 --> Ep2[E']
    Ep2 --> ε3[ε]

This corresponds to a leftmost derivation: E ⇒ T E' ⇒ F T' E' ⇒ id T' E' ⇒ id E' ⇒ id + T E' ⇒ id + F T' E' ⇒ id + id T' E' ⇒ id + id * F T' E' ⇒ id + id * id T' E' ⇒ ...

7.5 Parse Tree vs Derivation

Every parse tree corresponds to exactly one leftmost and one rightmost derivation. However, ambiguous grammars may lead to multiple valid parse trees for the same string. LL(1) grammars avoid this by construction.

8. Rules to Check if a Grammar is LL(1)

LL(1) grammars are those for which a predictive parser can choose the correct production using a single lookahead symbol. To ensure this, the grammar must satisfy strict disjointness and predictability rules based on FIRST and FOLLOW sets.

8.1 Formal Rules for LL(1) Grammar

Let A → α | β be two or more productions for a nonterminal A. The grammar is LL(1) if and only if:

FIRST(α) ∩ FIRST(β) = ∅
No terminal can be the first symbol derived from both α and β.
At most one of α or β can derive ε
If ε ∈ FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅
And vice versa for α and β.

These ensure that there's no ambiguity in selecting a production for A based on the next input token.

8.2 Practical Method to Check LL(1)

Compute FIRST and FOLLOW sets for all nonterminals.
For each nonterminal with multiple productions, apply the above rules.
If any rule fails for a production pair, grammar is not LL(1).

8.3 Real-world Analogy

Imagine you're in a vending machine line. If two buttons dispense similar products and look alike (violate disjoint FIRST sets), or if one button does nothing but also overlaps with the default option (violates FOLLOW-based rule), the user can't make a reliable decision. LL(1) grammars make sure every input leads to exactly one predictable output.

8.4 Detailed Book Example: Expression Grammar


Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Check for LL(1):
- For E' → + T E' and E' → ε:
  FIRST(+ T E') = {+}
  FIRST(ε) = {ε}
  FOLLOW(E') = {), $}
  → + ∉ FOLLOW(E') → OK ✅

- For T' → * F T' and T' → ε:
  FIRST(* F T') = {*}
  FIRST(ε) = {ε}
  FOLLOW(T') = {+, ), $}
  → * ∉ FOLLOW(T') → OK ✅

- All FIRST sets for same nonterminal are disjoint → LL(1) confirmed ✅

This grammar satisfies all conditions — FIRST sets are disjoint, only one ε-derivable option per rule, and FOLLOW checks pass. Hence, it's LL(1).