1. LL(1) Parser
LL(1) parsers are top-down parsers that read input from Left-to-right, perform Leftmost derivation, and use 1 symbol of lookahead. This makes them deterministic and efficient for real-time parsing with minimal memory.
1.1 What is LL(1) Parser?
An LL(1) parser predicts which production to use by looking only at the current input symbol and the current nonterminal.
This allows the parser to make parsing decisions without backtracking.
1.1.1 Formal Definition
A grammar is LL(1) if and only if for each pair of productions A → α | β
:
- FIRST(α) ∩ FIRST(β) = ∅
- If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅ (and vice versa)
graph subgraph "LL(1)" Parser Architecture subgraph Input_Buffer ["Input Buffer (Tape)"] direction LR A(["a"]) --> B(["b"]) --> D(["d"]) --> End(["$"]) end PT["Parse Table"] PA["LL(1) Parser Algorithm"] ST["Stack ($)"] End --> PA PA --> ST PA <--> PT end
1.1.2 Real-world Analogy
Think of a ticket counter where you decide which line to stand in just by reading the signboard — no trial and error. LL(1) parsers work the same way: no backtracking.
1.2 Why is LL(1) Parser Needed?
LL(1) parsers are required when:
- Efficiency is critical — they allow fast parsing with minimal overhead.
- Manual parser implementation is preferred — the parser logic is easy to implement with recursive functions or a table.
- Real-time applications — such as interactive code editors where immediate feedback is expected.
1.2.1 In Compiler Design
In early phases of compilation, like syntax checking for if
, while
, etc., LL(1) is sufficient and simple to use.
1.3 Advantages of LL(1) Parser
- Simple Implementation: Uses recursive-descent or table-driven parsers, easy to write by hand.
- No Backtracking: Decisions are made with only one token lookahead.
- Faster Error Detection: Syntax errors can be identified early.
- Suitable for Real-Time Parsing: Used in syntax-highlighting editors.
1.4 Disadvantages of LL(1) Parser
- Restrictive Grammar: Cannot handle left recursion or common prefixes (must be eliminated).
- Not Powerful Enough: Fails for many useful context-free grammars.
- Grammar Rewriting Required: Needs left-factoring and transformation for compatibility.
1.4.1 Real-world Analogy
Imagine a GPS that only works if you provide a very simple and specific address format. Any deviation causes errors — LL(1) is that strict.
1.5 Book Example of LL(1) Grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Predictive parser can be built for this because the FIRST and FOLLOW sets are disjoint as required by LL(1) rules.
2. Architecture of LL(1) Parser
The architecture of an LL(1) parser is designed for predictive parsing using a parsing table and a stack. It operates top-down and left-to-right, performing leftmost derivations with one lookahead symbol. This structure is simple, efficient, and suitable for hand-written parsers or applications where fast response is crucial.
2.1 Components of LL(1) Parser Architecture
- Input Buffer: Contains the string to be parsed, ending with
$
(end-of-input marker). - Stack: Used to track grammar symbols, initialized with
$
and the start symbol on top. - Parsing Table: A 2D table indexed by nonterminals and input symbols; tells which production to apply.
- Driver Program: Executes the parsing algorithm using the stack and table.
- Output (Optional): Derivation steps, parse tree construction, or translated code.
2.2 Flowchart of LL(1) Parsing Algorithm
flowchart TD A[Start] --> B["Initialize Stack with $ and Start Symbol"] B --> C["Read Next Input Symbol (lookahead)"] C --> D{"Top of Stack = Input Symbol?"} D -- Yes --> E[Pop Stack and Advance Input] D -- No --> F{Top of Stack is Nonterminal?} F -- Yes --> G["Use Parsing Table M[X, a]"] G --> H{"M[X, a] is Production?"} H -- Yes --> I[Pop Stack and Push Production RHS] H -- No --> Z[Error: Invalid Table Entry] F -- No --> J{"Top of Stack = $ and Input = $?"} J -- Yes --> K[Accept Input] J -- No --> L[Error: Mismatch] E --> C I --> C
This flow mimics a leftmost derivation by matching terminals and expanding nonterminals using the parsing table entries.
3. FIRST() Function in LL(1) Parsing
The FIRST set of a grammar symbol or string tells us which terminals can appear as the first symbol in some string derived from it. It is essential for constructing LL(1) parsing tables and predicting which production to apply.
3.1 Formal Definition
Let α
be any string of grammar symbols (terminals and/or nonterminals).
FIRST(α) is the set of terminals that begin strings derived from α
.
- If
α ⇒* ε
, thenε ∈ FIRST(α)
.
3.2 Computation Rules for FIRST
Apply the following rules until no more symbols can be added to any FIRST set:
- If
X
is a terminal, thenFIRST(X) = {X}
. - If
X → Y₁Y₂...Yk
is a production and for somei
:- All of
Y₁, ..., Yi-1
can deriveε
, and a ∈ FIRST(Yi)
- Then
a ∈ FIRST(X)
. - If
ε ∈ FIRST(Yj)
for allj = 1...k
, thenε ∈ FIRST(X)
.
- All of
- If
X → ε
is a production, thenε ∈ FIRST(X)
.
3.3 Computing FIRST(α) where α = X₁X₂...Xₙ
To compute FIRST(α)
:
- Add all non-
ε
symbols fromFIRST(X₁)
. - If
ε ∈ FIRST(X₁)
, also add non-ε
symbols ofFIRST(X₂)
. - Continue this process...
- Add
ε
ifε ∈ FIRST(Xi)
for alli = 1 to n
.
3.4 Real-world Analogy
Imagine you want to decide what to wear based on the first thing you pull out of your wardrobe. If you pull a coat, you wear it. If it’s an empty hanger (ε), you keep checking the next item. That’s how FIRST works — it predicts what’s coming by looking at the beginning.
3.5 Book Example for FIRST()
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Then:
FIRST(F) = {(, id}
FIRST(T) = FIRST(F) = {(, id}
FIRST(E) = FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
This is used to populate the parsing table by checking the current input token against the FIRST set.
3.6 Predictive Use
In an LL(1) parser, if FIRST(α)
and FIRST(β)
are disjoint for alternatives of a nonterminal A → α | β
, then the parser can choose the correct rule just by looking at one input symbol.
4. FOLLOW() Function in LL(1) Parsing
The FOLLOW set of a nonterminal tells us what terminals can appear immediately after it in a valid derivation. This is crucial for placing productions in the parsing table, especially when the production body can derive ε (empty string).
4.1 Formal Definition
Let A
be a nonterminal in grammar G
.
FOLLOW(A) = set of terminals a
such that there exists a derivation S ⇒ αAaβ
, where S
is the start symbol and a
is the next terminal to appear after A
.
- If A can be the rightmost symbol, then include
$
(end of input marker) inFOLLOW(A)
.
4.2 Rules to Compute FOLLOW Sets
Apply the rules iteratively until no more symbols can be added:
- Start with
$ ∈ FOLLOW(S)
whereS
is the start symbol. - If
A → αBβ
, thenFIRST(β) - {ε} ⊆ FOLLOW(B)
. - If
A → αB
orA → αBβ
whereFIRST(β)
containsε
, thenFOLLOW(A) ⊆ FOLLOW(B)
.
These rules account for all contexts where a nonterminal can be followed by another symbol or reach the end of a string.
4.3 Real-world Analogy
If FIRST()
tells you what you might see first, FOLLOW()
tells you what you're allowed to see after something is over. Like predicting what's allowed on the road just after a green signal — FOLLOW tells you what's coming next, not what started.
4.4 Book Example for FOLLOW()
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { *, +, ), $ }
This information is used to decide which production to apply when the FIRST set contains ε and we must check FOLLOW.
4.5 Practical Use in LL(1) Table
FOLLOW helps in filling the parsing table entries when FIRST(α)
contains ε
in a production A → α
. In this case, the production is added to all M[A, a]
for a ∈ FOLLOW(A)
.
5. Construction of LL(1) Parse Table
The LL(1) parse table is a two-dimensional matrix M[A, a]
where A
is a nonterminal and a
is a terminal or $
(end-of-input). The table directs which production rule to apply during parsing.
5.1 Rules for Filling the LL(1) Table
Given a grammar G
with productions A → α
, follow these steps for every production:
- For each terminal
a ∈ FIRST(α)
: Add the productionA → α
toM[A, a]
. - If ε ∈ FIRST(α):
- For each
b ∈ FOLLOW(A)
, addA → α
toM[A, b]
. - If
$ ∈ FOLLOW(A)
, then also addA → α
toM[A, $]
.
- For each
- If no rule is applicable: Leave
M[A, a]
empty (treated as error).
This ensures a deterministic choice with one lookahead symbol — the hallmark of LL(1) parsing.
5.2 Real-world Analogy
Think of a smart vending machine. You press a button (input symbol), and based on the product label (nonterminal), it knows exactly which mechanism (production rule) to trigger — unless it’s broken (error).
5.3 Example Parse Table (From Book)
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Terminals: id, +, *, (, ), $
Table M:
id + * ( ) $
E E→TE' E→TE'
E' E'→+TE' E'→ε E'→ε
T T→FT' T→FT'
T' T'→ε T'→*FT' T'→ε T'→ε
F F→id F→(E)
Notice how FIRST
and FOLLOW
guide exact entries, ensuring no ambiguity arises during parsing.
6. Stack Operation of LL(1) Parser
LL(1) parsing uses a stack to simulate leftmost derivations. This is called a non-recursive predictive parser. The parser pushes and pops grammar symbols based on the input token and entries in the parse table.
6.1 Key Stack Components
- Stack: Initialized with
$
(bottom marker) andStart Symbol
on top. - Input Buffer: Contains the input string followed by
$
. - Parse Table: Guides which production to apply based on the top of stack and input.
6.2 Stack Operation Rules
- If Top of Stack = terminal and it matches input symbol: Pop it and advance input.
- If Top of Stack = terminal and it doesn’t match: Error.
- If Top of Stack = nonterminal A: Consult M[A, a] in parse table.
- If M[A, a] = A → α: Pop A and push α (right-to-left).
- If M[A, a] = error: Error.
- If Top of Stack = $ and input = $: Accept.
6.3 Real-World Analogy
Think of the stack as a "to-do list" for the parser. Each time you get an instruction (symbol), you check the current situation (input symbol) and expand or execute that instruction accordingly. Once it’s done, you cross it off (pop), and move to the next.
6.4 Trace Example
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Input: id + id * id $
Step Stack Input Action
---------------------------------------------------
1 $ E id + id * id $ Push E
2 $ T E' id + id * id $ E → T E'
3 $ F T' E' id + id * id $ T → F T'
4 $ id T' E' id + id * id $ F → id
5 $ T' E' + id * id $ Match id
6 $ E' + id * id $ T' → ε
7 $ + T E' + id * id $ E' → + T E'
8 $ T E' id * id $ Match +
... ... ... Continues
Each row simulates one action: either matching a terminal or applying a production. This continues until both input and stack reduce to $
— i.e., parsing is successful.
7. LL(1) Parse Tree
A parse tree for LL(1) parsing represents a leftmost derivation of a string from the grammar's start symbol. It is built top-down — starting from the root and applying grammar rules to grow the tree until all leaves are terminals.
7.1 Properties of LL(1) Parse Trees
- Root: Start symbol of the grammar.
- Internal Nodes: Nonterminals, expanded using production rules.
- Leaf Nodes: Terminals or ε (empty string).
- Yield: Left-to-right sequence of leaf nodes gives the parsed string.
7.2 LL(1) Parse Tree Construction Rules
- Start with the start symbol at the root.
- Apply productions as per the parse table entries using the lookahead token.
- For each nonterminal, expand it using the RHS of its chosen production.
- Repeat until all leaves are terminals or ε.
7.3 Real-World Analogy
Think of constructing a sentence by choosing the right parts of speech: subject → verb → object. You grow the sentence step-by-step, top-down, like growing branches from the trunk of a tree. Every rule you apply adds branches until the sentence is complete.
7.4 Book Example: Parse Tree for id + id * id
Using the grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
The parse tree for id + id * id
:
graph TD E --> T T --> F F --> id1[id] T --> Tdash1[T'] Tdash1 --> ε1[ε] E --> Ep1[E'] Ep1 --> plus[+] Ep1 --> T2[T] T2 --> F2[F] F2 --> id2[id] T2 --> Tdash2[T'] Tdash2 --> star[*] Tdash2 --> F3[F] F3 --> id3[id] Tdash2 --> Tdash3[T'] Tdash3 --> ε2[ε] Ep1 --> Ep2[E'] Ep2 --> ε3[ε]
This corresponds to a leftmost derivation: E ⇒ T E' ⇒ F T' E' ⇒ id T' E' ⇒ id E' ⇒ id + T E' ⇒ id + F T' E' ⇒ id + id T' E' ⇒ id + id * F T' E' ⇒ id + id * id T' E' ⇒ ...
7.5 Parse Tree vs Derivation
Every parse tree corresponds to exactly one leftmost and one rightmost derivation. However, ambiguous grammars may lead to multiple valid parse trees for the same string. LL(1) grammars avoid this by construction.
8. Rules to Check if a Grammar is LL(1)
LL(1) grammars are those for which a predictive parser can choose the correct production using a single lookahead symbol. To ensure this, the grammar must satisfy strict disjointness and predictability rules based on FIRST
and FOLLOW
sets.
8.1 Formal Rules for LL(1) Grammar
Let A → α | β
be two or more productions for a nonterminal A
. The grammar is LL(1) if and only if:
- FIRST(α) ∩ FIRST(β) = ∅
No terminal can be the first symbol derived from both α and β. - At most one of α or β can derive ε
- If ε ∈ FIRST(β), then
FIRST(α) ∩ FOLLOW(A) = ∅
And vice versa for α and β.
These ensure that there's no ambiguity in selecting a production for A
based on the next input token.
8.2 Practical Method to Check LL(1)
- Compute FIRST and FOLLOW sets for all nonterminals.
- For each nonterminal with multiple productions, apply the above rules.
- If any rule fails for a production pair, grammar is not LL(1).
8.3 Real-world Analogy
Imagine you're in a vending machine line. If two buttons dispense similar products and look alike (violate disjoint FIRST sets), or if one button does nothing but also overlaps with the default option (violates FOLLOW-based rule), the user can't make a reliable decision. LL(1) grammars make sure every input leads to exactly one predictable output.
8.4 Detailed Book Example: Expression Grammar
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Check for LL(1):
- For E' → + T E' and E' → ε:
FIRST(+ T E') = {+}
FIRST(ε) = {ε}
FOLLOW(E') = {), $}
→ + ∉ FOLLOW(E') → OK ✅
- For T' → * F T' and T' → ε:
FIRST(* F T') = {*}
FIRST(ε) = {ε}
FOLLOW(T') = {+, ), $}
→ * ∉ FOLLOW(T') → OK ✅
- All FIRST sets for same nonterminal are disjoint → LL(1) confirmed ✅
This grammar satisfies all conditions — FIRST sets are disjoint, only one ε-derivable option per rule, and FOLLOW checks pass. Hence, it's LL(1).