Recursive Grammars
A recursive grammar is a grammar where a non-terminal calls itself directly or indirectly in its production rules.
It's like a function calling itself to repeat a pattern.
Why recursion?
Recursion in grammar helps us represent repetition – like a list, nested expressions, or loops.
Types of Recursion:
1. Left Recursion ❌ (bad for LL(1))
A → A α | β
Here, A appears as the leftmost symbol of its own right-hand side → leads to an infinite loop in Top-Down Parsers
Example:
E → E + T | T
2. Right Recursion ✅ (safe for LL(1))
A → β A | ε
Or,
A → a A | a
Here, recursion happens on the right side. Works well with LL(1).
Example:
L → id , L | id
3. Indirect Recursion
A → B α
B → A β
Still recursive, but through another non-terminal.
How to Handle Recursive Grammars?
Type | Action Needed |
---|---|
Left Recursion | Remove it |
Right Recursion | Keep it, LL(1) friendly |
Indirect | Rewrite to remove left recursion |
Remove Left Recursion
Given:
A → A α | β
Convert to:
A → β A'
A' → α A' | ε
Example:
Expr → Expr + Term | Term
→ becomes:
Expr → Term Expr'
Expr' → + Term Expr' | ε
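This transformation is mechanical, so it can be sketched in code. Below is a minimal Python sketch (the helper name and representation are our own, not standard): a grammar is a dict mapping each non-terminal to a list of productions, each production a list of symbols, with the empty list standing for ε.

```python
def remove_left_recursion(grammar):
    """Remove *immediate* left recursion: A → A α | β becomes
    A → β A', A' → α A' | ε. [] stands for ε."""
    result = {}
    for nt, prods in grammar.items():
        alphas = [p[1:] for p in prods if p and p[0] == nt]   # A → A α parts
        betas  = [p for p in prods if not p or p[0] != nt]    # A → β parts
        if not alphas:                       # no left recursion on this symbol
            result[nt] = prods
            continue
        new_nt = nt + "'"                    # helper non-terminal A'
        result[nt] = [beta + [new_nt] for beta in betas]          # A  → β A'
        result[new_nt] = [a + [new_nt] for a in alphas] + [[]]    # A' → α A' | ε
    return result

print(remove_left_recursion({'Expr': [['Expr', '+', 'Term'], ['Term']]}))
# {'Expr': [['Term', "Expr'"]], "Expr'": [['+', 'Term', "Expr'"], []]}
```

Note this sketch handles only immediate left recursion; indirect left recursion needs the substitution step discussed later.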
Summary:
- Recursive Grammar = Grammar that repeats patterns using itself
- Left Recursion = bad for LL(1) → needs fixing
- Right Recursion = good for LL(1)
- Needed for things like lists, loops, expressions
Think of it like a mirror reflecting itself – grammar using its own name to define a repeating structure.
Left Recursive Grammar
A Left Recursive Grammar is when a non-terminal calls itself on the leftmost side of its production.
Rule:
A grammar is left recursive if:
A → A α
Here, A appears again at the start of the right-hand side.
Why it's a problem?
Top-Down Parsers like LL(1) use recursion.
So left recursion causes infinite loops.
Example:
E → E + T | T
Trying to parse E:
E → E + T, whose first symbol is again E
→ expand E again
→ again E
→ 💥 infinite recursion!
How to Fix It?
We remove left recursion using this method:
Given:
A → A α | β
Convert to:
A → β A'
A' → α A' | ε
Example:
Original (Left Recursive):
E → E + T | T
Fixed (No Left Recursion):
E → T E'
E' → + T E' | ε
Now it's LL(1)-friendly ✅
Indirect Left Recursion:
Sometimes recursion goes through other non-terminals:
A → B α
B → A β
Still left recursive, must be detected and removed.
Summary:
- Left Recursive:
A → A α
- Causes infinite loop in top-down parsers
- Must be removed for LL(1)
- Replace with right recursion using a helper non-terminal (A')
Think of it like a person stuck in a loop talking to themselves – we break the loop by changing the sentence structure.
Right Recursive Grammar
A Right Recursive Grammar is when a non-terminal calls itself on the rightmost side of its production.
Rule:
A grammar is right recursive if:
A → α A
Here, the non-terminal A comes at the end of the right-hand side.
Why it's GOOD?
- Works perfectly with Top-Down Parsers (like LL(1))
- No infinite loop
- Safe and easy to implement
Example:
L → id , L | id
This generates:
id
id , id
id , id , id
It's right-recursive because L calls itself at the end of the production id , L.
How it Parses:
For input id , id , id, the parser matches tokens left to right, recursing on the rest of the list after each comma.
Limitation (for Bottom-Up):
- In Bottom-Up Parsers, right recursion can cause deep stacks.
- For very long expressions, it might be inefficient.
Summary:
- Right Recursive:
A → α A
- Safe for LL(1)
- Best for lists, sequences like:
id, id, id
- Preferred in top-down parsing
Think of it like a train adding coaches one by one at the end – smooth, expandable, and predictable.
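To see why right recursion is comfortable for a top-down parser, here is a tiny Python sketch (the function name is our own) that recognizes the language of L → id , L | id. The recursive call happens only *after* input has been consumed, so it always terminates:

```python
def parse_list(tokens):
    """Recognize L → id , L | id (right recursion)."""
    def L(i):                                   # returns index after the list, or None
        if i < len(tokens) and tokens[i] == 'id':
            if i + 1 < len(tokens) and tokens[i + 1] == ',':
                return L(i + 2)                 # L → id , L  (recurse on the right)
            return i + 1                        # L → id
        return None                             # syntax error

    return L(0) == len(tokens)

print(parse_list(['id', ',', 'id', ',', 'id']))   # True
print(parse_list(['id', ',']))                    # False
```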
Non-Determinism in Grammar
Non-Determinism means the parser can't decide which rule to apply just by looking at the next input symbol.
What It Is:
A grammar is non-deterministic if, for a non-terminal A with rules:
A → α | β
we have:
FIRST(α) ∩ FIRST(β) ≠ ∅
→ So the parser can't tell which rule to choose with one lookahead.
Example:
S → i E t S | i E t S e S | a
The input starts with i → which rule?
Both rule 1 and rule 2 begin with i E t S.
→ The parser is confused: should it use rule 1 or rule 2?
This is non-determinism.
Why It's a Problem:
- LL(1) parser needs exact one rule per input symbol
- Non-deterministic grammar needs backtracking or more lookahead
How to Fix It?
→ Left Factoring!
Example:
S → i E t S | i E t S e S
Becomes:
S → i E t S S'
S' → e S | ε
Now:
- On input i → only one choice: S → i E t S S'
- After t S, the parser looks at the next token: e selects S' → e S, anything else selects S' → ε
✅ Now it's deterministic
Summary:
- Non-Determinism = Parser can't decide next rule with 1 lookahead
- Happens when multiple rules start similarly
- Fix using Left Factoring
- LL(1) requires deterministic grammar
Think of it like a fork in the road where signs are missing — parser doesn't know which path to take unless you clarify.
Determinism in Grammar
Determinism means the parser can always decide what to do next using only 1 lookahead symbol — no guessing, no confusion.
What It Is:
A grammar is deterministic if:
- For any non-terminal A with rules A → α | β: FIRST(α) ∩ FIRST(β) = ∅
- Also, if ε ∈ FIRST(α), then: FIRST(β) ∩ FOLLOW(A) = ∅
Example (Deterministic):
S → a A | b B
A → x
B → y
- a leads to A
- b leads to B
→ No confusion → Deterministic ✅
Counter Example (Non-Deterministic):
S → a A | a B
Both rules start with a → the parser can't decide
→ Non-Deterministic ❌
How to Ensure Determinism:
- Remove left recursion
- Left factor the grammar
- Use FIRST and FOLLOW sets to check
Summary:
- Determinism = Parser knows exactly what to do next
- No backtracking, no multiple choices
- LL(1) parsers need deterministic grammars
- Ensures fast and correct parsing
Think of it like a GPS with only one correct route — no confusion, just follow the sign.
Left Factoring
Left Factoring is a technique to remove confusion when two or more productions start the same — making the grammar deterministic and LL(1)-friendly.
Why do it?
When a parser sees:
A → α β1 | α β2
It gets confused after reading α.
Which rule to pick? → Parser can't decide
❌ Non-Deterministic
Fix using Left Factoring:
Take the common part α out:
A → α A'
A' → β1 | β2
Now the parser sees α and picks A → α A'.
The choice between β1 and β2 is made later ✅ Deterministic
Example:
Original (Non-LL(1)):
S → if E then S else S | if E then S
The parser sees if — and gets confused.
Left Factored:
S → if E then S S'
S' → else S | ε
Now there is only one rule for if; the rest is handled later ✅
Another Example:
A → int id ; | int id = num ;
Both start with int id
Left Factored:
A → int id A'
A' → ; | = num ;
✅ LL(1)-friendly
Summary:
- Left Factoring removes common prefixes
- Helps parser choose only 1 rule with 1 token
- Makes grammar deterministic
- Essential for LL(1) Parsing
Think of it like grouping common words in two sentences to remove repetition and confusion.
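One step of left factoring can be sketched in Python (helper names and the grammar representation are our own; a production is a list of symbols, with [] standing for ε):

```python
from itertools import groupby

def common_prefix(seqs):
    """Longest common prefix of a list of symbol sequences."""
    prefix = []
    for items in zip(*seqs):
        if len(set(items)) != 1:
            break
        prefix.append(items[0])
    return prefix

def left_factor(nt, prods):
    """One left-factoring step for non-terminal `nt`: pull the longest common
    prefix α out of alternatives that start alike. Returns a dict of rules."""
    new_rules, kept = {}, []
    for _, group in groupby(sorted(prods), key=lambda p: p[:1]):
        group = list(group)
        if len(group) == 1:                   # nothing shared, keep as-is
            kept.append(group[0])
            continue
        alpha = common_prefix(group)
        new_nt = nt + "'"                     # helper non-terminal A'
        kept.append(alpha + [new_nt])         # A  → α A'
        new_rules[new_nt] = [p[len(alpha):] for p in group]   # A' → β1 | β2
    new_rules[nt] = kept
    return new_rules

# The dangling-else example: S → i E t S | i E t S e S
print(left_factor('S', [['i', 'E', 't', 'S'], ['i', 'E', 't', 'S', 'e', 'S']]))
```

For a fully left-factored grammar the step may need to be repeated until no alternatives share a prefix.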
Parser
Definition:
A Parser is a program that takes a sequence of tokens (from a lexer) and checks if they follow the rules of a programming language and builds a structure (like a tree) from it.
Simple Explanation
Imagine you're checking if a sentence in English makes sense.
- ✅ "I eat apples." – Correct.
- ❌ "Eat I apples." – Wrong grammar.
Similarly, in programming:
- ✅ if (x > 0) { y = x; } – Correct.
- ❌ if x > 0 { y = x; } – Wrong syntax.
A Parser reads this and checks grammar rules of programming (defined by a CFG – Context Free Grammar). If the code is correct, it builds a parse tree (like a diagram) showing how the sentence follows the rules.
Parser Works in 2 Phases:
- Syntax Analyzer – Checks rules.
- Parse Tree Generator – Builds tree.
Types of Parsers (Easy):
Type | How it Works | Example |
---|---|---|
Top-Down Parser | Starts from the top rule & goes down. | LL(1) |
Bottom-Up Parser | Starts from input & goes up to rules. | LR(0), SLR, CLR, LALR |
Real-Life Example:
You type:
int a = b + 2;
1. Lexer: Breaks into tokens → int, a, =, b, +, 2, ;
2. Parser: Says → "Hmm... this matches the rule: declaration → type id = expression;" ✔️
If you typed:
int = b + 2 a;
The parser says: ❌ Error! "This does NOT follow the grammar rules."
Output of Parser:
A Parse Tree or Syntax Tree like:
      =
     / \
    a   +
       / \
      b   2
This is then used by the next compiler stages like semantic analysis and code generation.
Final Words:
- Parser = Grammar Checker of code.
- It reads tokens, follows grammar, and builds a tree.
- If rules are violated, it throws errors.
You can think of it as the "English teacher of programming languages."
Architecture of a Parser
Think of a parser like a traffic inspector. Tokens (words) from the lexer are like cars coming in — the parser checks if they are in the right order and follow rules.
Basic Architecture:
+------------------+
| Source Code |
+--------+---------+
|
v
+------------------+
| Lexer | → Breaks into tokens
+--------+---------+
|
v
+------------------+
| Parser | → Builds structure from tokens
+--------+---------+
|
+--------+--------+
| Parse Tree |
+-----------------+
Inside the Parser:
The parser has two main parts:
1. Grammar Rules (CFG)
- Written by compiler designer.
- Example:
E → E + T | T
2. Parsing Algorithm
- Takes tokens + grammar, decides:
- Valid or invalid?
- If valid → build tree.
- If invalid → show error.
Components of Parser:
Component | Function |
---|---|
Input Buffer | Holds tokens from lexer. |
Parsing Table | Guides decisions (for LL or LR parsers). |
Stack | Holds symbols during parsing. |
Parse Tree Builder | Constructs tree step-by-step. |
Error Handler | Detects + manages syntax errors. |
Control Flow:
Tokens → [Parser Stack + Parsing Table] → Actions:
↳ Match, Expand rule, Shift, Reduce
↳ Build Parse Tree
↳ Throw Error (if rule mismatch)
Example:
Input code:
x = a + b * c;
Steps:
- Lexer gives tokens:
id = id + id * id ;
- Parser checks if they fit grammar like:
E → E + T, T → T * F, F → id, ...
- Builds parse tree accordingly.
Output:
- Parse Tree (full structure)
- Or Abstract Syntax Tree (AST) (cleaned-up version)
- If error → Error Message with line/column info
In short:
Parser = Brain of compiler that:
- Takes tokens.
- Checks grammar rules.
- Builds structure.
- Reports error if syntax is broken.
Top Down Parser
Imagine you're solving a puzzle starting from the big picture and filling in the pieces.
A Top-Down Parser starts from the start symbol (like "S") and tries to rewrite it into the input string using grammar rules.
What it does:
It tries to predict the structure of the input from the top of the parse tree to the bottom.
Steps:
- Start with the Start Symbol (e.g. S)
- Apply production rules to expand it
- Match input tokens from left to right
- Build the parse tree from top → down
Example:
Grammar:
E → T + E | T
T → id
Input: id + id
Parser does:
- Start with E
- Apply E → T + E
- T → id matches the first id ✅
- + matches + ✅
- E → T → id matches the second id ✅
✔️ Matched!
Types of Top Down Parsers:
Type | Key Idea |
---|---|
Recursive Descent | One function per rule. Simple, hand-coded. |
Predictive Parser (LL(1)) | Uses table and one symbol lookahead. |
Recursive Descent Parser:
Each rule is a function.
Example for E → T + E | T
:
void E() {                    // E → T + E | T
    T();                      // both alternatives start with T
    if (nextToken == '+') {   // lookahead decides the recursive alternative
        match('+');
        E();
    }
}
LL(1) Parser:
- L: Left to right input
- L: Leftmost derivation
- 1: 1 token lookahead
- Uses Parsing Table to decide which rule to apply
- More efficient, table-driven
Problems in Top Down Parsing:
Problem | Reason | Fix |
---|---|---|
Left Recursion | e.g. A → Aα | Remove or modify |
Backtracking | Parser guesses wrong rule | Predictive parsing (LL(1)) avoids this |
Final Output:
A Parse Tree from start to finish, top to bottom.
Summary:
- Top Down Parser starts from start symbol, tries to rewrite it to match input.
- Builds parse tree from top to down.
- Easier to understand and implement.
- Can be recursive (simple) or predictive (table-based).
Bottom-Up Parser
Think of building a LEGO house by starting with blocks at the bottom, then joining them up to form bigger parts, finally forming the complete house (start symbol).
What it does:
Bottom-Up Parser starts from the input (leaves) and tries to reach the Start Symbol by reducing the string using grammar rules.
Steps:
- Start with the input tokens (e.g. id + id * id)
- Match small parts → reduce them to grammar variables
- Keep reducing until we get the Start Symbol (like E)
- This gives a parse tree from bottom to top
Example:
Grammar:
E → E + T | T
T → T * F | F
F → id
Input: id + id * id
Bottom-up Parser:
id → F
F → T
id → F
F → T
T * T → T
T + T → E
✅
Types of Bottom-Up Parsers:
Parser Type | Meaning |
---|---|
LR(0) | Simple, base version |
SLR(1) | Simple LR, 1 lookahead |
LALR(1) | Look-Ahead LR |
CLR(1) | Canonical LR |
These use parsing tables and shift-reduce stack.
Actions of Bottom-Up Parser:
Action | Meaning |
---|---|
Shift | Push input symbol to stack |
Reduce | Replace top of stack using grammar |
Accept | Input matched correctly |
Error | Syntax error |
Why Compiler Prefers Bottom-Up Parsers:
Reason | Explanation |
---|---|
✅ Handles all CFGs | Even complex ones with left recursion |
✅ No backtracking | Fully table-driven, fast & efficient |
✅ More Powerful | Can parse more types of languages |
✅ Used in YACC/Bison | Industry tools like C/C++ compilers use it |
✅ Error Detection is Better | Finds syntax errors closer to actual issue |
Summary:
- Bottom-Up Parser builds tree from leaves to root
- Uses shift & reduce actions with a stack
- Preferred by compilers because:
- More powerful
- Handles complex grammars
- Faster and accurate
- Used in real-life tools like YACC, Bison
Think of it like a machine assembling code from smallest parts to the full program structure.
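The shift/reduce actions can be illustrated with a toy Python recognizer for the grammar E → E + T | T, T → T * F | F, F → id. This is a simplified SLR-style sketch, not a real LR parser: it reduces only when the top of the stack matches a rule's right-hand side and the lookahead is in FOLLOW of the left-hand side (FOLLOW sets hardcoded by hand); real LR parsers drive this from a state table.

```python
# FOLLOW sets for this grammar, precomputed by hand for the sketch
FOLLOW = {'E': {'+', '$'}, 'T': {'+', '*', '$'}, 'F': {'+', '*', '$'}}

# (right-hand side, left-hand side), longest handles first
RULES = [
    (['E', '+', 'T'], 'E'),
    (['T', '*', 'F'], 'T'),
    (['F'], 'T'),
    (['T'], 'E'),
    (['id'], 'F'),
]

def shift_reduce(tokens):
    """Toy shift-reduce recognizer with SLR-style FOLLOW guards."""
    tokens = tokens + ['$']
    stack, i, trace = [], 0, []
    while True:
        for rhs, lhs in RULES:
            if stack[-len(rhs):] == rhs and tokens[i] in FOLLOW[lhs]:
                stack[-len(rhs):] = [lhs]            # reduce
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:                                        # no reduction applies
            if tokens[i] == '$':                     # input exhausted
                return stack == ['E'], trace         # accept iff only E remains
            stack.append(tokens[i])                  # shift
            trace.append(f"shift {tokens[i]}")
            i += 1

ok, trace = shift_reduce(['id', '+', 'id', '*', 'id'])
print(ok)            # True
```

The trace it records follows the same reduction order as the example above: the `id * id` part is reduced to T before the final `E + T → E`.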
Recursive Descent Parser
A Recursive Descent Parser is like a team of functions where each function handles a grammar rule and calls others recursively to process input tokens.
What is it?
- A Top-Down Parser
- Manually written
- Uses one function per non-terminal
- Works like:
“If I seeE
, I call functionE()
. If I seeT
, I callT()
.”
Architecture:
+-----------------+
| Source Program |
+-----------------+
|
v
+-----------------+
| Lexer | → Gives tokens
+-----------------+
|
v
+-----------------+
| Recursive Descent|
| Parser |
| (set of funcs) |
+-----------------+
|
v
+-----------------+
| Parse Tree |
+-----------------+
How it Works:
For grammar like:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
Functions:
void E() {                 // E → T E'
    T();
    Eprime();
}

void Eprime() {            // E' → + T E' | ε
    if (lookahead == '+') {
        match('+'); T(); Eprime();
    }
}

void T() {                 // T → F T'
    F();
    Tprime();
}

void Tprime() {            // T' → * F T' | ε
    if (lookahead == '*') {
        match('*'); F(); Tprime();
    }
}

void F() {                 // F → (E) | id
    if (lookahead == '(') { match('('); E(); match(')'); }
    else                  { match(ID); }
}
Each function:
- Checks the expected token
- Calls other functions based on grammar
Technical Points:
Feature | Description |
---|---|
LL(1) Support | Left to Right, Leftmost derivation, 1 lookahead |
No Table | Works without parsing table |
Hand-Coded | You write each function manually |
Backtracking (optional) | Some versions support guessing rules |
Simple & Intuitive | Easy to implement and debug |
Drawbacks:
Issue | Fix |
---|---|
Left Recursion | Must be removed manually |
Not for complex grammar | Better to use LL(1) or LR parsers |
Example of Left Recursion (bad):
E → E + T
Fixed to:
E → T E'
E' → + T E' | ε
Summary:
- Recursive Descent Parser = Set of functions for grammar
- Works top-down, recursively
- Great for simple grammars
- Needs grammar to be non-left-recursive
- Used in small compilers, educational tools
Imagine it as a conversation between functions, each one asking the next: “Do you match the grammar?”
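The pseudocode above can be made runnable. Here is a minimal Python version (class and method names mirror the pseudocode but are our own) for the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id:

```python
class Parser:
    """Recursive-descent recognizer; input is a list of tokens."""

    def __init__(self, tokens):
        self.tokens = tokens + ['$']
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, tok):
        if self.lookahead != tok:
            raise SyntaxError(f'expected {tok!r}, got {self.lookahead!r}')
        self.pos += 1

    def E(self):                 # E → T E'
        self.T(); self.Eprime()

    def Eprime(self):            # E' → + T E' | ε
        if self.lookahead == '+':
            self.match('+'); self.T(); self.Eprime()

    def T(self):                 # T → F T'
        self.F(); self.Tprime()

    def Tprime(self):            # T' → * F T' | ε
        if self.lookahead == '*':
            self.match('*'); self.F(); self.Tprime()

    def F(self):                 # F → ( E ) | id
        if self.lookahead == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

    def parse(self):
        self.E(); self.match('$')
        return True

print(Parser(['id', '+', 'id', '*', '(', 'id', ')']).parse())   # True
```

Each ε-alternative simply falls through when the lookahead does not match, which is exactly how the pseudocode's `if` tests behave.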
LL(1) Parser
LL(1) Parser is a Top-Down, Table-Driven Parser.
It's smarter than Recursive Descent – instead of guessing, it looks ahead 1 token and uses a table to decide what to do.
What “LL(1)” means:
- L → Left-to-right input
- L → Leftmost derivation
- 1 → One lookahead token
🏗️ Architecture:
+-------------------+
| Input Tokens |
+--------+----------+
|
v
+-------------------+ +------------------+
| Parsing Stack |<--->| Parsing Table |
+--------+----------+ +--------+---------+
| ^
v |
+-------------------+ +--------+---------+
| LL(1) Parser |----| Grammar & First/Follow |
+-------------------+ +------------------+
|
v
+-------------------+
| Parse Tree |
+-------------------+
Internal Working:
- Stack holds symbols to be matched
- Input is read left-to-right
- Parser uses LL(1) Parsing Table to decide:
- Which rule to apply based on top of stack and current input
- Actions:
- Match (if terminal)
- Replace (if non-terminal)
- Accept if success
- Error if mismatch
Steps Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id
Input: id + id
Parsing Table:
 | id | + | $ |
---|---|---|---|
E | E → T E' | | |
E' | | E' → + T E' | E' → ε |
T | T → id | | |
Stack: E $
Input: id + id $
- Top = E, Input = id → use E → T E'
- Replace E with T E'
- Top = T, Input = id → use T → id
- Match id
- Top = E', Input = + → use E' → + T E'
- And so on...
Key Requirements:
Requirement | Meaning |
---|---|
No Left Recursion | e.g. E → E + T must be removed |
Left Factored | e.g. A → ab, A → ac becomes A → aA' with A' → b, A' → c |
First & Follow Sets | Needed to build the table |
One Rule per Cell | The grammar must be unambiguous for LL(1) |
Advantages:
- Fast and table-driven
- No backtracking
- Clear and predictable
Limitations:
- Can't handle all grammars
- Needs pre-processing (remove left recursion, left factoring)
- Only 1 lookahead = less powerful than LR parsers
Summary:
- LL(1) Parser = Table + Stack based Top-Down Parser
- Uses 1 lookahead to choose production rule
- Requires First & Follow sets to build the parsing table
- Efficient but needs grammar to be clean and simple
It's like a Google Map with one arrow – choose the path by just 1 lookahead.
FIRST() in Compiler Design
FIRST(X) tells:
"What terminals can appear first if I start deriving from X?"
Why it matters:
When building LL(1) parsing tables, we need to know:
- If I have
A → α
, and input symbol isa
, should I use this rule? - Check if
a ∈ FIRST(α)
Rules to Find FIRST():
Let X be a symbol (Terminal or Non-Terminal).
Case 1: If X is a terminal
FIRST(X) = { X }
Case 2: If X is a non-terminal with a production X → a … (starting with terminal a):
a ∈ FIRST(X)
Case 3: X → Y₁ Y₂ Y₃...
- Add FIRST(Y₁) to FIRST(X)
- If FIRST(Y₁) has ε (epsilon), → check Y₂, and so on...
Case 4: X → ε
Then ε ∈ FIRST(X)
Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id
Now calculate:
Non-Terminal | FIRST() |
---|---|
E | FIRST(T) = { id } |
E' | { +, ε } |
T | { id } |
Simple Algorithm:
Repeat until no more changes:
For every rule A → α:
Add FIRST(α) to FIRST(A)
If α is Y₁ Y₂ ... and FIRST(Yᵢ) has ε
Then check Yᵢ₊₁
Final Notes:
- FIRST() helps us predict the next move in LL(1) parser
- Very useful to build parsing table
- ε (the empty string) is included only if the whole production can derive the empty string
Think like this:
FIRST(X) = "What can I possibly see first if I start writing X on paper?"
It helps the parser look ahead and decide.
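The algorithm above can be sketched directly in Python (representation is our own: a grammar is a dict of non-terminal → list of productions, a symbol is a terminal iff it is not a key of the dict, and [] stands for ε):

```python
EPS = 'ε'

def first_sets(grammar):
    """Fixed-point computation of FIRST for every non-terminal."""
    first = {nt: set() for nt in grammar}

    def first_of_seq(symbols):          # FIRST of a sequence Y1 Y2 ...
        out = set()
        for sym in symbols:
            if sym not in grammar:      # Case 1: terminal
                out.add(sym)
                return out
            out |= first[sym] - {EPS}   # Case 3: add FIRST(Y1), then maybe Y2 ...
            if EPS not in first[sym]:   # stop unless this symbol can vanish
                return out
        out.add(EPS)                    # Case 4: the whole sequence can derive ε
        return out

    changed = True
    while changed:                      # repeat until no more changes
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                f = first_of_seq(prod)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []], 'T': [['id']]}
print(first_sets(G))   # E: {id}, E': {+, ε}, T: {id}
```

The result matches the table above: FIRST(E) = { id }, FIRST(E') = { +, ε }, FIRST(T) = { id }.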
FOLLOW() in Compiler Design
FOLLOW(A) tells:
👉 “What terminals can come right after A in some derivation?”
Why it matters:
Used in the LL(1) Parsing Table to decide:
- Whether a production like A → ε should be used
- Helps fill entries when FIRST(α) contains ε
Rules to Calculate FOLLOW():
Let's say the grammar has a Start Symbol S.
Rule 1:
If A is the start symbol, add $ to FOLLOW(A)
Rule 2:
For every production A → α B β, add FIRST(β) (excluding ε) to FOLLOW(B).
Rule 3:
If A → α B, or A → α B β where FIRST(β) contains ε: add FOLLOW(A) to FOLLOW(B).
Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id
Start symbol = E
We calculate:
Step 1:
FOLLOW(E) = { $ } (Rule 1)
Step 2:
In E → T E', the symbol T is followed by E'.
Add FIRST(E') excluding ε = { + } to FOLLOW(T); since ε ∈ FIRST(E'), also add FOLLOW(E).
⇒ FOLLOW(T) = { +, $ }
Step 3:
In E' → + T E', T is again followed by E' (same result).
E' appears at the end of both E → T E' and E' → + T E', so FOLLOW(E) flows into FOLLOW(E').
⇒ FOLLOW(E') = { $ }
Non-Terminal | FOLLOW() |
---|---|
E | { $ } |
E' | { $ } |
T | { +, $ } |
Simple Algorithm:
1. Start with FOLLOW(S) = { $ }
2. For every production A → α B β:
a. Add FIRST(β) (excluding ε) to FOLLOW(B)
b. If FIRST(β) has ε or β is empty:
Add FOLLOW(A) to FOLLOW(B)
3. Repeat until no change
Final Summary:
- FOLLOW(A) = All terminals that can come immediately after A
- Helps in the LL(1) table, especially when a rule derives ε
- Always put $ (end of input) in FOLLOW(start symbol)
FOLLOW(A) = “What can I expect to see after A, if I'm parsing from left to right?”
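The three rules can be sketched in Python (same grammar representation as before: a dict of non-terminal → productions, [] for ε; the FIRST sets are passed in, assumed already computed):

```python
EPS = 'ε'

def follow_sets(grammar, first, start):
    """Fixed-point FOLLOW computation; `first` maps non-terminals to FIRST sets."""
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')                    # Rule 1: $ follows the start symbol

    def first_of_seq(beta):                   # FIRST of what comes after B
        out = set()
        for sym in beta:
            if sym not in grammar:
                out.add(sym)
                return out
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return out
        out.add(EPS)                          # β (possibly empty) can vanish
        return out

    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                for i, b in enumerate(prod):
                    if b not in grammar:      # only non-terminals get FOLLOW
                        continue
                    f = first_of_seq(prod[i + 1:])
                    add = f - {EPS}           # Rule 2
                    if EPS in f:              # Rule 3: β is empty or can vanish
                        add |= follow[a]
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []], 'T': [['id']]}
FIRST = {'E': {'id'}, "E'": {'+', EPS}, 'T': {'id'}}
print(follow_sets(G, FIRST, 'E'))   # FOLLOW(E)={$}, FOLLOW(E')={$}, FOLLOW(T)={+,$}
```

The output matches the table above: FOLLOW(E) = FOLLOW(E') = { $ } and FOLLOW(T) = { +, $ }.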
Parse Table (LL(1))
A Parse Table tells the LL(1) parser which rule to apply based on:
- Current non-terminal (on stack)
- Current input token (lookahead)
Structure:
It's a 2D Table like this:
 | id | + | * | ( | ) | $ |
---|---|---|---|---|---|---|
E | E → T E' | | | E → T E' | | |
E' | | E' → + T E' | | | E' → ε | E' → ε |
T | T → F T' | | | T → F T' | | |
T' | | T' → ε | T' → * F T' | | T' → ε | T' → ε |
F | F → id | | | F → ( E ) | | |
How to Build Parse Table:
For each production A → α:
- For every terminal a in FIRST(α), add A → α to M[A, a]
- If ε ∈ FIRST(α), then for every b in FOLLOW(A), add A → α to M[A, b]
Example Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → id | ( E )
FIRST & FOLLOW:
Symbol | FIRST | FOLLOW |
---|---|---|
E | { id, ( } | { $, ) } |
E' | { +, ε } | { $, ) } |
T | { id, ( } | { +, $, ) } |
T' | { *, ε } | { +, $, ) } |
F | { id, ( } | { *, +, $, ) } |
How Parser Uses It:
- Stack top = A, input = a
- Go to cell M[A, a] → pick the rule → expand the stack
- Repeat until either:
  - Input & stack both reach $ → Accepted
  - No entry found → Error
Summary:
- Parse Table = Guide Map for LL(1) Parser
- Built using FIRST & FOLLOW
- Each cell has only one rule (else → not LL(1))
- Table-driven → Fast, Predictive, No backtracking
Think of it like a cheat sheet for the parser to decide its next move confidently.
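The two table-building rules translate almost line for line into Python. A minimal sketch (FIRST and FOLLOW are hardcoded by hand for the small grammar E → T E', E' → + T E' | ε, T → id; [] stands for ε):

```python
EPS = 'ε'

G      = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []], 'T': [['id']]}
FIRST  = {'E': {'id'}, "E'": {'+', EPS}, 'T': {'id'}}      # precomputed by hand
FOLLOW = {'E': {'$'}, "E'": {'$'}, 'T': {'+', '$'}}        # precomputed by hand

def first_of_seq(alpha):
    """FIRST of a whole right-hand side α."""
    out = set()
    for sym in alpha:
        if sym not in G:
            out.add(sym)
            return out
        out |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return out
    out.add(EPS)
    return out

def build_table(grammar, follow):
    """M[(A, a)] → production; raises on a conflict (grammar not LL(1))."""
    table = {}
    for a, prods in grammar.items():
        for prod in prods:
            f = first_of_seq(prod)
            targets = (f - {EPS}) | (follow[a] if EPS in f else set())
            for t in targets:
                if (a, t) in table:
                    raise ValueError(f'conflict at M[{a}, {t}]: not LL(1)')
                table[(a, t)] = prod
    return table

table = build_table(G, FOLLOW)
print(table[('E', 'id')])    # ['T', "E'"]
print(table[("E'", '$')])    # []  (the ε-production)
```

The conflict check is exactly the "one rule per cell" condition: two productions landing in the same cell means the grammar is not LL(1).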
Stack Operation of LL(1) Parser
The LL(1) Parser uses a stack to match grammar rules with input tokens using a parse table.
What's in the Stack?
- Non-terminals to be expanded
- Terminals to be matched
- Ends with $ (end marker)
Parser Steps:
- Initialize stack: [$, StartSymbol]
- Read the current input token
- Check the top of the stack:
  - If terminal → match with input → pop stack, advance input
  - If non-terminal → use the parse table to get a rule → replace the top with the rule's right-hand side
- If both stack & input are at $ → ACCEPT
Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id
Input: id + id $
Start Symbol: E
Step-by-step Stack:
Stack | Input | Action |
---|---|---|
$ E | id + id $ | E → T E' |
$ E' T | id + id $ | T → id |
$ E' id | id + id $ | match id |
$ E' | + id $ | E' → + T E' |
$ E' T + | + id $ | match + |
$ E' T | id $ | T → id |
$ E' id | id $ | match id |
$ E' | $ | E' → ε (empty) |
$ | $ | ACCEPT ✅ |
Parse Tree:
We use the rules applied to build a parse tree.
        E
       / \
      T   E'
      |  /|\
     id + T E'
          |  |
         id  ε
- Each non-terminal expands into its RHS
- Each match goes to a leaf node
Summary:
- LL(1) Stack = Holds what's left to match/expand
- Stack + Input are read together
- Parse Table tells which rule to apply
- Parse Tree is built from applied rules, top-down
Think of it like a puzzle solver using rules from a cheat-sheet (table) to fit input tokens into a grammar structure.
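The stack loop above can be written as a short table-driven driver in Python (the table is hardcoded here to the one from the example; a production is a list of symbols, [] for ε):

```python
def ll1_parse(table, grammar, start, tokens):
    """Table-driven LL(1) driver, mirroring the stack trace above.
    `table` maps (non-terminal, lookahead) → production."""
    stack = ['$', start]
    tokens = tokens + ['$']
    i = 0
    while stack:
        top = stack.pop()
        tok = tokens[i]
        if top == tok:                       # terminal (or $): match and advance
            i += 1
        elif top in grammar:                 # non-terminal: expand via the table
            prod = table.get((top, tok))
            if prod is None:
                raise SyntaxError(f'no rule for ({top}, {tok})')
            stack.extend(reversed(prod))     # push RHS so its first symbol is on top
        else:
            raise SyntaxError(f'expected {top!r}, got {tok!r}')
    return i == len(tokens)                  # consumed everything including $

G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []], 'T': [['id']]}
TABLE = {
    ('E', 'id'): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"],
    ("E'", '$'): [],                         # E' → ε
    ('T', 'id'): ['id'],
}
print(ll1_parse(TABLE, G, 'E', ['id', '+', 'id']))   # True
```

Pushing the right-hand side reversed is the one non-obvious detail: it puts the leftmost symbol of the rule on top of the stack, which is what produces a leftmost derivation.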
Rules to Check if a Grammar is LL(1)
To confirm a grammar is LL(1), we must ensure the parser can decide the correct production with just 1 lookahead.
🧪 Rule 1: No Left Recursion
If a grammar has left recursion, it's not LL(1).
❌ Example (Left Recursive):
A → Aα | β
✅ Fix:
A → βA'
A' → αA' | ε
Rule 2: Left Factoring
If multiple rules of a non-terminal start the same, it's ambiguous for LL(1).
❌ Example:
A → aB | aC
✅ Fix:
A → aA'
A' → B | C
Rule 3: FIRST Rule (no FIRST/FIRST conflict)
For a non-terminal A with multiple rules:
A → α | β
we require:
FIRST(α) ∩ FIRST(β) = ∅
→ Means: no common terminal should appear in both FIRST sets.
Rule 4: FIRST/FOLLOW Rule (if ε is involved)
If:
A → α | ε
Then:
FIRST(α) ∩ FOLLOW(A) = ∅
→ So parser doesn't get confused between expanding α or taking ε.
Final Checklist to Confirm LL(1):
✅ Condition | 🔍 Check for... |
---|---|
No Left Recursion | A → Aα → ❌ |
Grammar is Left Factored | A → aB, A → aC → ❌ |
FIRST of different rules disjoint | FIRST(α) ∩ FIRST(β) = ∅ |
FIRST/FOLLOW disjoint if ε exists | If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅ |
Why all this?
LL(1) parser has only 1 lookahead token. So if grammar breaks these rules, it won't know which rule to choose → parsing conflict.
Summary:
A grammar is LL(1) only if:
- No left recursion
- Left factored
- FIRST sets of the alternatives don't overlap
- FIRST & FOLLOW disjoint if ε is involved
If all conditions are satisfied, it's safe for fast predictive parsing without backtracking.
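Rules 3 and 4 are pure set checks, so they can be sketched in a few lines of Python (our own helper; the FIRST set of each alternative and the FOLLOW sets are assumed already computed and passed in directly):

```python
from itertools import combinations

EPS = 'ε'

def check_ll1(alts_first, follow):
    """alts_first: {A: [FIRST(α) for each alternative α of A]};
    follow: {A: FOLLOW(A)}. Returns the list of conflicts found
    (an empty list means the set conditions hold)."""
    conflicts = []
    for a, firsts in alts_first.items():
        for f1, f2 in combinations(firsts, 2):
            if (f1 - {EPS}) & (f2 - {EPS}):
                conflicts.append((a, 'FIRST/FIRST'))     # Rule 3 violated
            if (EPS in f1 and (f2 - {EPS}) & follow[a]) or \
               (EPS in f2 and (f1 - {EPS}) & follow[a]):
                conflicts.append((a, 'FIRST/FOLLOW'))    # Rule 4 violated
    return conflicts

# E' → + T E' | ε from the running example: FIRST sets {+} and {ε}
print(check_ll1({"E'": [{'+'}, {EPS}]}, {"E'": {')', '$'}}))   # []
print(check_ll1({'A': [{'a'}, {EPS}]}, {'A': {'a'}}))          # [('A', 'FIRST/FOLLOW')]
```

Rules 1 and 2 (no left recursion, left factored) are structural properties of the productions themselves and are checked separately.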
Proof that Example 4.9 (book) Grammar is LL(1)
We will prove step by step with all conditions and draw necessary tables.
📘 Grammar (Example 4.9):
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Step 1: No Left Recursion?
We check each non-terminal:
- E → T E' (no E on RHS start)
- E' → + T E' | ε (no E' at start)
- T → F T'
- T' → * F T' | ε
- F → ( E ) | id
✅ No left recursion.
Step 2: Is the grammar Left Factored?
We check if any non-terminal has more than one production with common prefix.
Check E', T', F:
- E' → + T E' | ε → no common prefix
- T' → * F T' | ε → no common prefix
- F → ( E ) | id → no common prefix
✅ Grammar is left factored
Step 3: Compute FIRST Sets
Non-Terminal | FIRST |
---|---|
E | ( , id |
E' | + , ε |
T | ( , id |
T' | * , ε |
F | ( , id |
Why?
- F → ( E ) → ( ∈ FIRST(F)
- F → id → id ∈ FIRST(F)
- ⇒ FIRST(F) = { (, id }
- ⇒ FIRST(T) = FIRST(F)
- ⇒ FIRST(E) = FIRST(T)
E' and T' contain ε because of their ε-alternatives.
Step 4: Compute FOLLOW Sets
Use rules and propagation.
- FOLLOW(E) = { ), $ } (E appears in F → ( E ) and is start symbol)
- FOLLOW(E') = FOLLOW(E) = { ), $ }
- FOLLOW(T) = FIRST(E') excluding ε = { + } and FOLLOW(E') = { ), $ } ⇒ FOLLOW(T) = { +, ), $ }
- FOLLOW(T') = FOLLOW(T) = { +, ), $ }
- FOLLOW(F) = FIRST(T') excluding ε = { * } and FOLLOW(T') = { +, ), $ } ⇒ FOLLOW(F) = { *, +, ), $ }
Non-Terminal | FOLLOW |
---|---|
E | ) , $ |
E' | ) , $ |
T | + , ) , $ |
T' | + , ) , $ |
F | * , + , ) , $ |
Step 5: Check FIRST Set Overlap
- F → ( E ) | id → FIRST sets { ( } and { id } → ✔ disjoint
- E' → + T E' | ε → FIRST(+ T E') = { + }, FIRST(ε) = { ε } → ✔ disjoint
- T' → * F T' | ε → FIRST(* F T') = { * }, FIRST(ε) = { ε } → ✔ disjoint
✅ FIRST sets don't overlap
Step 6: Check FIRST & FOLLOW disjoint if ε involved
- E' → ε: FIRST(E') = { +, ε }, FOLLOW(E') = { ), $ } → FIRST ∩ FOLLOW = ∅ → ✔
- T' → ε: FIRST(T') = { *, ε }, FOLLOW(T') = { +, ), $ } → FIRST ∩ FOLLOW = ∅ → ✔
✅ FIRST and FOLLOW are disjoint when ε involved.
Step 7: LL(1) Parsing Table
 | id | + | * | ( | ) | $ |
---|---|---|---|---|---|---|
E | E → T E' | | | E → T E' | | |
E' | | E' → + T E' | | | E' → ε | E' → ε |
T | T → F T' | | | T → F T' | | |
T' | | T' → ε | T' → * F T' | | T' → ε | T' → ε |
F | F → id | | | F → ( E ) | | |
✅ No cell has multiple entries → no conflict
FINAL CONCLUSION:
Grammar satisfies all conditions:
- No left recursion ✅
- Left factored ✅
- FIRST sets don't overlap ✅
- FIRST and FOLLOW disjoint when ε is in FIRST ✅
- No conflicts in parsing table ✅
✅ Therefore, Example 4.9 is LL(1) ✔
Step-by-Step LL(1) Derivation of id + id * id
We derive using leftmost derivation with the help of parsing table.
We write the derivation at each step.
Step 1:
Start: E
Use rule: E → T E'
Step 2:
→ T E'
Use rule: T → F T'
Step 3:
→ F T' E'
Use rule: F → id
Step 4:
→ id T' E'
Next input is +
Use rule: T' → ε
Step 5:
→ id E'
Use rule: E' → + T E'
Step 6:
→ id + T E'
Use rule: T → F T'
Step 7:
→ id + F T' E'
Use rule: F → id
Step 8:
→ id + id T' E'
Lookahead is *
Use rule: T' → * F T'
Step 9:
→ id + id * F T' E'
Use rule: F → id
Step 10:
→ id + id * id T' E'
Lookahead is $
Use: T' → ε
and E' → ε
Final Derivation:
E
→ T E'
→ F T' E'
→ id T' E'
→ id E'
→ id + T E'
→ id + F T' E'
→ id + id T' E'
→ id + id * F T' E'
→ id + id * id T' E'
→ id + id * id E'
→ id + id * id
LL(1) derivation successful
- Input parsed:
id + id * id
- Order of operations respected (multiplication before addition)
- Grammar respects operator precedence and associativity
✅ This proves grammar works properly in LL(1) parsing.