How to Make CFG Deterministic

1. Understanding Deterministic CFG

A Deterministic Context-Free Grammar (DCFG) is a context-free grammar that allows parsing without any ambiguity or backtracking. It is the foundation for constructing predictive parsers (like LL(1)) and shift-reduce parsers (like LR(1)), where a single correct parse action exists for each input symbol and parser state.

1.1 What is a Deterministic CFG?

In a deterministic CFG, for any given input symbol and parsing context, the parser should be able to decide the next step uniquely. This contrasts with a general CFG, where multiple choices might exist, requiring backtracking or lookahead.

Key Property: No ambiguity + unique parsing decision at each step
Used in: Predictive parsing (LL), Shift-reduce parsing (LR)

For example, consider a grammar:


S → a A | a B
A → b
B → c

This is non-deterministic because after seeing 'a', the parser can't decide whether to choose a A or a B without further lookahead.

To make it deterministic, we can left-factor it:


S → a S'
S' → b | c

Now, after reading 'a', we move to a state where the next token will determine whether to go to 'b' or 'c'. This is LL(1) deterministic.

1.2 Why is Determinism Needed?

Real-world compilers must process code in linear time with high efficiency. Determinism ensures:

No backtracking: Improves speed and predictability
Error detection: Easier and faster to identify invalid input
Parser generation: Tools like YACC, Bison, and ANTLR require deterministic CFGs

1.2.1 Real-world Analogy

Imagine an ATM menu:


1. Withdraw
2. Deposit
3. Balance

You press "1", and the ATM takes you to "Withdraw". There's no ambiguity. If the ATM gave two different options after pressing "1", it would confuse the user. Similarly, a deterministic parser should always know the next step.

1.2.2 When is it a Problem?

Consider this CFG used in natural languages:


S → if E then S | if E then S else S | other

This is known as the "dangling else" problem. The ambiguity comes from not knowing which if the else belongs to. Real programming languages like C resolve this by associating else with the nearest if.

1.2.3 Real Compiler Impact

Without determinism:

Parsing becomes O(n²) or worse
Compiler becomes slow and buggy
Tooling (like syntax highlighting, error messages) becomes unreliable

2. Eliminate Ambiguity

Ambiguity occurs when a grammar generates a string that can have more than one parse tree. This is a fundamental problem in parsing because it leads to confusion in semantic interpretation.

2.1 What is Ambiguity?

If a string derived from a grammar has two or more different leftmost derivations (or parse trees), the grammar is ambiguous.

Example:


E → E + E | E * E | id

String: id + id * id

This string has two parse trees:

One where + is applied after *: (id + (id * id))
One where * is applied after +: ((id + id) * id)

Both are valid derivations, but they imply different meanings. Compilers must produce exactly one interpretation — ambiguity breaks this requirement.

2.2 Why Eliminate Ambiguity?

In programming languages, ambiguity can cause:

Incorrect code generation: Different parse trees = different instructions
Unpredictable behavior: Compiler may pick arbitrary interpretation
Parsing failures: Deterministic parsers (LL/LR) cannot handle ambiguity

2.3 Strategies to Eliminate Ambiguity

2.3.1 Enforce Operator Precedence and Associativity

To fix the arithmetic expression grammar, we split it by precedence:


E → E + T | T
T → T * F | F
F → id

* has higher precedence than +
Operators are left-associative (e.g., a + b + c = (a + b) + c)

2.3.2 Disambiguate with Language Rules

Classic case: "dangling else"


S → if E then S | if E then S else S | other

Make it unambiguous by associating else with the nearest unmatched if:


S → M | U
M → if E then M else M | other
U → if E then S

2.3.3 Rewrite the Grammar

Sometimes, restructuring rules removes ambiguity. Example:

Ambiguous:


S → a S b | ε

This allows multiple ways to generate the same string like aabb.

Unambiguous version (balanced a's and b's):


S → a S b | ε

In this case, the ambiguity was not real — the original grammar was actually unambiguous. You must prove ambiguity formally before rewriting.

2.4 Real-World Compiler Example

In C/C++, the following expression is parsed differently depending on operator precedence:


a + b * c

The compiler refers to a deterministic grammar that gives * higher precedence:

b * c is evaluated first
Then a + (result)

If grammar allowed both (a + b) * c and a + (b * c), it would cause ambiguity. Therefore, operator tables are translated into disambiguated CFGs.

3. Remove Left Recursion

Left recursion in a grammar occurs when a non-terminal calls itself as the first symbol in one of its productions. Left-recursive grammars are problematic for top-down parsers (like LL(1)) because they can cause infinite recursion.

3.1 What is Left Recursion?

A grammar has left recursion if there exists a non-terminal A such that:


A → A α | β

where α and β are sequences of terminals and/or non-terminals, and β does not start with A.

Example of left recursion:


E → E + T | T

Here, E is calling itself first, directly. For LL parsers, this causes an infinite loop.

3.2 Eliminate Immediate Left Recursion

The standard technique is to refactor the grammar by introducing a new non-terminal to handle recursion.

General Rule:


A → A α₁ | A α₂ | ... | β₁ | β₂ | ...

where each βᵢ does not start with A. This is transformed into:


A → β₁ A' | β₂ A' | ...
A' → α₁ A' | α₂ A' | ... | ε

3.2.1 Example: Arithmetic Expressions

Original grammar (left-recursive):


E → E + T | T
T → T * F | F
F → ( E ) | id

Step-by-step removal of left recursion:

For E:

Left-recursive: E → E + T
Non-left-recursive: E → T

Apply transformation:


E → T E'
E' → + T E' | ε

For T:


T → F T'
T' → * F T' | ε

F remains unchanged:


F → ( E ) | id

This grammar is now free from left recursion and suitable for LL(1) parsing.

3.3 Why Remove Left Recursion?

Top-down parsers use recursive function calls to simulate derivation.
Left recursion causes these calls to recurse infinitely.
Removing it enables predictive parsing.

3.4 Real-world Compiler Scenario

In compilers like javac or g++, parser generators (like ANTLR or YACC) reject grammars with direct left recursion if configured for top-down parsing.

To write a parser that understands arithmetic expressions, you must first eliminate left recursion as shown above.

4. Apply Left Factoring

Left factoring is a grammar transformation technique used to remove common prefixes from multiple productions of a non-terminal. It is essential when a parser cannot decide which production to choose based on the next input symbol.

4.1 What is Left Factoring?

Given a non-terminal A with multiple productions starting with the same prefix:


A → α β₁ | α β₂

This makes the parser unable to choose between β₁ and β₂ just by looking at α. Left factoring rewrites this as:


A → α A'
A' → β₁ | β₂

Now the decision is delayed until enough input is seen.

4.2 Why Apply Left Factoring?

Predictive parsing requires a single production per lookahead symbol.
Reduces ambiguity when common prefixes exist.
Enables LL(1) parser construction with FIRST/FOLLOW sets.

4.2.1 Real-World Analogy

Imagine a voice menu:


Say "book flight"
Say "book hotel"

The system hears "book" and gets confused. Instead, restructure:


Say "book"
Then say "flight" or "hotel"

Left factoring works the same way — delay the decision.

4.3 Detailed Example

Original Grammar:


S → if E then S else S | if E then S | other

This grammar is ambiguous due to common prefix if E then S. Parser cannot decide between the two branches without lookahead.

Left-factored version:


S → if E then S S'
S' → else S | ε
S → other

Now, after reading if E then S, the parser checks whether else follows. If yes, parse else S. If not, move on.

4.3.1 Another Example

Original Grammar:


A → int id ; | int id = num ;

Common prefix: int id

Left-factored:


A → int id A'
A' → ; | = num ;

Now the parser reads int id and decides based on the next token.

4.4 Implementation Note

Left factoring is automated in many parser generators. But for handwritten parsers, this step is manual and necessary to avoid parse conflicts.

After left factoring, the grammar becomes suitable for constructing parsing tables or recursive descent parsers.

5. Construct FIRST and FOLLOW Sets

To build an LL(1) parser or check grammar determinism, we must compute two essential sets for each non-terminal:

FIRST: What terminals can appear first in a derivation from a non-terminal?
FOLLOW: What terminals can appear immediately after a non-terminal in any valid string?

These sets ensure deterministic parsing decisions by guiding which production to apply based on lookahead symbols.

5.1 FIRST Set

The FIRST(X) of a symbol X is the set of terminals that begin strings derived from X. It includes:

The terminal itself if X is a terminal
FIRST of the symbols in the production if X is a non-terminal
ε if X can derive ε

Rules to Compute FIRST

If X is a terminal, then FIRST(X) = {X}
If X → ε is a production, then ε ∈ FIRST(X)
If X → Y₁ Y₂ ... Yₙ, then:
- add FIRST(Y₁) excluding ε
- if FIRST(Y₁) contains ε, add FIRST(Y₂), and so on
- if all Yᵢ derive ε, add ε to FIRST(X)

5.2 FOLLOW Set

The FOLLOW(A) of a non-terminal A is the set of terminals that can appear immediately after A in some derivation.

Rules to Compute FOLLOW

If S is the start symbol, add $ to FOLLOW(S)
If A → α B β, then add FIRST(β) \ {ε} to FOLLOW(B)
If A → α B or A → α B β and FIRST(β) contains ε, add FOLLOW(A) to FOLLOW(B)

5.3 Detailed Example

Consider this grammar:


S → A B
A → a A | ε
B → b B | c

Step 1: Compute FIRST

FIRST(A)

A → a A → starts with a
A → ε → includes ε
⇒ FIRST(A) = {a, ε}

FIRST(B)

B → b B → starts with b
B → c → starts with c
⇒ FIRST(B) = {b, c}

FIRST(S)

S → A B
FIRST(A) = {a, ε}
If A ⇒ ε, then consider FIRST(B) too
⇒ FIRST(S) = FIRST(A) ∪ FIRST(B) = {a, b, c}

Step 2: Compute FOLLOW

FOLLOW(S)

S is start symbol ⇒ $ ∈ FOLLOW(S)

FOLLOW(A)

S → A B ⇒ FIRST(B) = {b, c} ⊆ FOLLOW(A)
⇒ FOLLOW(A) = {b, c}

FOLLOW(B)

B is at the end of S → A B ⇒ FOLLOW(S) = {$} ⊆ FOLLOW(B)
⇒ FOLLOW(B) = {$}

Non-terminal	FIRST	FOLLOW
S	{a, b, c}	{$}
A	{a, ε}	{b, c}
B	{b, c}	{$}

5.4 How It Enables Determinism

LL(1) table uses:

FIRST to choose rule on lookahead
FOLLOW when ε appears in FIRST, to decide when to apply the ε-production

Example: If input starts with a, parser chooses A → a A. If input is b or c, parser chooses A → ε and proceeds with B.

6. Check LL(1) Conditions

A grammar is said to be LL(1) if it can be parsed by an LL(1) parser — a top-down parser that uses:

Left-to-right scanning of input (first L)
Leftmost derivation (second L)
1-symbol lookahead (1)

For LL(1) parsing, we need to ensure that at every step, the parser can uniquely determine the correct production using only the next input symbol. This requires checking two specific conditions using the FIRST and FOLLOW sets.

6.1 LL(1) Grammar Conditions

For any non-terminal A with two or more productions:


A → α | β

The grammar is LL(1) only if:

Condition 1: FIRST(α) ∩ FIRST(β) = ∅
Condition 2: If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅
Similarly, if ε ∈ FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅

6.2 Detailed Example

Consider the grammar:


S → A B
A → a A | ε
B → b B | c

Step 1: Use Previously Computed FIRST and FOLLOW

Non-terminal	FIRST	FOLLOW
S	{a, b, c}	{$}
A	{a, ε}	{b, c}
B	{b, c}	{$}

Step 2: Check LL(1) Conditions for A → a A | ε

FIRST(a A) = {a}
FIRST(ε) = {ε}
⇒ FIRST(a A) ∩ FIRST(ε) = ∅ ✅
Since ε ∈ FIRST(A), check FIRST(a A) ∩ FOLLOW(A) = {a} ∩ {b, c} = ∅ ✅

Step 3: Check LL(1) Conditions for B → b B | c

FIRST(b B) = {b}
FIRST(c) = {c}
⇒ FIRST(b B) ∩ FIRST(c) = ∅ ✅

✅ All conditions are satisfied. Therefore, the grammar is LL(1).

6.3 When Grammar is Not LL(1)

Example of an ambiguous grammar that fails LL(1) condition:


E → E + T | T

FIRST(E + T) = FIRST(E), which causes recursion

We remove left recursion and left factor to make it LL(1)-compliant (as done in earlier sections):


E → T E'
E' → + T E' | ε

Check LL(1) for E' → + T E' | ε

FIRST(+ T E') = {+}
FIRST(ε) = {ε}
FOLLOW(E') = {$, )}
⇒ {+} ∩ FOLLOW(E') = ∅ ✅

✅ LL(1) satisfied after transformation.

6.4 Practical Compiler Insight

Most parser generators like ANTLR or Bison show a warning when LL(1) conflicts exist. For real-time applications like IDE autocompletion, LL(1) parsing ensures instant and correct parsing without delay or backtracking.

7. Build the Predictive Parsing Table

The Predictive Parsing Table is a two-dimensional table used by LL(1) parsers to decide which production rule to apply, based on:

Current non-terminal (row)
Next input symbol (column)

This table enables parsing with one lookahead symbol — hence "LL(1)". It is built using the FIRST and FOLLOW sets.

7.1 Construction Steps

For each production A → α:
- For each terminal a ∈ FIRST(α), add A → α to TABLE[A, a]
- If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to TABLE[A, b]
Mark each invalid entry as error or blank

This ensures that only one entry per cell exists — the defining condition of LL(1).

7.2 Example Grammar


S → A B
A → a A | ε
B → b B | c

From earlier computations:

Non-terminal	FIRST	FOLLOW
S	{a, b, c}	{$}
A	{a, ε}	{b, c}
B	{b, c}	{$}

Productions:


S → A B
A → a A | ε
B → b B | c

7.3 FIRST + FOLLOW Table Fill

Non-terminal	a	b	c	$
S	S → A B	S → A B	S → A B	error
A	A → a A	A → ε	A → ε	error
B	error	B → b B	B → c	error

✅ One production per cell → this grammar is LL(1) and parsing table is ready.

7.4 Real-world Analogy

Think of the parsing table like a GPS system:

You are at a "non-terminal" (current location)
You see a "lookahead symbol" (road sign)
GPS (table) tells you the exact route (production) to follow

Because only one route exists for each sign at each location, you never get lost — that’s the power of LL(1).

7.5 Use in Predictive Parser

The parser uses the table as follows:


stack ← [S, $]
input ← [tokens, $]

while stack not empty:
    X = top of stack
    a = current input symbol
    if X is terminal or $:
        if X == a: pop both
        else: error
    else:
        if TABLE[X, a] = X → α:
            replace X with α on stack
        else:
            error

8. Validate Determinism

Validation of determinism ensures the grammar and parsing table conform to the LL(1) requirements — one unique parsing decision for each input symbol at every step. This validation confirms that:

Each cell in the parsing table contains at most one production rule
No conflicts exist (no multiple entries or ambiguity)
FIRST and FOLLOW sets do not overlap improperly

This is the final step to ensure the CFG is now deterministic and parser-ready.

8.1 What Makes a Grammar Non-Deterministic?

A grammar fails determinism if:

FIRST(α) ∩ FIRST(β) ≠ ∅ for two productions A → α | β
ε ∈ FIRST(α) and FIRST(β) ∩ FOLLOW(A) ≠ ∅
Parsing table has multiple entries for any [Non-terminal, terminal] pair

Example of Conflict


A → a B | a C

BOTH start with a ⇒ FIRST sets overlap ⇒ Not LL(1)

8.2 How to Validate Determinism

Construct the FIRST and FOLLOW sets
Build the predictive parsing table
Scan each table cell:
- Check that each [Non-terminal, Terminal] has ≤ 1 rule
- If any cell has more than one rule, grammar is NOT LL(1)

Automated Validator Logic:


for non_terminal in grammar:
    for terminal in terminals:
        if len(table[non_terminal][terminal]) > 1:
            print("Grammar is not LL(1)")

8.3 Validated Example (From Previous Sections)

Recall:


S → A B
A → a A | ε
B → b B | c

The predictive parsing table:

Non-terminal	a	b	c	$
S	S → A B	S → A B	S → A B	error
A	A → a A	A → ε	A → ε	error
B	error	B → b B	B → c	error

✅ Each cell contains ≤ 1 rule. Grammar is deterministic.

8.4 Real-World Usage

Parser generators like ANTLR or Bison will flag LL(1) violations
IDE autocompletion, code linters, and syntax highlighters depend on deterministic grammars to work without lag

Without validation, hidden non-determinism causes parsing bugs, miscompilation, or infinite loops.

9. Detailed Step By Step Example From Book

Consider the following grammar:


stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

The grammar is ambiguous because the else can be associated with either of the two if statements. This is known as the "dangling else" problem.

We will go through the steps to make this grammar deterministic.

Step 1: Identify Ambiguity
Step 2: Rewrite the Grammar
Step 3: Remove Left Recursion
Step 4: Apply Left Factoring
Step 5: Construct FIRST and FOLLOW Sets
Step 6: Check LL(1) Conditions
Step 7: Build the Predictive Parsing Table
Step 8: Validate Determinism

After following these steps, we can conclude that the grammar is now deterministic and suitable for LL(1) parsing.

In practice, this process is often automated by parser generators, but understanding the underlying principles is crucial for effective grammar design.

9.1 Eliminate Ambiguity — Step-by-Step (Dangling-Else Problem)

This classic ambiguity occurs in conditional statements where an else can be associated with more than one if. The example from the book illustrates the problem and the transformation to eliminate it.

9.1.1 Original Ambiguous Grammar


stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

This grammar is ambiguous. For instance, the sentence:


if E1 then if E2 then S1 else S2

can be parsed in two ways — either associating the else with the first if or the second. But programming languages like C, Java, etc., match each else with the closest unmatched then — this preference must be encoded into an unambiguous grammar.

9.1.2 Disambiguation Strategy

Split statements into two categories:

Matched: Fully enclosed if-then-else (no ambiguity)
Open: May have unmatched then

This transformation ensures else is always associated with the nearest unmatched then.

9.1.3 Unambiguous Grammar


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then stmt
             | if expr then matched_stmt else open_stmt

Now, for:


if E1 then if E2 then S1 else S2

It parses as:


stmt
└── open_stmt
    └── if expr then matched_stmt else open_stmt
                     └── matched_stmt → if expr then matched_stmt else matched_stmt

This guarantees else always binds to the nearest then.

9.1.4 Real-World Example

In Java/C:


if (x > 0)
  if (y > 0)
    System.out.println("A");
  else
    System.out.println("B");

The else binds to the inner if. The rewritten unambiguous grammar matches this behavior exactly, which is why it is preferred in compiler frontend design.

9.1.5 Summary Table

Form	Meaning	Handled by
`if E1 then if E2 then S1 else S2`	Associates else with inner if	Unambiguous Grammar
`if E1 then if E2 then S1`	No ambiguity	Both grammars
`if E1 then S1 else if E2 then S2 else S3`	Right association of elses	Unambiguous Grammar

9.1.6 Why This Matters

Ambiguity causes confusion for parser generators (LL/LR)
Ambiguous grammars can't be parsed without extra logic
Unambiguous grammar directly maps source to single parse tree

9.2 Remove Left Recursion — Step-by-Step on Disambiguated If-Else Grammar

In the unambiguous grammar constructed to eliminate the dangling-else ambiguity, we now check and eliminate left recursion to make it suitable for top-down (LL(1)) parsing.

Let’s recall the grammar from 9.1:


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt    → if expr then stmt | if expr then matched_stmt else open_stmt

None of these productions are directly left-recursive, but they do contain potential indirect left recursion. Let’s analyze step-by-step.

9.2.1 Step 1: List All Non-Terminals

S = stmt
M = matched_stmt
O = open_stmt

We’ll check for left recursion in each non-terminal: stmt, matched_stmt, open_stmt.

9.2.2 Step 2: Substitution for Indirect Recursion

First substitute stmt → matched_stmt | open_stmt into the productions where stmt appears, namely in open_stmt:


open_stmt → if expr then matched_stmt
          | if expr then open_stmt
          | if expr then matched_stmt else open_stmt

The rule if expr then open_stmt has the non-terminal open_stmt as the first symbol after expansion ⇒ this is direct left recursion.

9.2.3 Step 3: Eliminate Direct Left Recursion in open_stmt

We now isolate the recursive and non-recursive parts of open_stmt.

Recursive form:


open_stmt → if expr then open_stmt

Non-recursive forms:


open_stmt → if expr then matched_stmt
open_stmt → if expr then matched_stmt else open_stmt

Apply standard transformation:


open_stmt → if expr then matched_stmt open_stmt'
open_stmt' → else open_stmt open_stmt' | ε

This eliminates the left recursion while preserving the structure.

9.2.4 Final Refactored Grammar (Left Recursion Removed)


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

Explanation:

open_stmt now begins with a non-recursive form only
open_stmt' handles the repeated chaining of else open_stmt recursively (right-recursion only)

9.2.5 Parse Tree Example (No Left Recursion)

For the string:


if E1 then matched_stmt else open_stmt else open_stmt

Derivation:


open_stmt
→ if expr then matched_stmt open_stmt'
→ if expr then matched_stmt else open_stmt open_stmt'
→ if expr then matched_stmt else open_stmt else open_stmt open_stmt'

At each stage, we apply only one rule — determinism is preserved and no left recursion occurs.

9.2.6 Why This Matters

Removes infinite loops in recursive-descent parsers
Enables FIRST and FOLLOW computation
Makes grammar suitable for predictive parsing (LL(1))

9.3 Apply Left Factoring — Step-by-Step on Disambiguated & Recursion-Free Grammar

Now that we have eliminated ambiguity and left recursion, the final step before constructing the parsing table is to apply left factoring. This transformation ensures that the parser can always make a deterministic choice based on a single lookahead symbol.

9.3.1 Refactored Grammar from Previous Step


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

We will inspect each non-terminal and factor out any common prefixes.

9.3.2 Step 1: Left Factoring `matched_stmt`


matched_stmt → if expr then matched_stmt else matched_stmt
             | other

No common prefix between the two alternatives: if expr then ... and other. ✅ No factoring needed here.

9.3.3 Step 2: Left Factoring `open_stmt`


open_stmt → if expr then matched_stmt open_stmt'

Only one production exists, so ✅ nothing to factor.

9.3.4 Step 3: Left Factoring `open_stmt'`


open_stmt' → else open_stmt open_stmt' | ε

Here too, the productions are else ... and ε, with no overlapping prefix. ✅ No factoring needed.

9.3.5 Is Left Factoring Required at All?

No, because:

Each non-terminal has unique prefixes
No two productions of the same non-terminal begin with the same token (terminal)
The grammar is already in LL(1)-ready form after ambiguity removal and left recursion removal

9.3.6 Final LL(1)-Ready Grammar (No Left Factoring Needed)


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

✅ Already suitable for predictive parsing table construction.

9.3.7 Real-World Impact

Parser never needs backtracking or multi-symbol lookahead
Code completion engines in IDEs like VSCode rely on such deterministic grammars
This structure reflects real language design choices (e.g., C/C++ associating else with nearest if)

9.4 Construct FIRST and FOLLOW Sets — Step-by-Step on Final Grammar

To build a predictive parser, we must compute FIRST and FOLLOW sets for each non-terminal in the grammar. These sets tell us which production to choose when parsing a symbol and help handle ε-productions properly.

9.4.1 Final Grammar


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

Non-terminals: stmt, matched_stmt, open_stmt, open_stmt'

Terminals: if, expr, then, else, other

Note: expr is treated as a terminal (assumed lexical unit).

9.4.2 Compute FIRST Sets

FIRST(matched_stmt)


matched_stmt → if expr then matched_stmt else matched_stmt
             → other

Starts with either if or other

FIRST(matched_stmt) = {if, other}

FIRST(open_stmt)


open_stmt → if expr then matched_stmt open_stmt'

Starts with if

FIRST(open_stmt) = {if}

FIRST(open_stmt')


open_stmt' → else open_stmt open_stmt' | ε

FIRST(open_stmt') = {else, ε}

FIRST(stmt)


stmt → matched_stmt | open_stmt

FIRST(matched_stmt) = {if, other}, FIRST(open_stmt) = {if}

⇒ Combined: FIRST(stmt) = {if, other}

9.4.3 Compute FOLLOW Sets

Rules:

Add $ to FOLLOW(start symbol)
If A → αBβ, then FIRST(β) \ {ε} ⊆ FOLLOW(B)
If A → αB or A → αBβ and ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(B)

FOLLOW(stmt)

stmt is start symbol ⇒ FOLLOW(stmt) = {$}

FOLLOW(matched_stmt)

Occurs in:

stmt → matched_stmt → FOLLOW(stmt) = {$}
open_stmt → if expr then matched_stmt open_stmt'

⇒ FIRST(open_stmt') = {else, ε}

matched_stmt → if expr then matched_stmt else matched_stmt
- matched_stmt before and after 'else'

So we add:

else from FIRST(open_stmt')
$ from FOLLOW(stmt)

FOLLOW(matched_stmt) = {else, $}

FOLLOW(open_stmt)

stmt → open_stmt ⇒ FOLLOW(stmt) = {$}
open_stmt' → else open_stmt open_stmt'

⇒ FOLLOW(open_stmt') ⊆ FOLLOW(open_stmt)

⇒ include FOLLOW(open_stmt') (to be determined)

Assume temporary FOLLOW(open_stmt') = X

So far: FOLLOW(open_stmt) = {else, $}

FOLLOW(open_stmt')

From open_stmt → if expr then matched_stmt open_stmt'

⇒ FOLLOW(open_stmt) ⊆ FOLLOW(open_stmt')

From open_stmt' → else open_stmt open_stmt'

⇒ FOLLOW(open_stmt') ⊆ FOLLOW(open_stmt')

Recursive, but no new terminals

Eventually: FOLLOW(open_stmt') = FOLLOW(open_stmt) = {else, $}

9.4.4 Final FIRST and FOLLOW Table

Non-terminal	FIRST	FOLLOW
stmt	{if, other}	{$}
matched_stmt	{if, other}	{else, $}
open_stmt	{if}	{else, $}
open_stmt'	{else, ε}	{else, $}

9.4.5 Why This Step is Crucial

Used directly to build the LL(1) parsing table
Validates correctness and determinism of the grammar
Reveals any hidden ambiguity or conflict via FIRST/FOLLOW overlap

9.5 Check LL(1) Conditions — Step-by-Step

To validate that the grammar is LL(1), we must confirm that at every choice point for a non-terminal, the parser can make a unique decision using a single lookahead symbol.

This is done using FIRST and FOLLOW sets from step 9.4.

9.5.1 LL(1) Conditions to Check

For each non-terminal A with multiple productions A → α | β | ...:

Condition 1: FIRST(α) ∩ FIRST(β) = ∅
Condition 2: If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅

If these conditions hold for all production choices, the grammar is LL(1).

9.5.2 Grammar


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

9.5.3 Step-by-Step LL(1) Check

Check stmt → matched_stmt | open_stmt

FIRST(matched_stmt) = {if, other}
FIRST(open_stmt) = {if}
⇒ {if} ∩ {if, other} ≠ ∅ ❌
Conflict: Overlap on 'if' → violates Condition 1

However, this is a false conflict because matched_stmt and open_stmt differ structurally, and parser can be rewritten using a single entry for 'if'. But in strict LL(1), this overlap is not allowed unless handled via factoring or lookahead.

Check matched_stmt → if ... | other

FIRST(if ...) = {if}
FIRST(other) = {other}
⇒ {if} ∩ {other} = ∅ ✅

Check open_stmt' → else open_stmt open_stmt' | ε

FIRST(else open_stmt open_stmt') = {else}
FIRST(ε) = {ε}
FOLLOW(open_stmt') = {else, $}
⇒ {else} ∩ FOLLOW(open_stmt') = {else} ❌
Violates Condition 2 because ε is in FIRST and FIRST(other alt) ∩ FOLLOW ≠ ∅

This means the parser may not know whether to apply ε or else-rule upon seeing 'else' — ambiguity for predictive parser.

9.5.4 Final Verdict

Non-terminal	Conflict?	Reason
stmt	❌	FIRST(matched_stmt) ∩ FIRST(open_stmt) = {if}
matched_stmt	✅	FIRSTs disjoint → {if} ∩ {other} = ∅
open_stmt'	❌	FIRST(else) ∩ FOLLOW(open_stmt') = {else}

So while the grammar was made unambiguous and left-recursion free, it is not strictly LL(1) due to the FIRST/FOLLOW conflicts.

Possible Fix?

Merge stmt → open_stmt only and handle matched_stmt structurally inside
Use parser with lookahead or semantic check
Or redesign stmt productions using if_stmt to split concern

9.5.5 Real Compiler Insight

Many real compilers use LR parsers or GLR parsers to handle such patterns
Some use disambiguating rules like "else binds to nearest if"
Still, LL(1) is useful in hand-written or simple parser cases

9.6 Build the Predictive Parsing Table — Step-by-Step

To construct the LL(1) Predictive Parsing Table, we use the FIRST and FOLLOW sets calculated in step 9.4 and validate their entries using step 9.5. Each cell in the table is filled with a production rule according to these rules:

Table Construction Rules:

For each production A → α, and for each terminal a ∈ FIRST(α), add A → α to TABLE[A, a]
If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to TABLE[A, b]

Note: If a cell already has an entry, it indicates a conflict (non-LL(1)).

We will now fill the parsing table for our grammar step-by-step.

9.6.1 Grammar Recap


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

Start Symbol: stmt

9.6.2 FIRST and FOLLOW (from step 9.4)

Non-terminal	FIRST	FOLLOW
stmt	{if, other}	{$}
matched_stmt	{if, other}	{else, $}
open_stmt	{if}	{else, $}
open_stmt'	{else, ε}	{else, $}

9.6.3 Fill the Predictive Parsing Table

Non-terminal	if	other	else	$
stmt	stmt → matched_stmt stmt → open_stmt ⚠️	stmt → matched_stmt	error	error
matched_stmt	matched_stmt → if expr then matched_stmt else matched_stmt	matched_stmt → other	error	error
open_stmt	open_stmt → if expr then matched_stmt open_stmt'	error	error	error
open_stmt'	error	error	open_stmt' → else open_stmt open_stmt'	open_stmt' → ε

⚠️ LL(1) Conflict at [stmt, if]

Due to overlapping FIRST sets of matched_stmt and open_stmt. This is what makes the grammar non-LL(1).

9.6.4 Final Notes

Except for stmt, all non-terminals are LL(1) compliant
Conflicting entry in [stmt, if] → more than one production applicable
Resolved using:
- Grammar transformation (refactor stmt)
- Or switch to a more powerful parser (e.g. LL(k), LR)

9.6.5 Real-World Compiler Relevance

Parsing table is used in recursive-descent parser to drive parsing logic
Conflicts indicate a need to redesign grammar or increase lookahead
In tools like ANTLR, this table is built behind the scenes but essential to parser generation

9.7 Simulate Parsing with Stack — Step-by-Step

We simulate how a predictive parser uses the parsing table and a stack to parse an input string. The parser maintains:

A stack initialized with the start symbol and $ (end-marker)
An input buffer containing tokens followed by $
A parsing table to decide which production to apply based on [top of stack, lookahead]

We simulate this using our grammar from step 9.6.

9.7.1 Grammar Recap (Final Form)


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

9.7.2 Input String to Parse


if expr then other else other $

Token stream: if, expr, then, other, else, other, $

9.7.3 Stack Parsing Simulation Table

Start stack: [stmt, $]

Step	Stack	Input	Action
1	stmt, $	if expr then other else other $	stmt → matched_stmt (from table)
2	matched_stmt, $	if expr then other else other $	matched_stmt → if expr then matched_stmt else matched_stmt
3	if, expr, then, matched_stmt, else, matched_stmt, $	if expr then other else other $	Match if
4	expr, then, matched_stmt, else, matched_stmt, $	expr then other else other $	Match expr
5	then, matched_stmt, else, matched_stmt, $	then other else other $	Match then
6	matched_stmt, else, matched_stmt, $	other else other $	matched_stmt → other
7	other, else, matched_stmt, $	other else other $	Match other
8	else, matched_stmt, $	else other $	Match else
9	matched_stmt, $	other $	matched_stmt → other
10	other, $	other $	Match other
11	$	$	Parsing successful ✅

9.7.4 Key Observations

Each step used a single production from the parsing table
Stack mirrors the expected derivation
No backtracking or ambiguity occurred
Parser terminated successfully when both input and stack reached $

9.7.5 Real-World Use

Recursive-descent parsers simulate this logic with real function calls instead of a stack
Compiler frontend uses such simulation to parse programs token-by-token
Errors are caught by mismatches in top-of-stack vs input lookahead