How to Make CFG Deterministic - CSU086 - Shoolini U

How to make CFG Deterministic

1. Understanding Deterministic CFG

A Deterministic Context-Free Grammar (DCFG) is a context-free grammar that allows parsing without any ambiguity or backtracking. It is the foundation for constructing predictive parsers (like LL(1)) and shift-reduce parsers (like LR(1)), where a single correct parse action exists for each input symbol and parser state.

1.1 What is a Deterministic CFG?

In a deterministic CFG, for any given input symbol and parsing context, the parser should be able to decide the next step uniquely. This contrasts with a general CFG, where multiple choices might exist, requiring backtracking or lookahead.

For example, consider a grammar:


S → a A | a B
A → b
B → c

This is non-deterministic because after seeing 'a', the parser can't decide whether to choose a A or a B without further lookahead.

To make it deterministic, we can left-factor it:


S → a S'
S' → b | c

Now, after reading 'a', we move to a state where the next token will determine whether to go to 'b' or 'c'. This is LL(1) deterministic.

1.2 Why is Determinism Needed?

Real-world compilers must process code in linear time with high efficiency. Determinism ensures:

1.2.1 Real-world Analogy

Imagine an ATM menu:


1. Withdraw
2. Deposit
3. Balance

You press "1", and the ATM takes you to "Withdraw". There's no ambiguity. If the ATM gave two different options after pressing "1", it would confuse the user. Similarly, a deterministic parser should always know the next step.

1.2.2 When is it a Problem?

Consider this CFG used in natural languages:


S → if E then S | if E then S else S | other

This is known as the "dangling else" problem. The ambiguity comes from not knowing which if the else belongs to. Real programming languages like C resolve this by associating else with the nearest if.

1.2.3 Real Compiler Impact

Without determinism:

2. Eliminate Ambiguity

Ambiguity occurs when a grammar generates a string that can have more than one parse tree. This is a fundamental problem in parsing because it leads to confusion in semantic interpretation.

2.1 What is Ambiguity?

If a string derived from a grammar has two or more different leftmost derivations (or parse trees), the grammar is ambiguous.

Example:


E → E + E | E * E | id

String: id + id * id

This string has two parse trees:

Both are valid derivations, but they imply different meanings. Compilers must produce exactly one interpretation — ambiguity breaks this requirement.

2.2 Why Eliminate Ambiguity?

In programming languages, ambiguity can cause:

2.3 Strategies to Eliminate Ambiguity

2.3.1 Enforce Operator Precedence and Associativity

To fix the arithmetic expression grammar, we split it by precedence:


E → E + T | T
T → T * F | F
F → id
2.3.2 Disambiguate with Language Rules

Classic case: "dangling else"


S → if E then S | if E then S else S | other

Make it unambiguous by associating else with the nearest unmatched if:


S → M | U
M → if E then M else M | other
U → if E then S
2.3.3 Rewrite the Grammar

Sometimes, restructuring rules removes ambiguity. Example:

Ambiguous:


S → a S b | ε

This allows multiple ways to generate the same string like aabb.

Unambiguous version (balanced a's and b's):


S → a S b | ε

In this case, the ambiguity was not real — the original grammar was actually unambiguous. You must prove ambiguity formally before rewriting.

2.4 Real-World Compiler Example

In C/C++, the following expression is parsed differently depending on operator precedence:


a + b * c

The compiler refers to a deterministic grammar that gives * higher precedence:

If grammar allowed both (a + b) * c and a + (b * c), it would cause ambiguity. Therefore, operator tables are translated into disambiguated CFGs.

3. Remove Left Recursion

Left recursion in a grammar occurs when a non-terminal calls itself as the first symbol in one of its productions. Left-recursive grammars are problematic for top-down parsers (like LL(1)) because they can cause infinite recursion.

3.1 What is Left Recursion?

A grammar has left recursion if there exists a non-terminal A such that:


A → A α | β

where α and β are sequences of terminals and/or non-terminals, and β does not start with A.

Example of left recursion:


E → E + T | T

Here, E is calling itself first, directly. For LL parsers, this causes an infinite loop.

3.2 Eliminate Immediate Left Recursion

The standard technique is to refactor the grammar by introducing a new non-terminal to handle recursion.

General Rule:


A → A α₁ | A α₂ | ... | β₁ | β₂ | ...

where each βᵢ does not start with A. This is transformed into:


A → β₁ A' | β₂ A' | ...
A' → α₁ A' | α₂ A' | ... | ε
3.2.1 Example: Arithmetic Expressions

Original grammar (left-recursive):


E → E + T | T
T → T * F | F
F → ( E ) | id

Step-by-step removal of left recursion:

For E:

Apply transformation:


E → T E'
E' → + T E' | ε
For T:

T → F T'
T' → * F T' | ε
F remains unchanged:

F → ( E ) | id

This grammar is now free from left recursion and suitable for LL(1) parsing.

3.3 Why Remove Left Recursion?

3.4 Real-world Compiler Scenario

In compilers like javac or g++, parser generators (like ANTLR or YACC) reject grammars with direct left recursion if configured for top-down parsing.

To write a parser that understands arithmetic expressions, you must first eliminate left recursion as shown above.

4. Apply Left Factoring

Left factoring is a grammar transformation technique used to remove common prefixes from multiple productions of a non-terminal. It is essential when a parser cannot decide which production to choose based on the next input symbol.

4.1 What is Left Factoring?

Given a non-terminal A with multiple productions starting with the same prefix:


A → α β₁ | α β₂

This makes the parser unable to choose between β₁ and β₂ just by looking at α. Left factoring rewrites this as:


A → α A'
A' → β₁ | β₂

Now the decision is delayed until enough input is seen.

4.2 Why Apply Left Factoring?

4.2.1 Real-World Analogy

Imagine a voice menu:


Say "book flight"
Say "book hotel"

The system hears "book" and gets confused. Instead, restructure:


Say "book"
Then say "flight" or "hotel"

Left factoring works the same way — delay the decision.

4.3 Detailed Example

Original Grammar:


S → if E then S else S | if E then S | other

This grammar is ambiguous due to common prefix if E then S. Parser cannot decide between the two branches without lookahead.

Left-factored version:


S → if E then S S'
S' → else S | ε
S → other

Now, after reading if E then S, the parser checks whether else follows. If yes, parse else S. If not, move on.

4.3.1 Another Example

Original Grammar:


A → int id ; | int id = num ;

Common prefix: int id

Left-factored:


A → int id A'
A' → ; | = num ;

Now the parser reads int id and decides based on the next token.

4.4 Implementation Note

Left factoring is automated in many parser generators. But for handwritten parsers, this step is manual and necessary to avoid parse conflicts.

After left factoring, the grammar becomes suitable for constructing parsing tables or recursive descent parsers.

5. Construct FIRST and FOLLOW Sets

To build an LL(1) parser or check grammar determinism, we must compute two essential sets for each non-terminal:

These sets ensure deterministic parsing decisions by guiding which production to apply based on lookahead symbols.

5.1 FIRST Set

The FIRST(X) of a symbol X is the set of terminals that begin strings derived from X. It includes:

Rules to Compute FIRST

5.2 FOLLOW Set

The FOLLOW(A) of a non-terminal A is the set of terminals that can appear immediately after A in some derivation.

Rules to Compute FOLLOW

5.3 Detailed Example

Consider this grammar:


S → A B
A → a A | ε
B → b B | c
Step 1: Compute FIRST

FIRST(A)

FIRST(B)

FIRST(S)

Step 2: Compute FOLLOW

FOLLOW(S)

FOLLOW(A)

FOLLOW(B)

Non-terminal FIRST FOLLOW
S {a, b, c} {$}
A {a, ε} {b, c}
B {b, c} {$}

5.4 How It Enables Determinism

LL(1) table uses:

Example: If input starts with a, parser chooses A → a A. If input is b or c, parser chooses A → ε and proceeds with B.

6. Check LL(1) Conditions

A grammar is said to be LL(1) if it can be parsed by an LL(1) parser — a top-down parser that uses:

For LL(1) parsing, we need to ensure that at every step, the parser can uniquely determine the correct production using only the next input symbol. This requires checking two specific conditions using the FIRST and FOLLOW sets.

6.1 LL(1) Grammar Conditions

For any non-terminal A with two or more productions:


A → α | β

The grammar is LL(1) only if:

6.2 Detailed Example

Consider the grammar:


S → A B
A → a A | ε
B → b B | c
Step 1: Use Previously Computed FIRST and FOLLOW
Non-terminal FIRST FOLLOW
S {a, b, c} {$}
A {a, ε} {b, c}
B {b, c} {$}
Step 2: Check LL(1) Conditions for A → a A | ε
Step 3: Check LL(1) Conditions for B → b B | c

✅ All conditions are satisfied. Therefore, the grammar is LL(1).

6.3 When Grammar is Not LL(1)

Example of an ambiguous grammar that fails LL(1) condition:


E → E + T | T

FIRST(E + T) = FIRST(E), which causes recursion

We remove left recursion and left factor to make it LL(1)-compliant (as done in earlier sections):


E → T E'
E' → + T E' | ε
Check LL(1) for E' → + T E' | ε

✅ LL(1) satisfied after transformation.

6.4 Practical Compiler Insight

Most parser generators like ANTLR or Bison show a warning when LL(1) conflicts exist. For real-time applications like IDE autocompletion, LL(1) parsing ensures instant and correct parsing without delay or backtracking.

7. Build the Predictive Parsing Table

The Predictive Parsing Table is a two-dimensional table used by LL(1) parsers to decide which production rule to apply, based on:

This table enables parsing with one lookahead symbol — hence "LL(1)". It is built using the FIRST and FOLLOW sets.

7.1 Construction Steps

  1. For each production A → α:
    • For each terminal a ∈ FIRST(α), add A → α to TABLE[A, a]
    • If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to TABLE[A, b]
  2. Mark each invalid entry as error or blank

This ensures that only one entry per cell exists — the defining condition of LL(1).

7.2 Example Grammar


S → A B
A → a A | ε
B → b B | c

From earlier computations:

Non-terminal FIRST FOLLOW
S {a, b, c} {$}
A {a, ε} {b, c}
B {b, c} {$}
Productions:

S → A B
A → a A | ε
B → b B | c

7.3 FIRST + FOLLOW Table Fill

Non-terminal a b c $
S S → A B S → A B S → A B error
A A → a A A → ε A → ε error
B error B → b B B → c error

✅ One production per cell → this grammar is LL(1) and parsing table is ready.

7.4 Real-world Analogy

Think of the parsing table like a GPS system:

Because only one route exists for each sign at each location, you never get lost — that’s the power of LL(1).

7.5 Use in Predictive Parser

The parser uses the table as follows:


stack ← [S, $]
input ← [tokens, $]

while stack not empty:
    X = top of stack
    a = current input symbol
    if X is terminal or $:
        if X == a: pop both
        else: error
    else:
        if TABLE[X, a] = X → α:
            replace X with α on stack
        else:
            error

8. Validate Determinism

Validation of determinism ensures the grammar and parsing table conform to the LL(1) requirements — one unique parsing decision for each input symbol at every step. This validation confirms that:

This is the final step to ensure the CFG is now deterministic and parser-ready.

8.1 What Makes a Grammar Non-Deterministic?

A grammar fails determinism if:

Example of Conflict

A → a B | a C

BOTH start with a ⇒ FIRST sets overlap ⇒ Not LL(1)

8.2 How to Validate Determinism

  1. Construct the FIRST and FOLLOW sets
  2. Build the predictive parsing table
  3. Scan each table cell:
    • Check that each [Non-terminal, Terminal] has ≤ 1 rule
    • If any cell has more than one rule, grammar is NOT LL(1)
Automated Validator Logic:

for non_terminal in grammar:
    for terminal in terminals:
        if len(table[non_terminal][terminal]) > 1:
            print("Grammar is not LL(1)")

8.3 Validated Example (From Previous Sections)

Recall:


S → A B
A → a A | ε
B → b B | c

The predictive parsing table:

Non-terminal a b c $
S S → A B S → A B S → A B error
A A → a A A → ε A → ε error
B error B → b B B → c error

✅ Each cell contains ≤ 1 rule. Grammar is deterministic.

8.4 Real-World Usage

Without validation, hidden non-determinism causes parsing bugs, miscompilation, or infinite loops.

9. Detailed Step By Step Example From Book

Consider the following grammar:


stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

The grammar is ambiguous because the else can be associated with either of the two if statements. This is known as the "dangling else" problem.

We will go through the steps to make this grammar deterministic.

After following these steps, we can conclude that the grammar is now deterministic and suitable for LL(1) parsing.

In practice, this process is often automated by parser generators, but understanding the underlying principles is crucial for effective grammar design.

9.1 Eliminate Ambiguity — Step-by-Step (Dangling-Else Problem)

This classic ambiguity occurs in conditional statements where an else can be associated with more than one if. The example from the book illustrates the problem and the transformation to eliminate it.

9.1.1 Original Ambiguous Grammar


stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

This grammar is ambiguous. For instance, the sentence:


if E1 then if E2 then S1 else S2

can be parsed in two ways — either associating the else with the first if or the second. But programming languages like C, Java, etc., match each else with the closest unmatched then — this preference must be encoded into an unambiguous grammar.

9.1.2 Disambiguation Strategy

Split statements into two categories:

  • Matched: Fully enclosed if-then-else (no ambiguity)
  • Open: May have unmatched then

This transformation ensures else is always associated with the nearest unmatched then.

9.1.3 Unambiguous Grammar


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then stmt
             | if expr then matched_stmt else open_stmt

Now, for:


if E1 then if E2 then S1 else S2

It parses as:


stmt
└── open_stmt
    └── if expr then matched_stmt else open_stmt
                     └── matched_stmt → if expr then matched_stmt else matched_stmt

This guarantees else always binds to the nearest then.

9.1.4 Real-World Example

In Java/C:


if (x > 0)
  if (y > 0)
    System.out.println("A");
  else
    System.out.println("B");

The else binds to the inner if. The rewritten unambiguous grammar matches this behavior exactly, which is why it is preferred in compiler frontend design.

9.1.5 Summary Table

Form Meaning Handled by
if E1 then if E2 then S1 else S2 Associates else with inner if Unambiguous Grammar
if E1 then if E2 then S1 No ambiguity Both grammars
if E1 then S1 else if E2 then S2 else S3 Right association of elses Unambiguous Grammar

9.1.6 Why This Matters

  • Ambiguity causes confusion for parser generators (LL/LR)
  • Ambiguous grammars can't be parsed without extra logic
  • Unambiguous grammar directly maps source to single parse tree

9.2 Remove Left Recursion — Step-by-Step on Disambiguated If-Else Grammar

In the unambiguous grammar constructed to eliminate the dangling-else ambiguity, we now check and eliminate left recursion to make it suitable for top-down (LL(1)) parsing.

Let’s recall the grammar from 9.1:


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt    → if expr then stmt | if expr then matched_stmt else open_stmt

None of these productions are directly left-recursive, but they do contain potential indirect left recursion. Let’s analyze step-by-step.

9.2.1 Step 1: List All Non-Terminals

  • S = stmt
  • M = matched_stmt
  • O = open_stmt

We’ll check for left recursion in each non-terminal: stmt, matched_stmt, open_stmt.

9.2.2 Step 2: Substitution for Indirect Recursion

First substitute stmt → matched_stmt | open_stmt into the productions where stmt appears, namely in open_stmt:


open_stmt → if expr then matched_stmt
          | if expr then open_stmt
          | if expr then matched_stmt else open_stmt

The rule if expr then open_stmt has the non-terminal open_stmt as the first symbol after expansion ⇒ this is direct left recursion.

9.2.3 Step 3: Eliminate Direct Left Recursion in open_stmt

We now isolate the recursive and non-recursive parts of open_stmt.

Recursive form:

open_stmt → if expr then open_stmt
Non-recursive forms:

open_stmt → if expr then matched_stmt
open_stmt → if expr then matched_stmt else open_stmt
Apply standard transformation:

open_stmt → if expr then matched_stmt open_stmt'
open_stmt' → else open_stmt open_stmt' | ε

This eliminates the left recursion while preserving the structure.

9.2.4 Final Refactored Grammar (Left Recursion Removed)


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε
Explanation:
  • open_stmt now begins with a non-recursive form only
  • open_stmt' handles the repeated chaining of else open_stmt recursively (right-recursion only)

9.2.5 Parse Tree Example (No Left Recursion)

For the string:


if E1 then matched_stmt else open_stmt else open_stmt

Derivation:


open_stmt
→ if expr then matched_stmt open_stmt'
→ if expr then matched_stmt else open_stmt open_stmt'
→ if expr then matched_stmt else open_stmt else open_stmt open_stmt'

At each stage, we apply only one rule — determinism is preserved and no left recursion occurs.

9.2.6 Why This Matters

  • Removes infinite loops in recursive-descent parsers
  • Enables FIRST and FOLLOW computation
  • Makes grammar suitable for predictive parsing (LL(1))

9.3 Apply Left Factoring — Step-by-Step on Disambiguated & Recursion-Free Grammar

Now that we have eliminated ambiguity and left recursion, the final step before constructing the parsing table is to apply left factoring. This transformation ensures that the parser can always make a deterministic choice based on a single lookahead symbol.

9.3.1 Refactored Grammar from Previous Step


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

We will inspect each non-terminal and factor out any common prefixes.

9.3.2 Step 1: Left Factoring matched_stmt


matched_stmt → if expr then matched_stmt else matched_stmt
             | other

No common prefix between the two alternatives: if expr then ... and other. ✅ No factoring needed here.

9.3.3 Step 2: Left Factoring open_stmt


open_stmt → if expr then matched_stmt open_stmt'

Only one production exists, so ✅ nothing to factor.

9.3.4 Step 3: Left Factoring open_stmt'


open_stmt' → else open_stmt open_stmt' | ε

Here too, the productions are else ... and ε, with no overlapping prefix. ✅ No factoring needed.

9.3.5 Is Left Factoring Required at All?

No, because:

  • Each non-terminal has unique prefixes
  • No two productions of the same non-terminal begin with the same token (terminal)
  • The grammar is already in LL(1)-ready form after ambiguity removal and left recursion removal

9.3.6 Final LL(1)-Ready Grammar (No Left Factoring Needed)


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

✅ Already suitable for predictive parsing table construction.

9.3.7 Real-World Impact

  • Parser never needs backtracking or multi-symbol lookahead
  • Code completion engines in IDEs like VSCode rely on such deterministic grammars
  • This structure reflects real language design choices (e.g., C/C++ associating else with nearest if)

9.4 Construct FIRST and FOLLOW Sets — Step-by-Step on Final Grammar

To build a predictive parser, we must compute FIRST and FOLLOW sets for each non-terminal in the grammar. These sets tell us which production to choose when parsing a symbol and help handle ε-productions properly.

9.4.1 Final Grammar


stmt         → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

open_stmt    → if expr then matched_stmt open_stmt'

open_stmt'   → else open_stmt open_stmt' | ε

Non-terminals: stmt, matched_stmt, open_stmt, open_stmt'

Terminals: if, expr, then, else, other

Note: expr is treated as a terminal (assumed lexical unit).

9.4.2 Compute FIRST Sets

FIRST(matched_stmt)

matched_stmt → if expr then matched_stmt else matched_stmt
             → other

Starts with either if or other

FIRST(matched_stmt) = {if, other}

FIRST(open_stmt)

open_stmt → if expr then matched_stmt open_stmt'

Starts with if

FIRST(open_stmt) = {if}

FIRST(open_stmt')

open_stmt' → else open_stmt open_stmt' | ε

FIRST(open_stmt') = {else, ε}

FIRST(stmt)

stmt → matched_stmt | open_stmt

FIRST(matched_stmt) = {if, other}, FIRST(open_stmt) = {if}

⇒ Combined: FIRST(stmt) = {if, other}

9.4.3 Compute FOLLOW Sets

Rules:

  • Add $ to FOLLOW(start symbol)
  • If A → αBβ, then FIRST(β) \ {ε} ⊆ FOLLOW(B)
  • If A → αB or A → αBβ and ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(B)
FOLLOW(stmt)
  • stmt is start symbol ⇒ FOLLOW(stmt) = {$}
FOLLOW(matched_stmt)

Occurs in:

  • stmt → matched_stmt → FOLLOW(stmt) = {$}
  • open_stmt → if expr then matched_stmt open_stmt'
    • ⇒ FIRST(open_stmt') = {else, ε}
  • matched_stmt → if expr then matched_stmt else matched_stmt
    • matched_stmt before and after 'else'

So we add:

  • else from FIRST(open_stmt')
  • $ from FOLLOW(stmt)

FOLLOW(matched_stmt) = {else, $}

FOLLOW(open_stmt)
  • stmt → open_stmt ⇒ FOLLOW(stmt) = {$}
  • open_stmt' → else open_stmt open_stmt'
    • ⇒ FOLLOW(open_stmt') ⊆ FOLLOW(open_stmt)
  • ⇒ include FOLLOW(open_stmt') (to be determined)

Assume temporary FOLLOW(open_stmt') = X

So far: FOLLOW(open_stmt) = {else, $}

FOLLOW(open_stmt')
  • From open_stmt → if expr then matched_stmt open_stmt'
    • ⇒ FOLLOW(open_stmt) ⊆ FOLLOW(open_stmt')
  • From open_stmt' → else open_stmt open_stmt'
    • ⇒ FOLLOW(open_stmt') ⊆ FOLLOW(open_stmt')
  • Recursive, but no new terminals

Eventually: FOLLOW(open_stmt') = FOLLOW(open_stmt) = {else, $}

9.4.4 Final FIRST and FOLLOW Table

Non-terminal FIRST FOLLOW
stmt {if, other} {$}
matched_stmt {if, other} {else, $}
open_stmt {if} {else, $}
open_stmt' {else, ε} {else, $}

9.4.5 Why This Step is Crucial

  • Used directly to build the LL(1) parsing table
  • Validates correctness and determinism of the grammar
  • Reveals any hidden ambiguity or conflict via FIRST/FOLLOW overlap

9.5 Check LL(1) Conditions — Step-by-Step

To validate that the grammar is LL(1), we must confirm that at every choice point for a non-terminal, the parser can make a unique decision using a single lookahead symbol.

This is done using FIRST and FOLLOW sets from step 9.4.

9.5.1 LL(1) Conditions to Check

For each non-terminal A with multiple productions A → α | β | ...:

  1. Condition 1: FIRST(α) ∩ FIRST(β) = ∅
  2. Condition 2: If ε ∈ FIRST(α), then FIRST(β) ∩ FOLLOW(A) = ∅

If these conditions hold for all production choices, the grammar is LL(1).

9.5.2 Grammar


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

9.5.3 Step-by-Step LL(1) Check

Check stmt → matched_stmt | open_stmt
  • FIRST(matched_stmt) = {if, other}
  • FIRST(open_stmt) = {if}
  • ⇒ {if} ∩ {if, other} ≠ ∅ ❌
  • Conflict: Overlap on 'if' → violates Condition 1

However, this is a false conflict because matched_stmt and open_stmt differ structurally, and parser can be rewritten using a single entry for 'if'. But in strict LL(1), this overlap is not allowed unless handled via factoring or lookahead.


Check matched_stmt → if ... | other
  • FIRST(if ...) = {if}
  • FIRST(other) = {other}
  • ⇒ {if} ∩ {other} = ∅ ✅

Check open_stmt' → else open_stmt open_stmt' | ε
  • FIRST(else open_stmt open_stmt') = {else}
  • FIRST(ε) = {ε}
  • FOLLOW(open_stmt') = {else, $}
  • ⇒ {else} ∩ FOLLOW(open_stmt') = {else} ❌
  • Violates Condition 2 because ε is in FIRST and FIRST(other alt) ∩ FOLLOW ≠ ∅

This means the parser may not know whether to apply ε or else-rule upon seeing 'else' — ambiguity for predictive parser.

9.5.4 Final Verdict

Non-terminal Conflict? Reason
stmt FIRST(matched_stmt) ∩ FIRST(open_stmt) = {if}
matched_stmt FIRSTs disjoint → {if} ∩ {other} = ∅
open_stmt' FIRST(else) ∩ FOLLOW(open_stmt') = {else}

So while the grammar was made unambiguous and left-recursion free, it is not strictly LL(1) due to the FIRST/FOLLOW conflicts.

Possible Fix?
  • Merge stmt → open_stmt only and handle matched_stmt structurally inside
  • Use parser with lookahead or semantic check
  • Or redesign stmt productions using if_stmt to split concern

9.5.5 Real Compiler Insight

  • Many real compilers use LR parsers or GLR parsers to handle such patterns
  • Some use disambiguating rules like "else binds to nearest if"
  • Still, LL(1) is useful in hand-written or simple parser cases

9.6 Build the Predictive Parsing Table — Step-by-Step

To construct the LL(1) Predictive Parsing Table, we use the FIRST and FOLLOW sets calculated in step 9.4 and validate their entries using step 9.5. Each cell in the table is filled with a production rule according to these rules:

Table Construction Rules:
  1. For each production A → α, and for each terminal a ∈ FIRST(α), add A → α to TABLE[A, a]
  2. If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to TABLE[A, b]

Note: If a cell already has an entry, it indicates a conflict (non-LL(1)).

We will now fill the parsing table for our grammar step-by-step.

9.6.1 Grammar Recap


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

Start Symbol: stmt

9.6.2 FIRST and FOLLOW (from step 9.4)

Non-terminal FIRST FOLLOW
stmt {if, other} {$}
matched_stmt {if, other} {else, $}
open_stmt {if} {else, $}
open_stmt' {else, ε} {else, $}

9.6.3 Fill the Predictive Parsing Table

Non-terminal if other else $
stmt stmt → matched_stmt
stmt → open_stmt ⚠️
stmt → matched_stmt error error
matched_stmt matched_stmt → if expr then matched_stmt else matched_stmt matched_stmt → other error error
open_stmt open_stmt → if expr then matched_stmt open_stmt' error error error
open_stmt' error error open_stmt' → else open_stmt open_stmt' open_stmt' → ε
⚠️ LL(1) Conflict at [stmt, if]

Due to overlapping FIRST sets of matched_stmt and open_stmt. This is what makes the grammar non-LL(1).

9.6.4 Final Notes

  • Except for stmt, all non-terminals are LL(1) compliant
  • Conflicting entry in [stmt, if] → more than one production applicable
  • Resolved using:
    • Grammar transformation (refactor stmt)
    • Or switch to a more powerful parser (e.g. LL(k), LR)

9.6.5 Real-World Compiler Relevance

  • Parsing table is used in recursive-descent parser to drive parsing logic
  • Conflicts indicate a need to redesign grammar or increase lookahead
  • In tools like ANTLR, this table is built behind the scenes but essential to parser generation

9.7 Simulate Parsing with Stack — Step-by-Step

We simulate how a predictive parser uses the parsing table and a stack to parse an input string. The parser maintains:

We simulate this using our grammar from step 9.6.

9.7.1 Grammar Recap (Final Form)


stmt         → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt    → if expr then matched_stmt open_stmt'
open_stmt'   → else open_stmt open_stmt' | ε

9.7.2 Input String to Parse


if expr then other else other $

Token stream: if, expr, then, other, else, other, $

9.7.3 Stack Parsing Simulation Table

Start stack: [stmt, $]

Step Stack Input Action
1 stmt, $ if expr then other else other $ stmt → matched_stmt (from table)
2 matched_stmt, $ if expr then other else other $ matched_stmt → if expr then matched_stmt else matched_stmt
3 if, expr, then, matched_stmt, else, matched_stmt, $ if expr then other else other $ Match if
4 expr, then, matched_stmt, else, matched_stmt, $ expr then other else other $ Match expr
5 then, matched_stmt, else, matched_stmt, $ then other else other $ Match then
6 matched_stmt, else, matched_stmt, $ other else other $ matched_stmt → other
7 other, else, matched_stmt, $ other else other $ Match other
8 else, matched_stmt, $ else other $ Match else
9 matched_stmt, $ other $ matched_stmt → other
10 other, $ other $ Match other
11 $ $ Parsing successful ✅

9.7.4 Key Observations

  • Each step used a single production from the parsing table
  • Stack mirrors the expected derivation
  • No backtracking or ambiguity occurred
  • Parser terminated successfully when both input and stack reached $

9.7.5 Real-World Use

  • Recursive-descent parsers simulate this logic with real function calls instead of a stack
  • Compiler frontend uses such simulation to parse programs token-by-token
  • Errors are caught by mismatches in top-of-stack vs input lookahead