Non-Deterministic Context-Free Grammar

1. Understanding Non-Deterministic CFGs

1.1 What is a Non-Deterministic Context-Free Grammar?

A Context-Free Grammar (CFG) is non-deterministic if a parser cannot uniquely decide which production rule to apply based on the current input and the grammar state.

In simple terms, the parser stands at a fork with multiple roads and no clear signpost — it doesn’t know which rule will lead to a successful parse without trying all options.

1.1.1 Real-World Analogy

Imagine a voice assistant trying to interpret "Book a table". Should it book a restaurant table or reserve a spot at a library? Without additional context (lookahead), it's non-deterministic.

1.2 Why is it Needed?

Languages like natural languages or ambiguous programming syntax often require expressive grammars that cannot always be made deterministic.

Compact Design: Non-deterministic grammars can describe complex syntax with fewer rules.
Real Language Support: Some constructs (like the "dangling else") naturally require non-determinism for complete expression.

1.3 Deterministic vs Non-Deterministic CFGs

Aspect	Deterministic CFG	Non-Deterministic CFG
Parsing Decision	Single production chosen based on input	Multiple productions may apply
Parsing Speed	Fast (linear)	Slow (may require backtracking or lookahead)
Examples	LL(1), LR(1) grammars	Ambiguous grammars, natural languages
Parser Implementation	Simple and efficient	Complex, needs more logic or power

2. Identifying and Handling Non-Determinism

2.1 Sources of Non-Determinism

Ambiguity: More than one parse tree for a string
Common Prefix: Multiple rules start with the same symbols (left common)
Left Recursion: Rules like A → Aα cause infinite recursion in top-down parsers

2.2 Eliminating Ambiguity (when possible)

Ambiguity cannot be completely eliminated from all CFGs. However, rewriting helps in special cases:

2.2.1 Example: Dangling Else Ambiguity

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

This grammar is ambiguous. Solution:

stmt → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

This ensures every else pairs with the nearest unmatched then:contentReference[oaicite:0]{index=0}.

2.3 Left Factoring

Used to delay decision-making until enough input is seen.

stmt → if expr then stmt
     | if expr then stmt else stmt

Left-factored:

stmt → if expr then stmt stmt'
stmt' → else stmt | ε

This transformation enables predictive parsing:contentReference[oaicite:1]{index=1}.

2.4 Eliminating Left Recursion

Recursive descent parsers fail on left recursion. General transformation:

2.4.1 Rule Transformation

Given:

A → Aα | β

Transform to:

A → βA'
A' → αA' | ε

2.4.2 Algorithm

Algorithm 4.19 handles indirect and direct left recursion:


1. Order nonterminals A₁, A₂, ..., Aₙ
2. for i = 1 to n
    for j = 1 to i-1
        Replace Aᵢ → Aⱼγ with Aᵢ → δγ for all Aⱼ → δ
    Eliminate immediate left recursion on Aᵢ

2.4.3 Example

Given:


S → Aa | b
A → Ac | Sd | ε

Final grammar after removing left recursion:


S → Aa | b
A → bd A' | A'
A' → c A' | ad A' | ε

Now it's ready for top-down parsing:contentReference[oaicite:2]{index=2}.

3. Step-by-Step Left Factoring

3.1 Purpose of Left Factoring

Left factoring is a transformation applied to grammars to make them suitable for predictive parsing. When two or more productions for a non-terminal begin with the same symbols, we delay decision-making by factoring the common prefix out.

3.2 Left Factoring Algorithm (Book Algorithm 4.21)

For a non-terminal A with productions:

A → αβ₁ | αβ₂ | ... | αβₙ | γ

Where α is the longest common prefix, the left-factored grammar becomes:

A → αA'
A' → β₁ | β₂ | ... | βₙ
A → γ (if γ doesn't start with α)

Repeat this until no two alternatives for a non-terminal share a prefix:contentReference[oaicite:0]{index=0}.

3.3 Example 1: Dangling-Else Grammar

S → if E then S
  | if E then S else S
  | a
E → b

Step-by-step left factoring:

Find common prefix: if E then S
Factor it out into a new non-terminal S'

S → if E then S S' | a
S' → else S | ε
E → b

This form defers the decision until after if E then S is processed:contentReference[oaicite:1]{index=1}.

3.4 Example 2: Abstract Grammar for Regular Expressions


rexpr → rexpr + rterm
       | rterm
rterm → rterm rfactor
       | rfactor
rfactor → rfactor *
        | rprimary
rprimary → a
         | b

Step-by-step Left Factoring for `rexpr`

Common prefix: rexpr
Rewrite:


rexpr → rterm rexpr'
rexpr' → + rterm rexpr' | ε

For `rterm` and `rfactor`


rterm → rfactor rterm'
rterm' → rfactor rterm' | ε

rfactor → rprimary rfactor'
rfactor' → * rfactor' | ε

Final Left-Factored Grammar


rexpr → rterm rexpr'
rexpr' → + rterm rexpr' | ε

rterm → rfactor rterm'
rterm' → rfactor rterm' | ε

rfactor → rprimary rfactor'
rfactor' → * rfactor' | ε

rprimary → a | b

This grammar is now suitable for top-down parsing (e.g., recursive descent):contentReference[oaicite:2]{index=2}.

4. Practical Applications

4.1 Top-Down Parsing and Backtracking

Top-down parsers build parse trees from the root downward using recursive function calls or parsing tables. The simplest form is the recursive-descent parser.

With Backtracking: Tries all productions for a non-terminal in order. If parsing fails, it backtracks and tries the next one.
Without Backtracking (Predictive Parsing): Uses FIRST and FOLLOW sets to pick the correct production immediately. LL(1) grammars support this:contentReference[oaicite:0]{index=0}.

Example (Recursive-Descent Parser):

void A() {
  save input pointer;
  for each production A → α:
    if match(α) then return success;
    else reset input pointer;
}

Backtracking is rarely used in compilers due to inefficiency. It may require scanning input multiple times:contentReference[oaicite:1]{index=1}.

4.2 Bottom-Up Parsing (Shift-Reduce)

Bottom-up parsing works in reverse: it builds the parse tree from leaves (tokens) up to the root (start symbol).

Shift-Reduce Parser Actions:

Shift: Push the next input symbol onto the stack.
Reduce: Replace a sequence of symbols on the stack with the left-hand side of a production.
Accept: When the start symbol is on the stack and input is empty.
Error: If no valid action exists for the stack/input configuration.

Example: Parsing id * id


STACK      INPUT       ACTION
$          id * id $   shift
$id        * id $      reduce F → id
$F         * id $      reduce T → F
$T         * id $      shift
$T *       id $        shift
$T * id    $           reduce F → id
$T * F     $           reduce T → T * F
$T         $           reduce E → T
$E         $           accept

It uses a stack and handles are detected and reduced one at a time:contentReference[oaicite:2]{index=2}.

4.3 Tools to Deal with Non-Determinism

Real-world compilers use parser generators to handle non-deterministic CFGs efficiently.

4.3.1 Predictive Parsers (LL)

Tools: Hand-written or generated using ANTLR, JavaCC.
Requirements: Grammar must be LL(1); needs FIRST/FOLLOW sets and left-factoring, left-recursion removal.

4.3.2 Shift-Reduce Parsers (LR Family)

Tools: Yacc, Bison, Menhir, CUP
Variants: SLR, Canonical LR, LALR
LALR Parsers: Used in real compilers; compact like SLR, accurate like canonical LR

4.3.3 Dynamic Programming Parsers

Earley Parser: Handles all CFGs, even ambiguous ones; suitable for natural languages
CYK Parser: Based on table-filling; cubic time but powerful

These tools manage ambiguity and non-determinism by using lookahead, parse tables, and automaton states, turning impractical grammars into efficiently parsed constructs:contentReference[oaicite:3]{index=3}.