Non-Deterministic Context-Free Grammar - CSU086 - Shoolini U

Non-Deterministic Context-Free Grammar

1. Understanding Non-Deterministic CFGs

1.1 What is a Non-Deterministic Context-Free Grammar?

A Context-Free Grammar (CFG) is non-deterministic if a parser cannot uniquely decide which production rule to apply based on the current input and the grammar state.

In simple terms, the parser stands at a fork with multiple roads and no clear signpost — it doesn’t know which rule will lead to a successful parse without trying all options.

1.1.1 Real-World Analogy

Imagine a voice assistant trying to interpret "Book a table". Should it book a restaurant table or reserve a spot at a library? Without additional context (lookahead), it's non-deterministic.

1.2 Why is it Needed?

Languages like natural languages or ambiguous programming syntax often require expressive grammars that cannot always be made deterministic.

1.3 Deterministic vs Non-Deterministic CFGs

Aspect Deterministic CFG Non-Deterministic CFG
Parsing Decision Single production chosen based on input Multiple productions may apply
Parsing Speed Fast (linear) Slow (may require backtracking or lookahead)
Examples LL(1), LR(1) grammars Ambiguous grammars, natural languages
Parser Implementation Simple and efficient Complex, needs more logic or power

2. Identifying and Handling Non-Determinism

2.1 Sources of Non-Determinism

2.2 Eliminating Ambiguity (when possible)

Ambiguity cannot be completely eliminated from all CFGs. However, rewriting helps in special cases:

2.2.1 Example: Dangling Else Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

This grammar is ambiguous. Solution:

stmt → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

This ensures every else pairs with the nearest unmatched then​:contentReference[oaicite:0]{index=0}.

2.3 Left Factoring

Used to delay decision-making until enough input is seen.

stmt → if expr then stmt
     | if expr then stmt else stmt

Left-factored:

stmt → if expr then stmt stmt'
stmt' → else stmt | ε

This transformation enables predictive parsing​:contentReference[oaicite:1]{index=1}.

2.4 Eliminating Left Recursion

Recursive descent parsers fail on left recursion. General transformation:

2.4.1 Rule Transformation

Given:

A → Aα | β

Transform to:

A → βA'
A' → αA' | ε
2.4.2 Algorithm

Algorithm 4.19 handles indirect and direct left recursion:


1. Order nonterminals A₁, A₂, ..., Aₙ
2. for i = 1 to n
    for j = 1 to i-1
        Replace Aᵢ → Aⱼγ with Aᵢ → δγ for all Aⱼ → δ
    Eliminate immediate left recursion on Aᵢ
2.4.3 Example

Given:


S → Aa | b
A → Ac | Sd | ε

Final grammar after removing left recursion:


S → Aa | b
A → bd A' | A'
A' → c A' | ad A' | ε

Now it's ready for top-down parsing​:contentReference[oaicite:2]{index=2}.

3. Step-by-Step Left Factoring

3.1 Purpose of Left Factoring

Left factoring is a transformation applied to grammars to make them suitable for predictive parsing. When two or more productions for a non-terminal begin with the same symbols, we delay decision-making by factoring the common prefix out.

3.2 Left Factoring Algorithm (Book Algorithm 4.21)

For a non-terminal A with productions:

A → αβ₁ | αβ₂ | ... | αβₙ | γ

Where α is the longest common prefix, the left-factored grammar becomes:

A → αA'
A' → β₁ | β₂ | ... | βₙ
A → γ (if γ doesn't start with α)

Repeat this until no two alternatives for a non-terminal share a prefix​:contentReference[oaicite:0]{index=0}.

3.3 Example 1: Dangling-Else Grammar

S → if E then S
  | if E then S else S
  | a
E → b

Step-by-step left factoring:

  1. Find common prefix: if E then S
  2. Factor it out into a new non-terminal S'
S → if E then S S' | a
S' → else S | ε
E → b

This form defers the decision until after if E then S is processed​:contentReference[oaicite:1]{index=1}.

3.4 Example 2: Abstract Grammar for Regular Expressions


rexpr → rexpr + rterm
       | rterm
rterm → rterm rfactor
       | rfactor
rfactor → rfactor *
        | rprimary
rprimary → a
         | b
Step-by-step Left Factoring for rexpr

rexpr → rterm rexpr'
rexpr' → + rterm rexpr' | ε
For rterm and rfactor

rterm → rfactor rterm'
rterm' → rfactor rterm' | ε

rfactor → rprimary rfactor'
rfactor' → * rfactor' | ε
Final Left-Factored Grammar

rexpr → rterm rexpr'
rexpr' → + rterm rexpr' | ε

rterm → rfactor rterm'
rterm' → rfactor rterm' | ε

rfactor → rprimary rfactor'
rfactor' → * rfactor' | ε

rprimary → a | b

This grammar is now suitable for top-down parsing (e.g., recursive descent)​:contentReference[oaicite:2]{index=2}.

4. Practical Applications

4.1 Top-Down Parsing and Backtracking

Top-down parsers build parse trees from the root downward using recursive function calls or parsing tables. The simplest form is the recursive-descent parser.

Example (Recursive-Descent Parser):

void A() {
  save input pointer;
  for each production A → α:
    if match(α) then return success;
    else reset input pointer;
}

Backtracking is rarely used in compilers due to inefficiency. It may require scanning input multiple times​:contentReference[oaicite:1]{index=1}.

4.2 Bottom-Up Parsing (Shift-Reduce)

Bottom-up parsing works in reverse: it builds the parse tree from leaves (tokens) up to the root (start symbol).

Shift-Reduce Parser Actions:

  1. Shift: Push the next input symbol onto the stack.
  2. Reduce: Replace a sequence of symbols on the stack with the left-hand side of a production.
  3. Accept: When the start symbol is on the stack and input is empty.
  4. Error: If no valid action exists for the stack/input configuration.

Example: Parsing id * id


STACK      INPUT       ACTION
$          id * id $   shift
$id        * id $      reduce F → id
$F         * id $      reduce T → F
$T         * id $      shift
$T *       id $        shift
$T * id    $           reduce F → id
$T * F     $           reduce T → T * F
$T         $           reduce E → T
$E         $           accept

It uses a stack and handles are detected and reduced one at a time​:contentReference[oaicite:2]{index=2}.

4.3 Tools to Deal with Non-Determinism

Real-world compilers use parser generators to handle non-deterministic CFGs efficiently.

4.3.1 Predictive Parsers (LL)
4.3.2 Shift-Reduce Parsers (LR Family)
4.3.3 Dynamic Programming Parsers

These tools manage ambiguity and non-determinism by using lookahead, parse tables, and automaton states, turning impractical grammars into efficiently parsed constructs​:contentReference[oaicite:3]{index=3}.