Deterministic CFG - CSU086 - Shoolini U

Deterministic Context-Free Grammar

1. Deterministic Context-Free Grammar (DCFG)

A Deterministic Context-Free Grammar (DCFG) is a special kind of Context-Free Grammar (CFG) whose strings can be parsed by a Deterministic Pushdown Automaton (DPDA). This means that at any step in parsing, there is at most one valid move based on the current state, stack top, and input symbol—no guessing or backtracking required.

1.1 What is DCFG?

DCFG is a subset of CFG. Just like every square is a rectangle but not every rectangle is a square, every DCFG is a CFG, but not every CFG is a DCFG.

Think of DCFG as a traffic system where only one turn is allowed at each signal. In contrast, CFG may have multiple valid roads at each signal, requiring a map or trial-and-error.

1.2 Why is DCFG Needed?

DCFGs are the foundation of efficient parsers like LL(1), LR(1), and LALR(1), which are used in real-world programming languages such as C, Java, and Python.

They provide:

Example from the real world: Imagine you're building a compiler. DCFG ensures you can write a fast, predictable parser without backtracking. This is crucial when compiling millions of lines of code.

1.3 Difference Between CFG and DCFG

Aspect CFG DCFG
Parsing Machine NPDA (Non-deterministic PDA) DPDA (Deterministic PDA)
Backtracking Allowed Not Allowed
Ambiguity Can be ambiguous Must be unambiguous
Efficiency Less efficient for large grammars Highly efficient (used in real-world parsers)
Examples Natural language, ambiguous constructs Programming languages like C, Java

1.4 Real-World Example

CFG Example: Consider a grammar that generates expressions like if-then-else.


S → if E then S else S | if E then S | a

This CFG is ambiguous. A string like if E then if E then a else a has two different parse trees. So it is not deterministic.

DCFG Example: Modify the grammar to enforce determinism:


S → M else S | a
M → if E then S

This grammar ensures there's only one parse tree by delaying the parsing of else until a full if-then is complete.

1.5 Visualizing the Automaton

stateDiagram-v2
    [*] --> q0
    q0 --> q1 : read 'if'
    q1 --> q2 : read 'then'
    q2 --> q3 : read 'else'
    q3 --> [*]

This shows a DPDA handling a deterministic grammar with no state conflict.

2. LL(1), LR(1), LALR(1) Grammars

These are types of Deterministic Context-Free Grammars (DCFGs) used in real-world compilers. The names reflect how they parse and how much lookahead they use:

2.1 LL(1) Grammar

LL(1) parsers are top-down and predictive. They require the grammar to be:

Real-world analogy: Reading a document line by line, making a decision without going back.

2.1.1 Example: LL(1) Grammar

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

This grammar has no left recursion and no common prefixes, so it's LL(1).

2.1.2 LL(1) Parsing Table Construction

Follow(E)  = { $, ) }
First(E)   = { (, id }

Table:
+-------+--------+---------+-------+------+
| NonT  |   id   |   (     |   )   |  $   |
+-------+--------+---------+-------+------+
|   E   | E → T E'| E → T E'|       |      |
|  E'   |        |         | E'→ε  |E'→ε  |
|   T   | T → F T'|T → F T'|       |      |
|  T'   |        |         | T'→ε |T'→ε |
|   F   |F→id    |F→( E )  |       |      |
+-------+--------+---------+-------+------+

2.2 LR(1) Grammar

LR(1) parsers are bottom-up and shift-reduce based. They are powerful and can parse all DCFGs, including those with left recursion.

Real-world analogy: Reading a sentence backwards and reconstructing the grammar rules used to create it.

2.2.1 Example: LR(1) Grammar

S → L = R | R
L → * R | id
R → L

This grammar is not LL(1) due to ambiguity, but it is LR(1).

2.2.2 LR(1) Item and State

Item: [S → • L = R, $]

Closure and GOTO algorithms build a DFA from such items.

2.3 LALR(1) Grammar

LALR(1) is a practical compromise: same power as LR(1) for most programming languages but with fewer states. It's the default in most parser generators (YACC, Bison).

2.3.1 Difference: LR(1) vs LALR(1)
2.3.2 Merging States in LALR

LR(1) Item Set 1: [A → α •, a]
LR(1) Item Set 2: [A → α •, b]

LALR merges them:
[A → α •, a/b]

If merging introduces shift-reduce or reduce-reduce conflicts, grammar must be refactored.

2.4 Comparison Summary

Parser Direction Derivation Lookahead Strength Speed Used In
LL(1) Left-to-right Leftmost 1 Simple, fast Very fast Handwritten parsers, ANTLR
LR(1) Left-to-right Rightmost (reverse) 1 Powerful Moderate Bison, YACC (full mode)
LALR(1) Left-to-right Rightmost (reverse) 1 Balanced Fast Most practical compilers

2.5 Real World Scenarios

3. LL(1) Grammar and DCFG

LL(1) is a subclass of Deterministic Context-Free Grammar (DCFG). It is the simplest and most efficient form of parsing grammar, ideal for top-down parsers known as Predictive Parsers. The ‘1’ indicates one-symbol lookahead is enough to make parsing decisions without backtracking.

3.1 Definition of LL(1)

An LL(1) grammar is a type of context-free grammar that can be parsed by a top-down parser:

Think of it like reading instructions step-by-step without needing to look ahead more than one word. This makes the parser very fast and memory-efficient.

3.2 Conditions for LL(1) Grammar

To qualify as LL(1), a grammar must satisfy the following:

This ensures that the parser can always choose the correct production with only one token of lookahead.

3.3 LL(1) Parsing Table Example

Given the grammar:


E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
Step 1: Compute FIRST and FOLLOW

FIRST(E)  = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E)  = { ), $ }
FOLLOW(E') = FOLLOW(E) = { ), $ }
FOLLOW(T)  = FIRST(E') ∪ FOLLOW(E) = { +, ), $ }
FOLLOW(T') = FOLLOW(T) = { +, ), $ }
FOLLOW(F)  = FIRST(T') ∪ FOLLOW(T) = { *, +, ), $ }
Step 2: LL(1) Table
Non-Terminal id ( + * ) $
E E → T E' E → T E'
E' E' → + T E' E' → ε E' → ε
T T → F T' T → F T'
T' T' → ε T' → * F T' T' → ε T' → ε
F F → id F → ( E )

3.4 Predictive Parsing

Predictive Parser is a non-backtracking recursive descent parser that uses the LL(1) table to parse the input.

Steps:

Algorithm:

initialize stack with [$, StartSymbol]
repeat:
    X ← top of stack
    a ← current input token
    if X is terminal or $:
        if X == a:
            pop X and advance input
        else:
            error
    else:
        if table[X, a] = X → Y1 Y2 ... Yn:
            pop X
            push Yn ... Y1
        else:
            error
until stack is empty
Example:

Parsing: id + id * id


Stack              Input             Action
$ E                id + id * id $    Use E → T E'
$ E' T             id + id * id $    Use T → F T'
$ E' T' F          id + id * id $    Use F → id
$ E' T' id         id + id * id $    Match id
$ E' T'            + id * id $       T' → ε
$ E'               + id * id $       E' → + T E'
... (continues)

4. LR(1) Grammar and DCFG

LR(1) grammars belong to the family of Deterministic Context-Free Grammars (DCFGs) and are used in bottom-up parsing. Unlike LL(1) (top-down), LR(1) works by reconstructing the rightmost derivation in reverse. These grammars are powerful enough to parse complex programming languages like C/C++ with precision.

4.1 Bottom-Up Parsing

Bottom-Up Parsing starts from the input string and tries to build the parse tree up to the start symbol. It works using a Shift-Reduce mechanism:

Example: Given grammar:


S → a A d
A → b c

Parsing a b c d proceeds as:


Stack       Input        Action
$           a b c d $    Shift
$ a         b c d $      Shift
$ a b       c d $        Shift
$ a b c     d $          Reduce b c → A
$ a A       d $          Shift
$ a A d     $            Reduce a A d → S
$ S         $            Accept

4.2 Canonical LR(1) vs SLR(1) vs LALR(1)

4.2.1 Canonical LR(1)

Item: [A → α • β, a]

Means: Currently in the middle of rule A → αβ, we’ve parsed α, expect β next, and ‘a’ is a lookahead terminal to help resolve reduce conflicts.

4.2.2 SLR(1): Simple LR
4.2.3 LALR(1): LookAhead LR
Parser Item Type Lookahead State Count Power
Canonical LR(1) LR(1) High Highest
SLR(1) LR(0) ✗ (uses FOLLOW) Low Low
LALR(1) LR(1) merged Medium Medium (Practical)

4.3 Parsing Table Conflicts

Conflicts occur when the parser can't decide between Shift and Reduce, or between multiple Reduces.

4.3.1 Shift-Reduce Conflict

Occurs when parser can either shift or reduce at a particular state and input symbol.

Example:


if E then S else S

Ambiguity: does else belong to inner or outer if?

4.3.2 Reduce-Reduce Conflict

Occurs when multiple productions are valid reductions at same point.

Example: Grammar:


S → A | B
A → a
B → a

On input a, the parser cannot choose whether to reduce using A or B.

4.4 Real-World Relevance

Why use LR(1) or LALR(1)?

Example: In C language parsing:


// Grammar Snippet:
declaration → type id ;
type → int | float | char

Using LALR(1), we can efficiently and correctly parse these using parsing tables even with ambiguous symbols like * in declarations.

5. Applications and Limitations of DCFGs

Deterministic Context-Free Grammars (DCFGs) are the foundation of real-world compiler syntax analysis. They represent the class of languages that can be parsed deterministically using a single pass and a stack — critical for performance in programming languages. However, they have boundaries and trade-offs in grammar expressiveness.

5.1 Use in Real-World Compilers

DCFGs are central to efficient parsing in modern programming languages. Their deterministic nature allows for predictable and fast parser behavior.

Real-world Example: A C compiler uses a LALR(1) parser (a DCFG parser) to read code like int x = 10; and translate it to machine-level intermediate representation using only one token of lookahead.

5.2 Limitations of DCFGs

DCFGs are not powerful enough for every aspect of a language. Some limitations include:

Example: The famous if-then-else ambiguity cannot be resolved using simple LL(1) without modifying the grammar.

5.3 Trade-offs: Simplicity vs Power

Feature LL(1) LALR(1) Full CFG
Speed Fastest Fast Slow (may require backtracking)
Memory Use Low Moderate High
Grammar Complexity Low Moderate High
Ambiguity Handling ✓ (if allowed)
Use in Compilers Simple languages, expression evaluators Most real-world compilers Rarely used in compilers directly

Designers choose DCFGs (LL/LR variants) because:

In practice, complex features not expressible in DCFG are handled later during semantic analysis or by refactoring the grammar.