Recursion in Grammar

1. Introduction to Recursion in Grammar

Recursion in grammar allows a nonterminal to eventually refer back to itself. This recursive property enables the definition of infinitely large language constructs using finite rules. It's the backbone of how compilers understand nested or repeated structures.

1.1 What is Recursion in Grammar?

A grammar is recursive if a nonterminal derives a string that includes itself. This can happen directly or through other nonterminals. For instance:


A → A α | β

Here, A refers to itself in the first production—this is recursion.

1.2 Why is Recursion Needed?

Recursion is essential to model constructs like nested parentheses, repeated operators in arithmetic expressions, and nested blocks of statements.

Imagine trying to write a grammar that supports an infinite number of nested parentheses without recursion—you'd need infinite rules. Recursion handles this elegantly.

1.3 Direct Recursion

When a nonterminal appears as the leftmost symbol on the right-hand side of its own production, the rule has direct left recursion.


E → E + T | T

This says that expression E can call itself directly—it's the first symbol on the right. This form creates infinite recursion in top-down parsers unless transformed.

1.3.1 Problem

In recursive-descent parsers, applying a rule like E → E + T calls the E function again, which loops infinitely without consuming input.

1.3.2 Elimination Technique

To convert left recursion to a right-recursive form:


E  → T E'
E' → + T E' | ε
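
To see what the transformed rules look like in a parser, here is a minimal recursive-descent sketch in Python (illustrative only, not from the text); it assumes a pre-tokenized input list and, for brevity, treats T as a single number token. The key point is that E' consumes a '+' before recursing, so every call makes progress.

# Minimal sketch of a parser for E → T E',  E' → + T E' | ε (assumed token list).
tokens = ["3", "+", "4", "+", "5"]
pos = 0

def T():
    global pos
    pos += 1                 # for brevity, T simply consumes one number token

def E_prime():
    global pos
    if pos < len(tokens) and tokens[pos] == "+":   # E' → + T E'
        pos += 1             # consume '+': input advances before the recursive call
        T()
        E_prime()
    # otherwise E' → ε: consume nothing and return

def E():                     # E → T E'
    T()
    E_prime()

E()
print("consumed", pos, "tokens")   # consumed 5 tokens, no infinite recursion
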
1.3.3 Real World Scenario

Suppose you're building a calculator. Users can write: 3 + 4 + 5. Left recursion helps define the repeating + operations. But to avoid infinite recursion in the parser, we rewrite it using helper rules like E'.

1.4 Indirect Recursion

When a nonterminal indirectly refers to itself through other rules, it's indirect recursion.


S → A a | b
A → A c | S d | ε

Here, S calls A, which again calls S. So S → A a → S d a. Even though it's not direct, it's still recursive.

1.4.1 Handling Indirect Recursion

Use Algorithm 4.19 (from the Dragon Book) to eliminate indirect recursion systematically:


1. Arrange nonterminals A1, A2, ..., An
2. For i = 1 to n:
    For j = 1 to i-1:
        Replace Ai → Aj γ with productions using Aj's current rules
    Eliminate direct recursion among Ai productions
1.4.2 Real World Scenario

Consider nested HTML tags where one tag opens another, which may again nest the original. This indirect structure mimics indirect recursion. The parser must handle such nesting correctly by eliminating indirect loops.

2. Compiler-Specific Types of Recursion

In grammar design for compilers, recursion defines how nonterminals derive patterns. It governs the structure and parse strategy. Understanding these recursion types helps in building LL parsers (top-down) or LR parsers (bottom-up).

2.1 Left Recursion (Grammar)

Definition: A grammar rule is left-recursive if a nonterminal refers to itself as the leftmost symbol in its production.


E → E + T | T

Recursive-descent parsers get stuck in infinite loops with such rules. That’s why left recursion is problematic for top-down parsers.

2.1.1 Problem

Suppose you're parsing 3 + 4 + 5 using E → E + T. The parser calls E() recursively without consuming any tokens, leading to infinite recursion.

2.1.2 Elimination

Convert to right-recursion using a helper rule:


E  → T E'
E' → + T E' | ε
2.1.3 Real-World Analogy

Imagine Russian nesting dolls. Left recursion tries to insert a doll into itself first—impossible. Right recursion adds one doll inside the previous one, which works structurally.

2.2 Right Recursion

Definition: A production is right-recursive if the nonterminal appears at the end of its rule.


A → α A | ε

Right recursion works well with top-down parsers since it consumes tokens before calling the next instance.

2.2.1 Parse Tree Shape

Right recursion builds parse trees that lean to the right; with embedded semantic actions (see the example in 4.2.3), such rules can still emit expressions in postfix form (reverse Polish notation).

2.2.2 Real-World Analogy

Think of a stack of plates being placed one after another. Each new plate is added after placing the previous—sequential and safe, similar to right recursion.

2.2.3 Code Example

A → x A | ε

# Derives:
x x x ... x ε
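
As a rough sketch (assuming a token list of 'x' symbols, an example of my own), a right-recursive rule maps to a procedure that consumes input before recursing, so each call makes progress:

def A(tokens, i=0):
    # A → x A | ε : match an 'x' first, then recurse; anything else is the ε case.
    if i < len(tokens) and tokens[i] == "x":
        return A(tokens, i + 1)
    return i                       # position where the run of x's ends

print(A(["x", "x", "x"]))          # 3
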

2.3 Mutual Recursion (a.k.a. Indirect Recursion)

Definition: A grammar has mutual (indirect) recursion if two or more nonterminals refer to each other in a recursive cycle.


A → B α
B → A β | γ

This is equivalent to A eventually deriving A through B: A → B α → A β α.

2.3.1 Problem

Unless the recursion is removed, a top-down parser expanding A calls B, which calls A again without consuming any input, so the parser loops forever.

2.3.2 Elimination Algorithm (Algorithm 4.19)

1. List all nonterminals A1, A2, ..., An
2. For i = 1 to n:
   For j = 1 to i-1:
     Replace Ai → Aj γ with Ai → δ γ for each current production Aj → δ
   Eliminate direct recursion in Ai

This transforms mutually recursive rules into non-recursive or right-recursive ones.

2.3.3 Real-World Analogy

Imagine two friends who keep asking each other what to do next. Without breaking the cycle and making one decide, the conversation never ends—this is indirect recursion in grammar.

3. Application-Based Recursion

These recursion types are not only conceptual—they directly determine how parsers are implemented and optimized. They form the backbone of real parser behavior in compilers and interpreters.

3.1 Recursive Descent Parsing

Definition: A top-down parsing technique where each nonterminal is implemented as a recursive function in the compiler.

3.1.1 Pseudocode

void A() {
    choose an A → X1X2...Xk;
    for (i = 1 to k)
        if (Xi is nonterminal) call Xi();
        else if (Xi matches current input symbol) advance input;
        else error();
}
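
For a concrete instance, consider a standard balanced-parentheses grammar S → ( S ) S | ε, echoing the nesting idea from Section 1.2 (the grammar and names below are my own, for illustration). One recursive procedure handles the single nonterminal S:

def parse_S(s, i=0):
    # S → ( S ) S | ε : one procedure per nonterminal.
    if i < len(s) and s[i] == "(":      # alternative S → ( S ) S
        i = parse_S(s, i + 1)           # inner S
        if i >= len(s) or s[i] != ")":
            raise SyntaxError("expected ')'")
        return parse_S(s, i + 1)        # trailing S
    return i                            # alternative S → ε

print(parse_S("(()())") == len("(()())"))   # True: the whole string is consumed
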
3.1.2 Characteristics

Each nonterminal gets one procedure, parsing works top-down from the start symbol, several productions may have to be tried (possibly with backtracking), and left-recursive rules cannot be used directly.

3.1.3 Real-World Analogy

Imagine following a decision tree for diagnosing a problem. You try one path, and if it doesn’t work, you backtrack and try another—this is recursive-descent in action.

3.2 Backtracking Recursion

Definition: A form of recursion where upon failure, the parser returns to a previous state and tries an alternate production.

3.2.1 Why it's needed

The next input symbol does not always determine which production to apply. When the chosen production fails, the parser must be able to undo its work and attempt another alternative.

3.2.2 Modified Pseudocode (Backtracking)

void A() {
    save input pointer;
    try production A → X1X2...;
    if (error) restore input pointer and try next production;
}
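
A rough Python sketch of the same idea, for a hypothetical nonterminal with two alternatives A → a b | a c (the grammar and names are assumptions for illustration): the input position is saved before each attempt and restored when an alternative fails.

def parse_A(tokens, pos):
    for alternative in (["a", "b"], ["a", "c"]):     # try A → a b, then A → a c
        saved = pos                                  # save input pointer
        i, ok = saved, True
        for symbol in alternative:
            if i < len(tokens) and tokens[i] == symbol:
                i += 1                               # matched: advance input
            else:
                ok = False                           # mismatch: this alternative fails
                break
        if ok:
            return i                                 # success: new input position
        pos = saved                                  # restore pointer, try next production
    raise SyntaxError("no production of A matches")

print(parse_A(["a", "c"], 0))                        # 2: second alternative succeeds
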
3.2.3 Real-World Analogy

Trying a password from a list—each time you fail, you reset and try the next one. Inefficient but exhaustive.

3.2.4 Drawback

Backtracking is highly inefficient and is rarely needed for programming-language grammars. For general grammars that do require exploring many alternatives, more efficient tabular methods such as Earley's parser or CYK are used.

3.3 Divide and Conquer Recursion

Definition: This is a recursive technique where problems are broken into sub-problems that are solved independently and combined later.

3.3.1 Role in Compilers

This technique appears in parsing, where each nonterminal handles its own portion of the input; in evaluating or optimizing independent subtrees of the syntax tree; and in code generation that processes one subtree at a time.

3.3.2 Real-World Analogy

Like solving a jigsaw puzzle by solving smaller sections and merging them—efficient and parallelizable.

3.3.3 Code Example

def evaluate(expr):
    if is_simple(expr):
        return value(expr)
    left, op, right = split(expr)
    return apply(op, evaluate(left), evaluate(right))
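
The helpers is_simple, value, split, and apply above are left abstract. A self-contained variant (a sketch of mine that represents the expression as a nested tuple tree rather than a string, so no string splitting is needed):

def evaluate(expr):
    # Leaves are numbers; internal nodes are (operator, left, right) tuples.
    if isinstance(expr, (int, float)):            # base case: a "simple" expression
        return expr
    op, left, right = expr                        # divide into independent sub-problems
    l, r = evaluate(left), evaluate(right)        # solve each sub-problem recursively
    return {"+": l + r, "-": l - r, "*": l * r}[op]   # combine the partial results

print(evaluate(("*", ("+", 3, 4), 5)))            # (3 + 4) * 5 = 35
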
3.3.4 Advantage

Promotes modular, clean, and recursive solutions across the compiler pipeline: parsing, optimization, and code generation.

4. Conversion from Left to Right Recursion

To make grammars suitable for top-down parsing (like recursive-descent), left recursion must be eliminated. This process transforms left-recursive productions into right-recursive or iterative forms using helper nonterminals.

4.1 Methods to Eliminate Left Recursion

4.1.1 Immediate Left Recursion

Given a nonterminal A with productions:


A → Aα₁ | Aα₂ | ... | β₁ | β₂ | ...

where no βᵢ begins with A, we rewrite the productions as:


A  → β₁ A' | β₂ A' | ...
A' → α₁ A' | α₂ A' | ... | ε
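
A rough Python sketch of this rewrite (the representation and names are my own: each production body is a list of symbols, and ["ε"] stands for the empty body):

def eliminate_immediate(A, productions):
    # Split A's bodies into A → A α (left-recursive) and A → β (the rest).
    A_prime = A + "'"
    alphas = [p[1:] for p in productions if p[0] == A]
    betas = [p for p in productions if p[0] != A]
    if not alphas:
        return {A: productions}                                    # nothing to eliminate
    new_A = [(b if b != ["ε"] else []) + [A_prime] for b in betas]     # A  → β A'
    new_A_prime = [a + [A_prime] for a in alphas] + [["ε"]]            # A' → α A' | ε
    return {A: new_A, A_prime: new_A_prime}

# E → E + T | T   becomes   E → T E',  E' → + T E' | ε
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
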
4.1.2 General (Indirect) Left Recursion

Use Algorithm 4.19 from the book:


Input: Grammar G without cycles or ε-productions
Output: Grammar with no left recursion

1. Arrange nonterminals A₁, A₂, ..., Aₙ
2. For i = 1 to n:
   a. For j = 1 to i - 1:
      - Replace each production of the form Ai → Aj γ
        with Ai → δ₁ γ | δ₂ γ | ..., where Aj → δ₁ | δ₂ | ... are Aj's current productions
   b. Eliminate immediate left recursion from Ai
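
Reusing the eliminate_immediate sketch from 4.1.1, the outer loop of the algorithm might be sketched as follows (again illustrative only, with the same list-of-symbols representation):

def eliminate_left_recursion(nonterminals, grammar):
    # nonterminals: A1, ..., An in order; grammar: nonterminal -> list of production bodies.
    # eliminate_immediate is the sketch from section 4.1.1 above.
    g = {A: [list(p) for p in grammar[A]] for A in grammar}
    for i, Ai in enumerate(nonterminals):
        for Aj in nonterminals[:i]:
            expanded = []
            for p in g[Ai]:
                if p[0] == Aj:                                 # Ai → Aj γ
                    expanded += [d + p[1:] for d in g[Aj]]     # substitute each Aj → δ
                else:
                    expanded.append(p)
            g[Ai] = expanded
        g.update(eliminate_immediate(Ai, g[Ai]))               # then remove direct recursion
    return g

# Grammar (4.18): S → A a | b,  A → A c | S d | ε
g = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["ε"]]}
print(eliminate_left_recursion(["S", "A"], g))
# A becomes: b d A' | A'   with   A' → c A' | a d A' | ε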

4.2 Examples from the Book

4.2.1 Example 1: Expression Grammar

Original:
E → E + T | T
T → T * F | F
F → ( E ) | id

Eliminated:
E  → T E'
E' → + T E' | ε

T  → F T'
T' → * F T' | ε

This transforms left-recursive arithmetic expressions into a right-recursive form.

4.2.2 Example 2: Grammar (4.18)

Original:
S → A a | b
A → A c | S d | ε

Apply Algorithm 4.19:
A → A c | A a d | b d | ε

Converted:
S  → A a | b
A  → b d A' | A'
A' → c A' | a d A' | ε

Left recursion across productions (indirect) is eliminated step-by-step.

4.2.3 Example 3: With Semantic Actions

Original:
E → E + T { print('+') }
  | E - T { print('-') }
  | T

Transformed:
E → T R
R → + T { print('+') } R
   | - T { print('-') } R
   | ε

Actions are treated as terminals and embedded during conversion.
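
To make the effect of the embedded actions concrete, here is a small Python sketch (my own, assuming single-digit operands and assuming T prints its digit as its own action); the transformed scheme emits the classic postfix output:

expr = "9-5+2"                 # assumed single-digit infix input
pos = 0

def T():
    global pos
    print(expr[pos], end="")   # assumed action: T prints its digit
    pos += 1

def R():                       # R → + T { print('+') } R | - T { print('-') } R | ε
    global pos
    if pos < len(expr) and expr[pos] in "+-":
        op = expr[pos]
        pos += 1
        T()
        print(op, end="")      # the embedded action, executed after T
        R()
    # R → ε: nothing to do

def E():                       # E → T R
    T()
    R()

E()                            # prints 95-2+, the postfix form of 9-5+2
print()
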

4.3 Real-World Analogy

Left recursion is like trying to define a word using itself as the first word in its own definition. You get stuck. Rewriting to right recursion lets you finish the sentence before referring back—much like giving instructions step by step instead of looping back too early.

5. Steps to Change a Left-Recursive Grammar to Right-Recursive Grammar

We demonstrate the process using an example from the Dragon Book (Example 4.20). The grammar includes both indirect and immediate left recursion and uses Algorithm 4.19 to eliminate them.

5.1 Original Grammar


S → A a | b
A → A c | S d | ε

Observation: S indirectly recurses through A: S → A a → S d a. The grammar has indirect left recursion.

5.2 Step-by-Step Elimination Using Algorithm 4.19

Step 1: Order Nonterminals

Order them as: S, A

Step 2: For i = 1 (S)

There are no earlier nonterminals to substitute into S, and S has no immediate left recursion, so S → A a | b stays unchanged.

Step 3: For i = 2 (A)

A has S on RHS via A → S d. Replace S with its productions:


A → A c | A a d | b d | ε
Step 4: Separate Left-Recursive and Non-Left-Recursive Productions

Group the productions by whether they start with A. Left-recursive: A → A c and A → A a d. Others: A → b d and A → ε.

Step 5: Apply Standard Left Recursion Removal

Use helper nonterminal A':


A  → b d A' | A'
A' → c A' | a d A' | ε

5.3 Final Converted Grammar


S  → A a | b
A  → b d A' | A'
A' → c A' | a d A' | ε
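
As a quick check that the language is unchanged, the string b d c a is derivable in both versions. In the original grammar: S → A a → A c a → S d c a → b d c a. In the converted grammar: S → A a → b d A' a → b d c A' a → b d c a (taking A' → ε last). Only the direction of the recursion differs.
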
5.3.1 Parse Tree Shape

The converted grammar produces parse trees that lean to the right. This transformation ensures that recursive-descent parsers won't fall into infinite loops, and the grammar becomes suitable for LL(1) parsing.

5.4 Real-World Analogy

Imagine a person (S) getting a task done either by calling another person (A), or doing it directly (b). Now if A keeps handing the task back to S, they’ll keep bouncing the task forever. We make sure the task gets done linearly without cycles by rewriting how they handle the delegation (using A').