1. Introduction to Recursion in Grammar
Recursion in grammar allows a nonterminal to eventually refer back to itself. This recursive property enables the definition of infinitely large language constructs using finite rules. It's the backbone of how compilers understand nested or repeated structures.
1.1 What is Recursion in Grammar?
A grammar is recursive if a nonterminal derives a string that includes itself. This can happen directly or through other nonterminals. For instance:
A → A α | β
Here, A refers to itself in the first production; this is recursion.
1.2 Why is Recursion Needed?
Recursion is essential to model constructs like:
- Arithmetic expressions: expr → expr + term | term
- Nested statements: stmt → if expr then stmt else stmt
- Function calls or lists: args → arg , args | arg
Imagine trying to write a grammar that supports an infinite number of nested parentheses without recursion—you'd need infinite rules. Recursion handles this elegantly.
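As a quick illustration, the single recursive rule S → ( S ) | ε already covers every nesting depth. A minimal Python recognizer for it (a sketch that handles pure nesting only, not sequences of groups) might look like:

```python
def nested(s):
    # S → ( S ) | ε : a string belongs to the language iff it is
    # empty, or "(" + inner + ")" where inner itself belongs.
    if s == "":
        return True                     # the ε-production
    return s.startswith("(") and s.endswith(")") and nested(s[1:-1])
```

One recursive rule accepts "((()))" at any depth, where a recursion-free grammar would need a separate rule for each depth.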
1.3 Direct Recursion
When a nonterminal directly calls itself on the left side of its production, it's called direct left recursion.
E → E + T | T
This says that the expression E can call itself directly: it is the first symbol on the right-hand side of its own rule. This form creates infinite recursion in top-down parsers unless transformed.
1.3.1 Problem
In a recursive-descent parser, applying a rule like E → E + T calls the E procedure again immediately, which loops infinitely without consuming any input.
1.3.2 Elimination Technique
To convert left recursion to a right-recursive form:
E → T E'
E' → + T E' | ε
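A small Python sketch of the transformed rules (assuming, for illustration, that T is simply an integer token, which the grammar above leaves unspecified):

```python
def parse_E(tokens):
    # E → T E' : parse one T, then hand the rest to E'.
    value, rest = parse_T(tokens)
    return parse_Eprime(value, rest)

def parse_Eprime(acc, tokens):
    # E' → + T E' | ε : a '+' token is consumed *before* recursing,
    # so the recursion always makes progress.
    if tokens and tokens[0] == "+":
        value, rest = parse_T(tokens[1:])
        return parse_Eprime(acc + value, rest)
    return acc, tokens                  # ε-production: stop here

def parse_T(tokens):
    # T → number (simplified for this sketch)
    return int(tokens[0]), tokens[1:]
```

For example, parse_E(["3", "+", "4", "+", "5"]) consumes the whole input and returns (12, []).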
1.3.3 Real World Scenario
Suppose you're building a calculator. Users can write 3 + 4 + 5. Left recursion helps define the repeating + operations, but to avoid infinite recursion in the parser we rewrite it using helper rules like E'.
1.4 Indirect Recursion
When a nonterminal indirectly refers to itself through other rules, it's indirect recursion.
S → A a | b
A → A c | S d | ε
Here, S calls A, which in turn calls S again: S → A a → S d a. Even though the recursion is not direct, it is still recursion.
1.4.1 Handling Indirect Recursion
Use Algorithm 4.19 (from the Dragon Book) to eliminate indirect recursion systematically:
1. Arrange the nonterminals in some order: A1, A2, ..., An
2. For i = 1 to n:
   a. For j = 1 to i-1: replace each production Ai → Aj γ using Aj's current productions
   b. Eliminate direct recursion among the Ai-productions
1.4.2 Real World Scenario
Consider nested HTML tags where one tag opens another, which may again nest the original. This indirect structure mimics indirect recursion. The parser must handle such nesting correctly by eliminating indirect loops.
2. Compiler-Specific Types of Recursion
In grammar design for compilers, recursion defines how nonterminals derive patterns. It governs the structure and parse strategy. Understanding these recursion types helps in building LL parsers (top-down) or LR parsers (bottom-up).
2.1 Left Recursion (Grammar)
Definition: A grammar rule is left-recursive if a nonterminal refers to itself as the leftmost symbol in its production.
E → E + T | T
Recursive-descent parsers get stuck in infinite loops with such rules. That’s why left recursion is problematic for top-down parsers.
2.1.1 Problem
Suppose you're parsing 3 + 4 + 5 using E → E + T. The parser keeps calling E() without consuming any tokens, leading to infinite recursion.
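The loop can be demonstrated directly. In this hypothetical sketch, the procedure for E → E + T re-enters itself before reading any token, so the call stack overflows; Python's recursion limit is lowered just to make it fail fast:

```python
import sys

def E():
    # E → E + T : the very first action is another call to E(),
    # before a single input token has been consumed.
    E()
    # the "+ T" part of the rule is never reached

limit = sys.getrecursionlimit()
sys.setrecursionlimit(100)              # fail fast for the demo
try:
    E()
    outcome = "terminated"
except RecursionError:
    outcome = "stack overflow: no input consumed"
finally:
    sys.setrecursionlimit(limit)        # restore the normal limit
```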
2.1.2 Elimination
Convert to right-recursion using a helper rule:
E → T E'
E' → + T E' | ε
2.1.3 Real-World Analogy
Imagine Russian nesting dolls. Left recursion tries to insert a doll into itself first—impossible. Right recursion adds one doll inside the previous one, which works structurally.
2.2 Right Recursion
Definition: A production is right-recursive if the nonterminal appears at the end of its rule.
A → α A | ε
Right recursion works well with top-down parsers since it consumes tokens before calling the next instance.
2.2.1 Parse Tree Shape
Right recursion builds parse trees that lean to the right, so operators parsed this way naturally associate to the right; left-associative operators need extra handling (such as accumulating results during the parse).
2.2.2 Real-World Analogy
Think of a stack of plates being placed one after another. Each new plate is added after placing the previous—sequential and safe, similar to right recursion.
2.2.3 Code Example
A → x A | ε
# Derives: x x x ... x ε
2.3 Mutual Recursion (a.k.a. Indirect Recursion)
Definition: A grammar has mutual (indirect) recursion if two or more nonterminals refer to each other in a recursive cycle.
A → B α
B → A β | γ
This is equivalent to A eventually deriving A through B: A → B α → A β α.
2.3.1 Problem
A top-down parser cannot expand the cyclically dependent nonterminals without falling into infinite regress, unless the recursion is removed first.
2.3.2 Elimination Algorithm (Algorithm 4.19)
1. List all nonterminals: A1, A2, ..., An
2. For i = 1 to n:
   a. For j = 1 to i-1: replace each production Ai → Aj γ using Aj's current productions
   b. Eliminate direct recursion in the Ai-productions
This transforms mutually recursive rules into non-recursive or right-recursive ones.
2.3.3 Real-World Analogy
Imagine two friends who keep asking each other what to do next. Without breaking the cycle and making one decide, the conversation never ends—this is indirect recursion in grammar.
3. Application-Based Recursion
These recursion types are not only conceptual—they directly determine how parsers are implemented and optimized. They form the backbone of real parser behavior in compilers and interpreters.
3.1 Recursive Descent Parsing
Definition: A top-down parsing technique where each nonterminal is implemented as a recursive function in the compiler.
3.1.1 Pseudocode
void A() {
    choose an A-production, A → X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal) call procedure Xi();
        else if (Xi matches the current input symbol) advance the input;
        else error();
    }
}
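Instantiating this scheme for the list grammar from Section 1.2 (args → arg , args | arg), with one Python function per nonterminal and identifier tokens standing in for arg:

```python
def parse_args(tokens, i=0):
    # args → arg , args | arg
    i = parse_arg(tokens, i)
    if i < len(tokens) and tokens[i] == ",":
        return parse_args(tokens, i + 1)    # consume ',' then recurse
    return i                                # index just past the match

def parse_arg(tokens, i):
    # arg → id (an identifier token, simplified)
    if i < len(tokens) and tokens[i].isidentifier():
        return i + 1
    raise SyntaxError("expected identifier at position %d" % i)
```

For example, parse_args(["x", ",", "y", ",", "z"]) consumes all five tokens and returns 5, while a trailing comma raises a syntax error.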
3.1.2 Characteristics
- Uses one procedure per nonterminal
- Parses the input left to right, producing a leftmost derivation
- Backtracking may be required if the wrong rule is picked
3.1.3 Real-World Analogy
Imagine following a decision tree for diagnosing a problem. You try one path, and if it doesn’t work, you backtrack and try another—this is recursive-descent in action.
3.2 Backtracking Recursion
Definition: A form of recursion where upon failure, the parser returns to a previous state and tries an alternate production.
3.2.1 Why it's needed
- Used when the grammar is ambiguous or lacks lookahead capabilities
- Often slow and inefficient—better replaced by predictive parsing
3.2.2 Modified Pseudocode (Backtracking)
void A() {
    save the current input pointer;
    try a production A → X1 X2 ...;
    if (an error occurred) { restore the input pointer and try the next production; }
}
3.2.3 Real-World Analogy
Trying a password from a list—each time you fail, you reset and try the next one. Inefficient but exhaustive.
3.2.4 Drawback
Highly inefficient for programming languages. More efficient tabular methods like Earley’s parser or CYK are used for such grammars.
3.3 Divide and Conquer Recursion
Definition: This is a recursive technique where problems are broken into sub-problems that are solved independently and combined later.
3.3.1 Role in Compilers
This technique appears in:
- Syntax tree evaluation: evaluating subexpressions recursively (like 3 + (2 * 4))
- Code generation: breaking expressions and statements into manageable parts
- Optimization: local optimizations applied recursively to subtrees
3.3.2 Real-World Analogy
Like solving a jigsaw puzzle by solving smaller sections and merging them—efficient and parallelizable.
3.3.3 Code Example
def evaluate(expr):
    # expr is either a number or a nested tuple (left, op, right),
    # e.g. (3, "+", (2, "*", 4)) for 3 + (2 * 4).
    if isinstance(expr, (int, float)):       # base case: already simple
        return expr
    left, op, right = expr                   # divide
    a, b = evaluate(left), evaluate(right)   # conquer each sub-problem
    return a + b if op == "+" else a * b     # combine
3.3.4 Advantage
Promotes modular, clean, and recursive solutions across the compiler pipeline: parsing, optimization, and code generation.
4. Conversion from Left to Right Recursion
To make grammars suitable for top-down parsing (like recursive-descent), left recursion must be eliminated. This process transforms left-recursive productions into right-recursive or iterative forms using helper nonterminals.
4.1 Methods to Eliminate Left Recursion
4.1.1 Immediate Left Recursion
Given a nonterminal A with productions:
A → A α₁ | A α₂ | ... | β₁ | β₂ | ...
where no βᵢ begins with A, we rewrite it as:
A → β₁ A' | β₂ A' | ...
A' → α₁ A' | α₂ A' | ... | ε
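This rewrite is mechanical, so it is easy to express in code. A sketch (productions represented as lists of symbol strings, the helper named by appending a quote):

```python
def eliminate_immediate(nt, productions):
    # Split A → A αᵢ (left-recursive) from A → βⱼ, then rebuild as
    # A → βⱼ A'  and  A' → αᵢ A' | ε.
    alphas = [p[1:] for p in productions if p[:1] == [nt]]
    betas = [p for p in productions if p[:1] != [nt]]
    if not alphas:
        return {nt: productions}        # no left recursion: unchanged
    prime = nt + "'"
    return {
        nt: [b + [prime] for b in betas],
        prime: [a + [prime] for a in alphas] + [["ε"]],
    }
```

For the expression rule, eliminate_immediate("E", [["E", "+", "T"], ["T"]]) returns {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], ["ε"]]}, matching the rewrite above.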
4.1.2 General (Indirect) Left Recursion
Use Algorithm 4.19 from the book:
Input: Grammar G without cycles or ε-productions
Output: An equivalent grammar with no left recursion
1. Arrange the nonterminals in some order A₁, A₂, ..., Aₙ
2. For i = 1 to n:
   a. For j = 1 to i - 1: replace each production Aᵢ → Aⱼ γ with Aᵢ → δ₁ γ | δ₂ γ | ... | δₖ γ, where Aⱼ → δ₁ | δ₂ | ... | δₖ are all the current Aⱼ-productions
   b. Eliminate immediate left recursion from the Aᵢ-productions
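The whole algorithm fits in a short function. This sketch represents productions as lists of symbol strings and lets a β-production that is just ε contribute the bare helper A' (as the book does in Example 4.20, even though the algorithm formally assumes no ε-productions):

```python
def eliminate_left_recursion(grammar, order):
    # grammar: nonterminal -> list of productions (lists of symbols);
    # `order` fixes the arrangement A1, ..., An from step 1.
    g = {nt: [list(p) for p in ps] for nt, ps in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            # Step (a): substitute the current Aj-productions
            # into every production of the form Ai -> Aj γ.
            expanded = []
            for p in g[Ai]:
                if p[:1] == [Aj]:
                    expanded += [d + p[1:] for d in g[Aj]]
                else:
                    expanded.append(p)
            g[Ai] = expanded
        # Step (b): eliminate immediate left recursion among Ai's rules.
        alphas = [p[1:] for p in g[Ai] if p[:1] == [Ai]]
        betas = [p for p in g[Ai] if p[:1] != [Ai]]
        if alphas:
            prime = Ai + "'"
            # A bare ε β-production contributes just A'.
            g[Ai] = [[prime] if b == ["ε"] else b + [prime] for b in betas]
            g[prime] = [a + [prime] for a in alphas] + [["ε"]]
    return g
```

Running it on grammar (4.18) with order S, A reproduces the book's result: S → A a | b, A → b d A' | A', A' → c A' | a d A' | ε.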
4.2 Examples from the Book
4.2.1 Example 1: Expression Grammar
Original:
E → E + T | T
T → T * F | F
F → ( E ) | id
Eliminated:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
This transforms left-recursive arithmetic expressions into a right-recursive form.
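Because E' → + T E' | ε is right recursion in tail position, each helper can also be written as a loop, with the ε-alternative as the loop exit. A Python sketch that parses and evaluates with correct precedence (single digits standing in for id):

```python
def expr(toks):
    # E → T E'; the while-loop plays the role of E' → + T E' | ε.
    v, toks = term(toks)
    while toks and toks[0] == "+":
        w, toks = term(toks[1:])
        v += w
    return v, toks

def term(toks):
    # T → F T'; likewise T' → * F T' | ε.
    v, toks = factor(toks)
    while toks and toks[0] == "*":
        w, toks = factor(toks[1:])
        v *= w
    return v, toks

def factor(toks):
    # F → ( E ) | id  (a digit stands in for id here)
    if toks[0] == "(":
        v, toks = expr(toks[1:])
        return v, toks[1:]              # skip the closing ")"
    return int(toks[0]), toks[1:]
```

expr(list("2+3*4")) returns (14, []) because * binds tighter than +, and expr(list("(2+3)*4")) returns (20, []), honoring the parentheses.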
4.2.2 Example 2: Grammar (4.18)
Original:
S → A a | b
A → A c | S d | ε
Apply Algorithm 4.19. Substituting the S-productions into A → S d gives:
A → A c | A a d | b d | ε
Converted:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
Left recursion across productions (indirect) is eliminated step-by-step.
4.2.3 Example 3: With Semantic Actions
Original:
E → E + T { print('+') }
| E - T { print('-') }
| T
Transformed:
E → T R
R → + T { print('+') } R
| - T { print('-') } R
| ε
Actions are treated as terminals and embedded during conversion.
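The transformed rules translate directly into a tiny infix-to-postfix translator: each action fires right after the T it follows, so operators come out in postfix order. A Python sketch (single digits standing in for T, actions appending to a list instead of printing):

```python
def E(toks, out):
    # E → T R
    toks = T(toks, out)
    return R(toks, out)

def R(toks, out):
    # R → + T { emit '+' } R | - T { emit '-' } R | ε
    if toks and toks[0] in "+-":
        op = toks[0]
        toks = T(toks[1:], out)
        out.append(op)                  # the semantic action
        return R(toks, out)
    return toks                         # ε-production

def T(toks, out):
    # T → digit (simplified); emitting the digit is T's own action
    out.append(toks[0])
    return toks[1:]
```

Feeding it list("9-5+2") leaves out == ["9", "5", "-", "2", "+"], i.e. the postfix form 95-2+.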
4.3 Real-World Analogy
Left recursion is like trying to define a word using itself as the first word in its own definition. You get stuck. Rewriting to right recursion lets you finish the sentence before referring back—much like giving instructions step by step instead of looping back too early.
5. Steps to Change a Left-Recursive Grammar to Right-Recursive Grammar
We demonstrate the process using an example from the Dragon Book (Example 4.20). The grammar includes both indirect and immediate left recursion and uses Algorithm 4.19 to eliminate them.
5.1 Original Grammar
S → A a | b
A → A c | S d | ε
Observation: S indirectly recurses through A: S → A a → S d a. The grammar has indirect left recursion.
5.2 Step-by-Step Elimination Using Algorithm 4.19
Step 1: Order Nonterminals
Order them as: S, A
Step 2: For i = 1 (S)
For i = 1, there is no j < i to substitute, and S has no immediate left recursion, so S → A a | b is kept unchanged.
Step 3: For i = 2 (A)
A has S on its right-hand side via A → S d. Substituting the S-productions (A a and b) for S gives:
A → A c | A a d | b d | ε
Step 4: Separate Left-Recursive and Non-Left-Recursive Productions
Group the productions that are left-recursive (start with A) and the rest:
- Left-recursive: A → A c, A → A a d
- Others: A → b d, A → ε
Step 5: Apply Standard Left Recursion Removal
Introduce a helper nonterminal A':
A → b d A' | A'
A' → c A' | a d A' | ε
5.3 Final Converted Grammar
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
5.3.1 Parse Tree Shape
The transformed rules build right-leaning parse trees (each A' hangs off the previous one), so a recursive-descent parser cannot fall into an infinite loop and the grammar becomes suitable for LL(1) parsing.
5.4 Real-World Analogy
Imagine a person (S) getting a task done either by calling another person (A), or doing it directly (b). Now if A keeps handing the task back to S, they’ll keep bouncing the task forever. We make sure the task gets done linearly without cycles by rewriting how they handle the delegation (using A').