Parsers - Compiler Design - Shoolini University

Parsers

1. Introduction to Parsers

A parser is the second phase of a compiler that receives tokens from the lexical analyzer and builds a structured representation of the source code, typically a parse tree.

1.1 What is a Parser?

A parser is a system component that checks whether the sequence of tokens from the lexical analyzer form a syntactically valid program according to the rules of a programming language defined by a grammar (typically a context-free grammar).

Think of the parser as a grammar-checking assistant for a programmer — just like Microsoft Word underlines syntactic mistakes in sentences, a parser flags invalid code constructs.

There are two main types of parsers:

graph TD
  A[Parsers] --> B[Top-Down Parsers]
  A --> C[Bottom-Up Parsers]

  %% Top-Down Parsers Breakdown
  B --> B1[Recursive Descent Parser]
  B1 --> B1a[Backtracking Parser]
  B1 --> B1b[Non-Backtracking Parser]
  B --> B2[Predictive Parser]
  B2 --> B2a["LL(1) Parser"]
  B2 --> B2b["LL(k) Parser"]
  B2 --> B2c[Table-Driven Predictive Parser]

  %% Bottom-Up Parsers Breakdown
  C --> C1[Shift-Reduce Parser]
  C1 --> C2[LR Parsers]
  C2 --> C2a["SLR (Simple LR) Parser"]
  C2 --> C2b["Canonical LR(1) Parser"]
  C2 --> C2c[LALR Parser]
  C1 --> C3[Operator Precedence Parser]
  C1 --> C4[Handle Pruning Parser]

  %% Annotations for Conceptual Understanding
  B1a:::note --> Z1[Allows backtracking to try alternatives]
  B2a:::note --> Z2[Uses single-token lookahead and parsing table]
  C2b:::note --> Z3[Most powerful LR parser with full lookahead]
  C2c:::note --> Z4[Compromise between power and efficiency; used in YACC]
  C3:::note --> Z5[Works only for operator grammars with no ambiguities]
  

1.2 Why is a Parser Used?

A parser is used to:

Real-world scenario: Imagine an online tax form. You can only submit it if all fields follow a specific format. The parser in a compiler ensures "code submission" is in the correct format so it doesn't crash the system during execution.

1.3 Advantages and Disadvantages of Parsers

1.3.1 Advantages
1.3.2 Disadvantages
Aspect Top-Down Parsing Bottom-Up Parsing
Direction From root to leaves From leaves to root
Common Parsers Recursive-descent, LL(1) LR(0), SLR, LALR
Use in Industry Hand-written parsers Tool-generated parsers (e.g., Yacc)

Example (Expression Grammar):


E → E + T | T
T → T * F | F
F → ( E ) | id

This grammar defines how arithmetic expressions are parsed into meaningful operations.

1.4 Architecture of a Parser

The parser is a core component of the compiler, sitting between the lexical analyzer and the semantic analyzer. Its architecture ensures structured validation and interpretation of the input tokens according to the grammar of the language.

1.4.1 Key Components

1.4.2 Functional Flow

As tokens flow into the parser, they are matched against grammar rules. If valid, a node is added to the parse tree. Otherwise, the error handler manages the syntax error based on recovery techniques like panic mode or phrase-level correction.

1.4.3 Mermaid Diagram: Parser Architecture

graph TD
  A[Source Code] --> B[Lexical Analyzer]
  B --> C[Tokens Stream]
  C --> D[Parser]

  D --> E1[Parse Tree / Syntax Tree]
  D --> E2[Error Handler]

  E1 --> F[Semantic Analyzer]

  %% Feedback Loop
  E2 -->|Syntax Errors| D

  %% Extra Notes
  classDef block fill:#f9f,stroke:#333,stroke-width:2px;
  class A,B,C,D,E1,E2,F block;

2. Top-Down Parsers

Top-down parsers begin constructing the parse tree from the start symbol (root) and proceed by expanding the leftmost nonterminal until they match the entire input string.

They simulate a leftmost derivation and are used in tools like syntax checkers, compilers, and interpreters where early error detection and simple parsing logic are desirable.

2.1 Recursive Descent Parsing

This is a top-down parser built from a set of mutually recursive procedures, where each procedure implements one of the nonterminals of the grammar.

It is conceptually similar to reading a storybook where each chapter calls another until the story ends or a contradiction (error) arises.

2.1.1 Characteristics
2.1.2 Example

Consider the grammar:


S → c A d  
A → a b | a

Recursive descent approach for this:

void S() {
    if (lookahead == 'c') {
        match('c'); A(); match('d');
    } else error();
}

void A() {
    if (lookahead == 'a') {
        match('a');
        if (lookahead == 'b') match('b'); // for A → a b
        // else match only a
    } else error();
}

2.2 Predictive Parsing (LL(1))

Predictive parsers are a special case of recursive-descent parsers that do not require backtracking.

They use lookahead (typically 1 symbol) and a parsing table to choose the correct production rule based on the current nonterminal and lookahead symbol.

2.2.1 Table-Driven Algorithm

Uses an explicit stack and parsing table:


Input: string w and parsing table M
Output: leftmost derivation or error

initialize stack with [$, S]
while (stack not empty):
    X = top of stack
    a = current input symbol
    if X == a:
        pop stack, advance input
    else if X is terminal:
        error()
    else if M[X, a] == X → Y1Y2…Yk:
        pop stack
        push Yk…Y1 onto stack
    else:
        error()
2.2.2 Example Grammar

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → (E) | id

Parsing id + id * id results in:


Stack       Input          Action
E$          id+id*id$      E → T E'
T E'$       id+id*id$      T → F T'
F T' E'$    id+id*id$      F → id
id T' E'$   id+id*id$      match id
T' E'$      +id*id$        T' → ε
E'$         +id*id$        E' → + T E'
+ T E'$     +id*id$        match +
T E'$       id*id$         ...

2.3 Advantages and Disadvantages

2.3.1 Advantages
2.3.2 Disadvantages

2.4 Real-World Scenario

Consider voice-command interfaces like "Turn on the fan and light." The parser needs to understand command structure using grammar like:


Cmd → Action Object
Action → turn on | switch on
Object → fan | light | fan and light

A top-down parser allows us to quickly construct this interpretation with clear rules and minimal ambiguity.

3. Bottom-Up Parsers

Bottom-up parsers build the parse tree from leaves (tokens) toward the root (start symbol). Instead of predicting what to do next, they reduce recognized patterns to non-terminals — this is called Shift-Reduce Parsing.

They simulate a rightmost derivation in reverse and are widely used in parser generators like Yacc, Bison, etc.

3.1 Shift-Reduce Parsing

Shift-reduce parsing is the foundation of all bottom-up parsers. It operates using a stack and two basic operations:

3.1.1 Handle

A handle is a substring matching the right-hand side of a production that can be reduced. Think of it as the most immediate piece of a sentence that can be "rewritten" in reverse using grammar rules.

3.1.2 Example

Grammar:
E → E + T | T
T → T * F | F
F → (E) | id

Input: id * id

Reductions:
id * id 
→ F * id (F → id)
→ T * id (T → F)
→ T * F   (F → id)
→ T       (T → T * F)
→ E       (E → T)

This reverse derivation builds the parse tree bottom-up.

3.2 LR Parsers (Left-to-right, Rightmost derivation)

LR parsers are powerful bottom-up parsers. They decide when to shift or reduce based on states and lookahead tokens.

3.2.1 Types of LR Parsers
3.2.2 Components
3.2.3 Parsing Table Actions

Stack          Input          Action
0              id*id$         shift 5
0 5            *id$           reduce F → id
0 3            *id$           reduce T → F
0 2            *id$           shift 7
0 2 7          id$            shift 5
0 2 7 5        $              reduce F → id
0 2 7 3        $              reduce T → T * F
0 2           $               reduce T → T * F
0              $              reduce E → T
0 1            $              accept

This is the working of a shift-reduce LR parser.

3.3 Advantages and Disadvantages

3.3.1 Advantages
3.3.2 Disadvantages

3.4 Real-World Scenario

Imagine a barcode scanner in a supermarket. Every scanned symbol adds to the queue (shift). Once the pattern matches a known product (handle), it reduces to that product. Bottom-up parsing is like recognizing objects from pieces as they are assembled.