Parse Tree - CSU086 | Shoolini University

Parse Tree

1. Parse Trees

A parse tree is a hierarchical tree structure that represents how a string conforms to a given grammar. It is used in syntax analysis to check whether a given input string belongs to a language.

1.1 Properties of a Parse Tree

2. Example 1: Arithmetic Expression Parsing

Given the following CFG for arithmetic expressions:


E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
  

This grammar ensures that * and / have higher precedence than + and -, and that the operators + and - are left-associative.

2.1 Leftmost Derivation for id + id * id

We perform a step-by-step leftmost derivation, always replacing the leftmost nonterminal:

Step 1: Start with E

E
  
Step 2: Expand the leftmost E using E → E + T

E ⇒ E + T
  
Step 3: Replace the leftmost E (in E + T) using E → T

E + T ⇒ T + T
  
Step 4: Expand the leftmost T using T → F

T + T ⇒ F + T
  
Step 5: Replace F with an identifier (F → id)

F + T ⇒ id + T
  
Step 6: Expand the remaining T (after the +) using T → T * F

id + T ⇒ id + T * F
  
Step 7: Replace the leftmost T in T * F using T → F

id + T * F ⇒ id + F * F
  
Step 8: Replace each F with id (F → id)

id + F * F ⇒ id + id * F ⇒ id + id * id
  

This leftmost derivation confirms that multiplication is performed before addition.

2.2 Parse Tree Representation (Leftmost Derivation)

The parse tree visually captures the hierarchy of operations. Since multiplication has higher precedence than addition, the subtree for id * id is built first.

Step 1: Begin with the root node

Start with the start symbol E:

graph TD; E;
Step 2: Expand E → E + T

This expansion introduces the addition operator:

graph TD; E --> E_left[E]; E --> plus["\+"]; E --> T_right[T];
Step 3: Expand the left E as E → T and then T → F with F → id

This gives the left operand:

graph TD; E_left --> T_left[T]; T_left --> F_left[F]; F_left --> id1["id"];
Step 4: Expand the right T using T → T * F

This introduces the multiplication operator for the right operand:

graph TD; T_right --> T_mult[T]; T_right --> mult["\*"]; T_right --> F_right[F];
Step 5: Expand the left T in the multiplication to T → F with F → id

This gives the left operand of multiplication:

graph TD; T_mult --> F_mult[F]; F_mult --> id2["id"];
Step 6: Replace the right F with id

This gives the right operand of multiplication:

graph TD; F_right --> id3["id"];

The complete parse tree confirms that id + id * id is grouped as id + (id * id), reflecting correct operator precedence.

2.3 Rightmost Derivation for id + id * id

In a rightmost derivation we always replace the rightmost nonterminal first:

Step 1: Start with E

E
  
Step 2: Expand the rightmost nonterminal in E using E → E + T

E ⇒ E + T
  
Step 3: Expand the rightmost nonterminal (now T) using T → T * F

E + T ⇒ E + T * F
  
Step 4: Replace the rightmost nonterminal (F) with id

E + T * F ⇒ E + T * id
  
Step 5: Expand the rightmost nonterminal in T * id (the T) using T → F

E + T * id ⇒ E + F * id
  
Step 6: Replace the rightmost nonterminal (F) with id

E + F * id ⇒ E + id * id
  
Step 7: Now, the only remaining nonterminal is the leftmost E in E + id * id. Replace it using E → T

E + id * id ⇒ T + id * id
  
Step 8: Expand the remaining T using T → F

T + id * id ⇒ F + id * id
  
Step 9: Replace F with id

F + id * id ⇒ id + id * id
  

This rightmost derivation also leads to the expression id + id * id, with the multiplication carried out before addition.

2.4 Parse Tree for Rightmost Derivation

The following parse tree illustrates the derivation when the rightmost nonterminal is replaced first:

Step 1: Start with the root node
graph TD; E;
Step 2: Expand E → E + T
graph TD; E --> E_left[E]; E --> plus["\+"]; E --> T_right[T];
Step 3: Expand the rightmost T using T → T * F
graph TD; T_right --> T_mult[T]; T_right --> mult["\*"]; T_right --> F_right[F];
Step 4: Replace the rightmost F with id
graph TD; F_right --> id3["id"];
Step 5: Expand T_mult using T → F and then F → id
graph TD; T_mult --> F_mult[F]; F_mult --> id2["id"];
Step 6: Finally, replace the remaining nonterminal in the left branch (E_left) by applying E → T, then T → F, and F → id
graph TD; E_left --> T_left[T]; T_left --> F_left[F]; F_left --> id1["id"];

The complete parse tree again shows that the expression is parsed as id + (id * id), confirming the correct associativity and precedence.

Quick Recap

Summary of Arithmetic Expression Parsing

The parsing steps demonstrated above correctly follow the CFG:


E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
  

The fact that both the leftmost and rightmost derivations yield the same final parse tree indicates that the grammar is unambiguous. Here, although there are different derivation orders (leftmost versus rightmost), they both lead to the same unique tree structure for the expression id + (id * id).

graph TD; E["E"] E --> E_left["E"] E --> plus["\+"] E --> T_right["T"] E_left --> T_left["T"] T_left --> F_left["F"] F_left --> id1["id"] T_right --> T_mult["T"] T_right --> mult["\*"] T_right --> F_right["F"] T_mult --> F_mult["F"] F_mult --> id2["id"] F_right --> id3["id"]

The parse tree derived in section 2.2 and 2.4 confirms that id + id * id is grouped as id + (id * id) — ensuring that multiplication is evaluated before addition.

Explanation:

3. Leftmost and Rightmost Derivations

A parse tree corresponds to both leftmost and rightmost derivations. These derivations determine the order in which nonterminals are expanded.

3.1 Leftmost Derivation


E → E + T
  → T + T
  → F + T
  → id + T
  → id + T * F
  → id + id * F
  → id + id * id

3.2 Rightmost Derivation


E → E + T
  → E + T * F
  → E + T * id
  → E + F * id
  → E + id * id
  → F + id * id
  → id + id * id

4. Example 2: Ambiguous Grammar

A grammar is ambiguous if there exists more than one valid parse tree for the same input string.

Consider this ambiguous grammar:


E → E + E | E * E | id

For the input string id + id * id, we get two possible parse trees:

4.1 Correct Parse Tree (Multiplication Precedence)

Step-by-Step Derivation (Correct Interpretation: Multiplication has higher precedence than addition)

We begin with the input string: id + id * id


1. Start with E:
   E → E + E   (Choose rule for addition)

2. Expand the second E (right operand of +) as multiplication has higher precedence:
   E → E + (E * E)  (Choose rule for multiplication first)

3. Expand the operands:
   E → id + (id * id)

4. Verify that multiplication is performed first due to its higher precedence.

Thus, the parse tree follows the correct operator precedence.

4.2 Incorrect Parse Tree (Addition Precedence)

Step-by-Step Derivation (Incorrect Interpretation: Addition has higher precedence than multiplication)

We begin with the same input string: id + id * id


1. Start with E:
   E → E * E   (Incorrect choice of multiplication first)

2. Expand the left operand first:
   E → (E + E) * E   (Addition is handled before multiplication, which is wrong)

3. Expand operands:
   E → (id + id) * id

4. This results in incorrect evaluation since addition should not be performed before multiplication.

This parse tree incorrectly prioritizes addition over multiplication, violating standard arithmetic rules.

To remove this ambiguity, we modify the grammar to enforce operator precedence.

4.3 Resolving the Ambiguity

To enforce correct precedence, we modify the grammar:


E → E + T | T
T → T * F | F
F → id

Now, multiplication has higher precedence because:

With this modification, we ensure that id + id * id is always parsed correctly as id + (id * id).

5. Example 3: Nested Expressions with Parentheses

Parentheses change the evaluation order of expressions. Consider the following grammar:


E → (E) | E + E | E * E | id

For the input (id + id) * id, the parse tree looks like:

Explanation:

6. Constructing Parse Trees Using Code

We can use Python's nltk library to generate parse trees for a given grammar.


from nltk import CFG
from nltk.parse.chart import ChartParser

# Define a simple grammar
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'John' | 'Mary' | 'dogs'
    VP -> V NP | V
    V -> 'barks' | 'sees'
""")

# Create a parser
parser = ChartParser(grammar)

# Parse the sentence
sentence = "John sees dogs".split()
for tree in parser.parse(sentence):
    print(tree)
    tree.pretty_print()

This code constructs a parse tree for the sentence "John sees dogs" based on the defined CFG.

7. Conclusion

Parse trees are essential for understanding the structure of expressions and programming languages. They provide a hierarchical view of syntax, guiding both parsing and code generation.