Parse Tree

1. Parse Trees

A parse tree is a hierarchical tree structure that represents how a string conforms to a given grammar. It is used in syntax analysis to check whether a given input string belongs to a language.

1.1 Properties of a Parse Tree

Root: The topmost node represents the start symbol of the grammar.
Leaves: The terminal symbols of the grammar appear at the leaves.
Internal Nodes: Each internal node represents a nonterminal, which expands based on grammar rules.
Children Order: If a nonterminal expands into a sequence of symbols, they appear as its children from left to right.
Yield: The string obtained by reading all the leaves from left to right is called the yield or frontier of the parse tree.

2. Example 1: Arithmetic Expression Parsing

Given the following CFG for arithmetic expressions:


E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id

This grammar ensures that * and / have higher precedence than + and -, and that the operators + and - are left-associative.

2.1 Leftmost Derivation for `id + id * id`

We perform a step-by-step leftmost derivation, always replacing the leftmost nonterminal:

Step 1: Start with `E`

Step 2: Expand the leftmost `E` using `E → E + T`


E ⇒ E + T

Step 3: Replace the leftmost `E` (in `E + T`) using `E → T`


E + T ⇒ T + T

Step 4: Expand the leftmost `T` using `T → F`


T + T ⇒ F + T

Step 5: Replace `F` with an identifier (`F → id`)


F + T ⇒ id + T

Step 6: Expand the remaining `T` (after the `+`) using `T → T * F`


id + T ⇒ id + T * F

Step 7: Replace the leftmost `T` in `T * F` using `T → F`


id + T * F ⇒ id + F * F

Step 8: Replace each `F` with `id` (`F → id`)


id + F * F ⇒ id + id * F ⇒ id + id * id

This leftmost derivation confirms that multiplication is performed before addition.

2.2 Parse Tree Representation (Leftmost Derivation)

The parse tree visually captures the hierarchy of operations. Since multiplication has higher precedence than addition, the subtree for id * id is built first.

Step 1: Begin with the root node

Start with the start symbol E:

graph TD; E;

Step 2: Expand `E → E + T`

This expansion introduces the addition operator:

graph TD; E --> E_left[E]; E --> plus["\+"]; E --> T_right[T];

Step 3: Expand the left `E` as `E → T` and then `T → F` with `F → id`

This gives the left operand:

graph TD; E_left --> T_left[T]; T_left --> F_left[F]; F_left --> id1["id"];

Step 4: Expand the right `T` using `T → T * F`

This introduces the multiplication operator for the right operand:

graph TD; T_right --> T_mult[T]; T_right --> mult["\*"]; T_right --> F_right[F];

Step 5: Expand the left `T` in the multiplication to `T → F` with `F → id`

This gives the left operand of multiplication:

graph TD; T_mult --> F_mult[F]; F_mult --> id2["id"];

Step 6: Replace the right `F` with `id`

This gives the right operand of multiplication:

graph TD; F_right --> id3["id"];

The complete parse tree confirms that id + id * id is grouped as id + (id * id), reflecting correct operator precedence.

2.3 Rightmost Derivation for `id + id * id`

In a rightmost derivation we always replace the rightmost nonterminal first:

Step 1: Start with `E`

Step 2: Expand the rightmost nonterminal in `E` using `E → E + T`


E ⇒ E + T

Step 3: Expand the rightmost nonterminal (now `T`) using `T → T * F`


E + T ⇒ E + T * F

Step 4: Replace the rightmost nonterminal (`F`) with `id`


E + T * F ⇒ E + T * id

Step 5: Expand the rightmost nonterminal in `T * id` (the `T`) using `T → F`


E + T * id ⇒ E + F * id

Step 6: Replace the rightmost nonterminal (`F`) with `id`


E + F * id ⇒ E + id * id

Step 7: Now, the only remaining nonterminal is the leftmost `E` in `E + id * id`. Replace it using `E → T`


E + id * id ⇒ T + id * id

Step 8: Expand the remaining `T` using `T → F`


T + id * id ⇒ F + id * id

Step 9: Replace `F` with `id`


F + id * id ⇒ id + id * id

This rightmost derivation also leads to the expression id + id * id, with the multiplication carried out before addition.

2.4 Parse Tree for Rightmost Derivation

The following parse tree illustrates the derivation when the rightmost nonterminal is replaced first:

Step 1: Start with the root node

graph TD; E;

Step 2: Expand `E → E + T`

graph TD; E --> E_left[E]; E --> plus["\+"]; E --> T_right[T];

Step 3: Expand the rightmost `T` using `T → T * F`

graph TD; T_right --> T_mult[T]; T_right --> mult["\*"]; T_right --> F_right[F];

Step 4: Replace the rightmost `F` with `id`

graph TD; F_right --> id3["id"];

Step 5: Expand `T_mult` using `T → F` and then `F → id`

graph TD; T_mult --> F_mult[F]; F_mult --> id2["id"];

Step 6: Finally, replace the remaining nonterminal in the left branch (`E_left`) by applying `E → T`, then `T → F`, and `F → id`

graph TD; E_left --> T_left[T]; T_left --> F_left[F]; F_left --> id1["id"];

The complete parse tree again shows that the expression is parsed as id + (id * id), confirming the correct associativity and precedence.

Quick Recap

Leftmost Derivation: Always replaces the leftmost nonterminal first.
Rightmost Derivation: Always replaces the rightmost nonterminal first.
Parse Trees: Both derivations lead to the same overall parse tree.
Correct Associativity: The tree ensures that multiplication (with higher precedence) is performed before addition.

Summary of Arithmetic Expression Parsing

The parsing steps demonstrated above correctly follow the CFG:


E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id

The fact that both the leftmost and rightmost derivations yield the same final parse tree indicates that the grammar is unambiguous. Here, although there are different derivation orders (leftmost versus rightmost), they both lead to the same unique tree structure for the expression id + (id * id).

graph TD; E["E"] E --> E_left["E"] E --> plus["\+"] E --> T_right["T"] E_left --> T_left["T"] T_left --> F_left["F"] F_left --> id1["id"] T_right --> T_mult["T"] T_right --> mult["\*"] T_right --> F_right["F"] T_mult --> F_mult["F"] F_mult --> id2["id"] F_right --> id3["id"]

The parse tree derived in section 2.2 and 2.4 confirms that id + id * id is grouped as id + (id * id) — ensuring that multiplication is evaluated before addition.

Explanation:

The root node E represents the entire expression.
Since addition (+) has lower precedence than multiplication (*), the expression is parsed as id + (id * id).
The subtree for id * id is evaluated first, reflecting the operator precedence.

3. Leftmost and Rightmost Derivations

A parse tree corresponds to both leftmost and rightmost derivations. These derivations determine the order in which nonterminals are expanded.

3.1 Leftmost Derivation


E → E + T
  → T + T
  → F + T
  → id + T
  → id + T * F
  → id + id * F
  → id + id * id

3.2 Rightmost Derivation


E → E + T
  → E + T * F
  → E + T * id
  → E + F * id
  → E + id * id
  → F + id * id
  → id + id * id

4. Example 2: Ambiguous Grammar

A grammar is ambiguous if there exists more than one valid parse tree for the same input string.

Consider this ambiguous grammar:


E → E + E | E * E | id

For the input string id + id * id, we get two possible parse trees:

4.1 Correct Parse Tree (Multiplication Precedence)

Step-by-Step Derivation (Correct Interpretation: Multiplication has higher precedence than addition)

We begin with the input string: id + id * id


1. Start with E:
   E → E + E   (Choose rule for addition)

2. Expand the second E (right operand of +) as multiplication has higher precedence:
   E → E + (E * E)  (Choose rule for multiplication first)

3. Expand the operands:
   E → id + (id * id)

4. Verify that multiplication is performed first due to its higher precedence.

Thus, the parse tree follows the correct operator precedence.

4.2 Incorrect Parse Tree (Addition Precedence)

Step-by-Step Derivation (Incorrect Interpretation: Addition has higher precedence than multiplication)

We begin with the same input string: id + id * id


1. Start with E:
   E → E * E   (Incorrect choice of multiplication first)

2. Expand the left operand first:
   E → (E + E) * E   (Addition is handled before multiplication, which is wrong)

3. Expand operands:
   E → (id + id) * id

4. This results in incorrect evaluation since addition should not be performed before multiplication.

This parse tree incorrectly prioritizes addition over multiplication, violating standard arithmetic rules.

To remove this ambiguity, we modify the grammar to enforce operator precedence.

4.3 Resolving the Ambiguity

To enforce correct precedence, we modify the grammar:


E → E + T | T
T → T * F | F
F → id

Now, multiplication has higher precedence because:

T handles * before passing results to E.
E only processes + after resolving all * operations.

With this modification, we ensure that id + id * id is always parsed correctly as id + (id * id).

5. Example 3: Nested Expressions with Parentheses

Parentheses change the evaluation order of expressions. Consider the following grammar:


E → (E) | E + E | E * E | id

For the input (id + id) * id, the parse tree looks like:

Explanation:

Parentheses enforce that id + id is evaluated before multiplication.
The subtree (id + id) is encapsulated inside F, ensuring correct precedence.

6. Constructing Parse Trees Using Code

We can use Python's nltk library to generate parse trees for a given grammar.


from nltk import CFG
from nltk.parse.chart import ChartParser

# Define a simple grammar
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'John' | 'Mary' | 'dogs'
    VP -> V NP | V
    V -> 'barks' | 'sees'
""")

# Create a parser
parser = ChartParser(grammar)

# Parse the sentence
sentence = "John sees dogs".split()
for tree in parser.parse(sentence):
    print(tree)
    tree.pretty_print()

This code constructs a parse tree for the sentence "John sees dogs" based on the defined CFG.

7. Conclusion

Parse trees are essential for understanding the structure of expressions and programming languages. They provide a hierarchical view of syntax, guiding both parsing and code generation.

Leftmost and Rightmost Derivations: Define how trees are built step-by-step.
Ambiguity Resolution: Ensures a single unique parse tree exists for each input.
Operator Precedence: Determines the order of operations.
Practical Applications: Parsing in compilers, natural language processing, and query evaluation.

Parse Tree

1. Parse Trees

1.1 Properties of a Parse Tree

2. Example 1: Arithmetic Expression Parsing

2.1 Leftmost Derivation for id + id * id

Step 1: Start with E

Step 2: Expand the leftmost E using E → E + T

Step 3: Replace the leftmost E (in E + T) using E → T

Step 4: Expand the leftmost T using T → F

Step 5: Replace F with an identifier (F → id)

Step 6: Expand the remaining T (after the +) using T → T * F

Step 7: Replace the leftmost T in T * F using T → F

Step 8: Replace each F with id (F → id)

2.2 Parse Tree Representation (Leftmost Derivation)

Step 1: Begin with the root node

Step 2: Expand E → E + T

Step 3: Expand the left E as E → T and then T → F with F → id

Step 4: Expand the right T using T → T * F

Step 5: Expand the left T in the multiplication to T → F with F → id

Step 6: Replace the right F with id

2.3 Rightmost Derivation for id + id * id

Step 1: Start with E

Step 2: Expand the rightmost nonterminal in E using E → E + T

Step 3: Expand the rightmost nonterminal (now T) using T → T * F

Step 4: Replace the rightmost nonterminal (F) with id

Step 5: Expand the rightmost nonterminal in T * id (the T) using T → F

Step 6: Replace the rightmost nonterminal (F) with id

Step 7: Now, the only remaining nonterminal is the leftmost E in E + id * id. Replace it using E → T

Step 8: Expand the remaining T using T → F

Step 9: Replace F with id

2.4 Parse Tree for Rightmost Derivation

Step 1: Start with the root node

Step 2: Expand E → E + T

Step 3: Expand the rightmost T using T → T * F

Step 4: Replace the rightmost F with id

Step 5: Expand T_mult using T → F and then F → id

Step 6: Finally, replace the remaining nonterminal in the left branch (E_left) by applying E → T, then T → F, and F → id

Quick Recap

Summary of Arithmetic Expression Parsing

3. Leftmost and Rightmost Derivations

3.1 Leftmost Derivation

3.2 Rightmost Derivation

4. Example 2: Ambiguous Grammar

4.1 Correct Parse Tree (Multiplication Precedence)

Step-by-Step Derivation (Correct Interpretation: Multiplication has higher precedence than addition)

4.2 Incorrect Parse Tree (Addition Precedence)

Step-by-Step Derivation (Incorrect Interpretation: Addition has higher precedence than multiplication)

4.3 Resolving the Ambiguity

5. Example 3: Nested Expressions with Parentheses

6. Constructing Parse Trees Using Code

7. Conclusion

2.1 Leftmost Derivation for `id + id * id`

Step 1: Start with `E`

Step 2: Expand the leftmost `E` using `E → E + T`

Step 3: Replace the leftmost `E` (in `E + T`) using `E → T`

Step 4: Expand the leftmost `T` using `T → F`

Step 5: Replace `F` with an identifier (`F → id`)

Step 6: Expand the remaining `T` (after the `+`) using `T → T * F`

Step 7: Replace the leftmost `T` in `T * F` using `T → F`

Step 8: Replace each `F` with `id` (`F → id`)

Step 2: Expand `E → E + T`

Step 3: Expand the left `E` as `E → T` and then `T → F` with `F → id`

Step 4: Expand the right `T` using `T → T * F`

Step 5: Expand the left `T` in the multiplication to `T → F` with `F → id`

Step 6: Replace the right `F` with `id`

2.3 Rightmost Derivation for `id + id * id`

Step 1: Start with `E`

Step 2: Expand the rightmost nonterminal in `E` using `E → E + T`

Step 3: Expand the rightmost nonterminal (now `T`) using `T → T * F`

Step 4: Replace the rightmost nonterminal (`F`) with `id`

Step 5: Expand the rightmost nonterminal in `T * id` (the `T`) using `T → F`

Step 6: Replace the rightmost nonterminal (`F`) with `id`

Step 7: Now, the only remaining nonterminal is the leftmost `E` in `E + id * id`. Replace it using `E → T`

Step 8: Expand the remaining `T` using `T → F`

Step 9: Replace `F` with `id`

Step 2: Expand `E → E + T`

Step 3: Expand the rightmost `T` using `T → T * F`

Step 4: Replace the rightmost `F` with `id`

Step 5: Expand `T_mult` using `T → F` and then `F → id`

Step 6: Finally, replace the remaining nonterminal in the left branch (`E_left`) by applying `E → T`, then `T → F`, and `F → id`