Self-Balancing Trees in Data Structures - CSU083 | Shoolini University

Self-Balancing Trees

Self-balancing trees are a class of binary search trees (BSTs) that automatically maintain their height close to optimal, ensuring efficient operations like search, insert, and delete.

1. Prerequisites

Before studying self-balancing trees, you should be comfortable with binary search trees and their operations, tree height and rotations, recursion, and Big-O notation; all of these are used throughout this module.

2. What is a Self-Balancing Tree?

A self-balancing tree dynamically maintains a near-optimal height by applying balance operations during insertion and deletion. This prevents performance degradation due to skewed structures.

Popular self-balancing trees include AVL Trees, Red-Black Trees, B-Trees, Splay Trees, Treaps, and Weight-Balanced Trees; each is discussed in Section 16.

3. Why Does This Algorithm Exist?

Self-balancing trees are used wherever fast and consistent performance is required, for example in database indexes, file systems, network routing tables, and memory allocators (see Sections 21 and 23).

4. When Should You Use a Self-Balancing Tree?

Use self-balancing trees when:

5. How Does It Compare to Alternatives?

5.1 Strengths

5.2 Weaknesses

6. AVL Tree Implementation (Python)

An AVL Tree is a self-balancing binary search tree in which, for every node, the heights of the left and right subtrees differ by at most 1, i.e. the balance factor is -1, 0, or +1. Rotations (left, right, left-right, right-left) restore balance whenever an insertion or deletion violates this property.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # New node is initially at height 1

class AVLTree:
    # Get height of a node
    def get_height(self, node):
        return node.height if node else 0

    # Get balance factor
    def get_balance(self, node):
        return self.get_height(node.left) - self.get_height(node.right) if node else 0

    # Right rotate
    def right_rotate(self, y):
        x = y.left
        T2 = x.right

        # Perform rotation
        x.right = y
        y.left = T2

        # Update heights
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1

        return x  # New root

    # Left rotate
    def left_rotate(self, x):
        y = x.right
        T2 = y.left

        # Perform rotation
        y.left = x
        x.right = T2

        # Update heights
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1

        return y  # New root

    # Insert a node
    def insert(self, root, key):
        # Step 1: Perform standard BST insert
        if not root:
            return Node(key)
        elif key < root.key:
            root.left = self.insert(root.left, key)
        else:
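            # Keys equal to root.key go to the right subtree (duplicates are not rejected)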
            root.right = self.insert(root.right, key)

        # Step 2: Update height
        root.height = max(self.get_height(root.left), self.get_height(root.right)) + 1

        # Step 3: Get balance factor and rebalance if needed
        balance = self.get_balance(root)

        # Left Heavy (Right Rotation)
        if balance > 1 and key < root.left.key:
            return self.right_rotate(root)

        # Right Heavy (Left Rotation)
        if balance < -1 and key > root.right.key:
            return self.left_rotate(root)

        # Left-Right Case (Left Rotate then Right Rotate)
        if balance > 1 and key > root.left.key:
            root.left = self.left_rotate(root.left)
            return self.right_rotate(root)

        # Right-Left Case (Right Rotate then Left Rotate)
        if balance < -1 and key < root.right.key:
            root.right = self.right_rotate(root.right)
            return self.left_rotate(root)

        return root  # Return unchanged node

7. Dry Run on Sample Input

Insert sequence: 30, 20, 40, 10, 25, 35, 50

Step-by-step execution:

7.1 Insert 30

30 becomes the root. Balance factor of 30: 0.

7.2 Insert 20

20 < 30, so 20 becomes the left child of 30. Balance factor of 30: 1.

7.3 Insert 40

40 > 30, so 40 becomes the right child of 30. Balance factor of 30: 0.

7.4 Insert 10

10 becomes the left child of 20. Balance factors: 20 → 1, 30 → 1. Within the AVL limit, so no rotation.

7.5 Insert 25

25 becomes the right child of 20. Balance factors: 20 → 0, 30 → 1. No rotation.

7.6 Insert 35

35 becomes the left child of 40. Balance factors: 40 → 1, 30 → 0. No rotation.

7.7 Insert 50

50 becomes the right child of 40. Balance factors: 40 → 0, 30 → 0. No rotation.

Final AVL Tree (this sequence never violates the balance condition, so no rotations are required):


          30
        /    \
      20      40
     /  \    /  \
   10   25  35   50
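As a quick sanity check, the AVLTree class from Section 6 can rebuild this tree from the insert sequence; the small preorder helper below is added only for illustration.

def preorder(node):
    # Visit root, then left subtree, then right subtree
    if not node:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

tree = AVLTree()
root = None
for key in [30, 20, 40, 10, 25, 35, 50]:
    root = tree.insert(root, key)

print(preorder(root))  # Expected output: [30, 20, 10, 25, 40, 35, 50]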

8. Variable Tracking During Execution

Step | Node Inserted | Root | Balance Factor of Root | Rotation
1    | 30            | 30   | 0                      | None
2    | 20            | 30   | 1                      | None
3    | 40            | 30   | 0                      | None
4    | 10            | 30   | 1                      | None
5    | 25            | 30   | 1                      | None
6    | 35            | 30   | 0                      | None
7    | 50            | 30   | 0                      | None

9. Quick Recap

10. Time & Space Complexity Analysis

Self-balancing trees ensure that the height remains logarithmic, maintaining efficient operations. Let n be the number of nodes in the tree.

11. Time Complexity Analysis

11.1 Search Complexity

In a Binary Search Tree (BST), search cost depends on the tree height. In an unbalanced BST it can degrade to O(n); in a self-balancing tree the height is kept at O(log n), so search takes O(log n) comparisons in the worst case.
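For example, an AVL tree's height is bounded by roughly 1.44 · log2(n), so with n = 1,000,000 keys a search visits at most about 29 nodes.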

11.2 Insertion Complexity

Insertion consists of a standard BST insertion along a path of length O(log n), height updates on the way back up, and (in an AVL tree) at most one single or double rotation, each O(1). The total cost is therefore O(log n).

11.3 Deletion Complexity

Deletion consists of three parts: the standard BST deletion (O(log n)), height updates along the search path, and rebalancing rotations, which may be needed at every ancestor of the removed node.

Each rotation is O(1) and at most O(log n) of them occur, so the total deletion complexity remains O(log n).
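A minimal sketch of AVL deletion is shown below, assuming the Node class and the AVLTree helpers (get_height, get_balance, left_rotate, right_rotate) from Section 6; the get_min_node helper is introduced here only for illustration. These methods would be added to the AVLTree class.

    # Find the node with the smallest key in a subtree (used as the inorder successor)
    def get_min_node(self, node):
        current = node
        while current.left:
            current = current.left
        return current

    # Delete a key and rebalance on the way back up
    def delete(self, root, key):
        # Step 1: standard BST deletion
        if not root:
            return root
        if key < root.key:
            root.left = self.delete(root.left, key)
        elif key > root.key:
            root.right = self.delete(root.right, key)
        else:
            # Node with at most one child: splice it out
            if not root.left:
                return root.right
            if not root.right:
                return root.left
            # Node with two children: copy the inorder successor, then delete it
            successor = self.get_min_node(root.right)
            root.key = successor.key
            root.right = self.delete(root.right, successor.key)

        # Step 2: update height
        root.height = max(self.get_height(root.left), self.get_height(root.right)) + 1

        # Step 3: rebalance; unlike insertion, several ancestors may need rotations
        balance = self.get_balance(root)

        if balance > 1 and self.get_balance(root.left) >= 0:    # Left-Left
            return self.right_rotate(root)
        if balance > 1 and self.get_balance(root.left) < 0:     # Left-Right
            root.left = self.left_rotate(root.left)
            return self.right_rotate(root)
        if balance < -1 and self.get_balance(root.right) <= 0:  # Right-Right
            return self.left_rotate(root)
        if balance < -1 and self.get_balance(root.right) > 0:   # Right-Left
            root.right = self.right_rotate(root.right)
            return self.left_rotate(root)

        return root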

11.4 Rotation Complexity

Rotations (left, right, left-right, right-left) are constant-time operations (O(1)). In an AVL tree, an insertion triggers at most one single or double rotation, while a deletion may require a rotation at up to O(log n) ancestors.

11.5 Overall Complexity Summary

Operation | Best Case | Average Case | Worst Case
Search    | O(1)      | O(log n)     | O(log n)
Insert    | O(log n)  | O(log n)     | O(log n)
Delete    | O(log n)  | O(log n)     | O(log n)
Rotation  | O(1)      | O(1)         | O(1)

12. Space Complexity Analysis

Space complexity depends on the number of nodes stored and auxiliary space for balancing.

12.1 Space Used by Nodes

12.2 Auxiliary Space for Recursive Calls

12.3 Overall Space Complexity Summary

Component            | Space Complexity
Tree Storage         | O(n)
Recursive Call Stack | O(log n)
Total                | O(n)

13. Trade-offs and Practical Considerations

13.1 Strengths

13.2 Weaknesses

13.3 When to Use?

14. Optimizations & Variants

Self-balancing trees ensure O(log n) operations by maintaining height balance through rotations. Optimizations improve efficiency, and different tree variants exist based on application needs.

15. Common Optimizations

15.1 Optimizing Rotations

15.2 Optimizing Height Updates

15.3 Efficient Memory Usage

15.4 Cache-Aware Implementations

16. Different Variants of Self-Balancing Trees

Several self-balancing tree variants exist, each optimized for specific use cases.

16.1 AVL Tree

16.2 Red-Black Tree

16.3 B-Trees

16.4 Splay Tree

16.5 Treap (Tree + Heap)

16.6 Weight-Balanced Tree

17. Iterative vs. Recursive Implementations

17.1 Recursive Implementation

Textbook AVL and Red-Black tree implementations typically use recursion for insertion and deletion: the code stays short, but each call consumes stack space proportional to the tree height.

17.2 Iterative Implementation

Instead of recursion, an explicit stack or parent pointers handle traversal.
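For example, a key lookup can be written iteratively with a plain loop over the Node fields from Section 6, using O(1) extra space (this standalone function is a sketch, not part of the classes above):

def search_iterative(root, key):
    # Walk down from the root, choosing a side at each step; no recursion stack is used
    current = root
    while current:
        if key == current.key:
            return current
        current = current.left if key < current.key else current.right
    return None  # Key not present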

17.3 Performance Comparison

Implementation | Time Complexity | Space Complexity           | Ease of Use
Recursive      | O(log n)        | O(log n) (recursion stack) | Simpler
Iterative      | O(log n)        | O(1) (no recursion stack)  | More complex

17.4 When to Use Which?

18. Key Takeaways

19. Common Edge Cases & Pitfalls

19.1 Edge Cases in Insertion

19.2 Edge Cases in Deletion

19.3 Rotations & Height Updates

19.4 Memory & Performance Issues

20. Test Cases to Verify Correctness

To ensure correctness, the following test cases should be executed.

20.1 Basic Functionality Tests

20.2 Edge Case Tests

Handle extreme scenarios.

20.3 Stress Tests

Simulate large-scale operations.
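A possible stress test, assuming the AVLTree class from Section 6: insert keys in sorted order (the worst case for a plain BST) and verify that the height stays within the AVL bound of roughly 1.44 · log2(n).

import math

def tree_height(node):
    # Recompute the height from scratch so the check does not trust node.height
    if not node:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))

tree = AVLTree()
root = None
n = 10_000
for key in range(n):  # Sorted input would degenerate a plain BST into a linked list
    root = tree.insert(root, key)

assert tree_height(root) <= 1.44 * math.log2(n + 2) + 1, "Tree is not height-balanced"
print("Stress test passed, height =", tree_height(root))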

20.4 Automated Test Cases (Python)
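Note: the deletion tests below assume the AVLTree class also provides a delete method (for example, the sketch added in Section 11.3); Section 6 defines only insertion.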

import unittest

class TestAVLTree(unittest.TestCase):
    def setUp(self):
        self.tree = AVLTree()
        self.root = None

    def test_insert_single(self):
        self.root = self.tree.insert(self.root, 10)
        self.assertEqual(self.root.key, 10)

    def test_insert_sorted_order(self):
        for val in [10, 20, 30, 40, 50]:
            self.root = self.tree.insert(self.root, val)
        self.assertEqual(self.root.key, 20)  # Rotations keep the tree balanced; 20 ends up at the root

    def test_delete_leaf(self):
        self.root = self.tree.insert(self.root, 20)
        self.root = self.tree.insert(self.root, 10)
        self.root = self.tree.insert(self.root, 30)
        self.root = self.tree.delete(self.root, 10)
        self.assertIsNone(self.root.left)

    def test_delete_node_with_two_children(self):
        self.root = self.tree.insert(self.root, 50)
        self.root = self.tree.insert(self.root, 30)
        self.root = self.tree.insert(self.root, 70)
        self.root = self.tree.insert(self.root, 60)
        self.root = self.tree.insert(self.root, 80)
        self.root = self.tree.delete(self.root, 70)
        self.assertEqual(self.root.right.key, 80)  # Ensure correct replacement

if __name__ == "__main__":
    unittest.main()

21. Real-World Failure Scenarios

21.1 Database Index Corruption

Many databases use B-Trees and Red-Black Trees for indexing. Failure to rebalance after deletion leads to inefficient queries.

21.2 File System Errors

File systems whose indexes are built on balanced tree structures (such as ext4's extent and directory trees) can suffer severe slowdowns, or even index corruption, if rebalancing steps are skipped or interrupted.

21.3 Network Routing Failures

Routing tables use balanced trees for quick lookups. If balance maintenance fails, lookup speeds degrade significantly.

21.4 Memory Leaks in Long-Lived Applications

Improper memory management in tree-based caches (e.g., LRU caches using AVL Trees) can cause memory leaks.

22. Quick Recap

23. Real-World Applications of Self-Balancing Trees

23.1 Database Indexing (B-Trees, AVL Trees, Red-Black Trees)

23.2 File Systems (B-Trees, Red-Black Trees)

23.3 Networking & Routing Tables (Red-Black Trees, AVL Trees)

23.4 Memory Allocation (Red-Black Trees, AVL Trees)

23.5 Compiler Optimization & Symbol Tables (AVL Trees, Splay Trees)

23.6 Artificial Intelligence & Machine Learning (KD-Trees, Ball Trees)

24. Open-Source Implementations of Self-Balancing Trees

24.1 Linux Kernel (Red-Black Trees)

The Linux kernel uses Red-Black Trees in many subsystems; for example, the Completely Fair Scheduler (CFS) keeps runnable tasks in a Red-Black Tree ordered by virtual runtime.

Code reference: Linux RB-Tree Source Code

24.2 SQLite (B-Trees for Indexing)

SQLite uses a B-Tree structure to optimize indexing and storage efficiency.

Code reference: SQLite Source Code

24.3 C++ STL (Red-Black Tree in std::map & std::set)

Major implementations of the C++ Standard Library (for example GCC's libstdc++) implement the ordered containers std::map and std::set as Red-Black Trees.

Code reference: GCC STL Documentation

24.4 PostgreSQL (B-Trees for Query Optimization)

PostgreSQL databases rely on B-Trees for fast indexing and query execution.

Code reference: PostgreSQL Source Code

25. Practical Project: Implementing a Caching System with an AVL Tree

We will implement a simple in-memory caching system using an AVL Tree. The cache supports inserting key-value pairs and O(log n) lookup by key.

25.1 AVL Tree Cache Implementation (Python)

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.height = 1  # New node starts at height 1

class AVLTreeCache:
    def get_height(self, node):
        return node.height if node else 0

    def get_balance(self, node):
        return self.get_height(node.left) - self.get_height(node.right) if node else 0

    def right_rotate(self, y):
        x = y.left
        T2 = x.right

        x.right = y
        y.left = T2

        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1

        return x

    def left_rotate(self, x):
        y = x.right
        T2 = y.left

        y.left = x
        x.right = T2

        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1

        return y

    def insert(self, root, key, value):
        if not root:
            return Node(key, value)
        elif key < root.key:
            root.left = self.insert(root.left, key, value)
        else:
            root.right = self.insert(root.right, key, value)

        root.height = max(self.get_height(root.left), self.get_height(root.right)) + 1
        balance = self.get_balance(root)

        if balance > 1 and key < root.left.key:
            return self.right_rotate(root)

        if balance < -1 and key > root.right.key:
            return self.left_rotate(root)

        if balance > 1 and key > root.left.key:
            root.left = self.left_rotate(root.left)
            return self.right_rotate(root)

        if balance < -1 and key < root.right.key:
            root.right = self.right_rotate(root.right)
            return self.left_rotate(root)

        return root

    def search(self, root, key):
        if not root or root.key == key:
            return root
        elif key < root.key:
            return self.search(root.left, key)
        else:
            return self.search(root.right, key)

# Usage Example
cache = AVLTreeCache()
root = None
root = cache.insert(root, "user:100", "John Doe")
root = cache.insert(root, "user:200", "Alice Smith")

# Search for user:100
result = cache.search(root, "user:100")
print(result.value if result else "Not found")  # Output: John Doe

25.2 Project Use Case

This AVL Tree Cache can be integrated into services that need fast in-memory key-value lookups, such as the user-profile lookup shown in the usage example above.

26. Quick Recap

27. Competitive Programming Assignments

Self-balancing trees are widely used in competitive programming and system design due to their guaranteed O(log n) operations. Solve the following problems using AVL Trees, Red-Black Trees, or Treaps.

27.1 Basic Problems (Warm-Up)

27.2 Intermediate Problems (Algorithmic Challenges)

27.3 Advanced Problems (Competitive Programming)

27.4 Resources for Problem Solving

28. System Design Assignment - Use Case Integration

Apply self-balancing trees in real-world system design problems.

28.1 System Design Problem: Implement a Scalable Rate Limiter

Design a distributed rate limiter for an API using self-balancing trees.

28.2 Problem Statement:

You are designing a rate limiter for an API with millions of requests per second. Each request has a unique user ID, and you must allow at most 100 requests per minute per user.

28.3 Solution Using Self-Balancing Trees:

28.4 Code Implementation (Python)

import time

class Node:
    def __init__(self, timestamp):
        self.key = timestamp
        self.left = None
        self.right = None
        self.height = 1

class RateLimiter:
    def __init__(self):
        self.root = None
        self.time_window = 60  # Time window in seconds
        self.max_requests = 100

    def get_height(self, node):
        return node.height if node else 0

    def get_balance(self, node):
        return self.get_height(node.left) - self.get_height(node.right) if node else 0

    def left_rotate(self, x):
        y = x.right
        T2 = y.left
        y.left = x
        x.right = T2
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        return y

    def right_rotate(self, y):
        x = y.left
        T2 = x.right
        x.right = y
        y.left = T2
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1
        return x

    def insert(self, root, key):
        if not root:
            return Node(key)
        elif key < root.key:
            root.left = self.insert(root.left, key)
        else:
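            # Equal timestamps (several requests in the same second) go to the right subtree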
            root.right = self.insert(root.right, key)

        root.height = max(self.get_height(root.left), self.get_height(root.right)) + 1
        balance = self.get_balance(root)

        if balance > 1 and key < root.left.key:
            return self.right_rotate(root)

        if balance < -1 and key > root.right.key:
            return self.left_rotate(root)

        if balance > 1 and key > root.left.key:
            root.left = self.left_rotate(root.left)
            return self.right_rotate(root)

        if balance < -1 and key < root.right.key:
            root.right = self.right_rotate(root.right)
            return self.left_rotate(root)

        return root

    def inorder(self, root):
        if not root:
            return []
        return self.inorder(root.left) + [root.key] + self.inorder(root.right)

    def enforce_rate_limit(self, user_id):
        current_time = int(time.time())
        self.root = self.insert(self.root, current_time)

        # Keep only the timestamps inside the time window
        # (expired entries are filtered here but not removed from the tree)
        valid_requests = [t for t in self.inorder(self.root) if t > current_time - self.time_window]
        if len(valid_requests) > self.max_requests:
            print("Rate limit exceeded for user:", user_id)
            return False
        print("Request allowed for user:", user_id)
        return True

# Example Usage
rate_limiter = RateLimiter()
for _ in range(105):  # Exceed the limit
    rate_limiter.enforce_rate_limit("user123")
    time.sleep(0.5)  # Simulate incoming requests
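Note that this sketch stores every request timestamp, for all callers, in a single tree and only filters out expired entries during the inorder scan; a production rate limiter would keep a separate window per user_id and actually remove timestamps that fall outside the window.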

28.5 Real-World Application:

29. Time-Constrained Implementation Challenges

To master self-balancing trees under competitive conditions:

29.1 Timed Assignments

29.2 Competitive Coding Practice

30. Key Takeaway