Trees: Definition & Concept - CSU1051 - Data Structures & Algorithms

Basics of Tree's

Executive Summary: Mastering Trees in Data Structures

In this in-depth article, we dissect the enigma of Trees in Data Structures and Algorithms, taking you through the journey of understanding its profound concepts and operations. We start with a pressing problem statement, leveraging it as a launchpad to dive deep into the intrinsic definitions and functionalities of Trees. Grappling with the basic ideas to advanced, complex notions such as Binary Trees, Binary Search Trees, AVL Trees, and Red-Black Trees, we steer the reader's attention to sophisticated operations including insertion, deletion, and searching. Additionally, we unravel traversal techniques like Preorder, Inorder, and Postorder, coupled with the role of Recursive and Iterative methods. Moreover, we delve into advanced topics such as understanding the Balance Factor, Tree Rotations, and The Violations and Fixing methods for the Red-Black Tree. A sprinkle of well-chosen narratives will keep you riveted, enhancing the comprehension of these profound concepts. We conclude with a peek into B-Trees and B+ Trees, an advanced topic that we'll tackle in a forthcoming discourse. We have furnished C++ implementation methods wherever necessary, aiding practical understanding. This summary provides a rapid yet exhaustive review, equipping you with key takeaways, and propelling you to explore the engaging depths of this vast topic.

1. Problem Statement

Imagine being tasked with developing an efficient database system that needs to perform quick insertions, deletions, and fetch operations. Traditional array-based data structures might fall short due to their linear time complexity. An ideal solution would be employing a data structure that guarantees a logarithmic time complexity for these operations. This is where Trees come into play, offering a robust structure with an excellent balance between information storage and operational efficiency.

2. Trees: A Foreword

Trees are non-linear, hierarchical data structures that consist of nodes connected by edges, typically used when data has an inherent hierarchy. Each tree has a root node (no parent), and every other node has one parent and zero or more children, forming subtrees. The interconnected nodes symbolize relationships, while the edges depict the pathway or linkage between the data points. In computer science, the tree data structure, especially Binary Trees, is extensively used for efficient information storage and retrieval.

2.1 Binary Trees

Binary Trees are a type of tree data structure where each node can have at most two children, often distinguished as the left child and the right child. An interesting aspect is the height of the Binary Tree, defined as the longest path from the root node to any leaf node. By ensuring that the tree is 'balanced', i.e., the difference in the height of the left and right subtrees for any node is not more than one, we can optimize the efficiency of operations, achieving an O(log n) time complexity. Hence, a balanced Binary Tree is an ideal choice for our database system.

2.2 Binary Search Trees (BST)

Binary Search Trees (BST) are an enhancement over Binary Trees. They abide by the binary tree property, and an additional BST property: For any given node, the values of all the nodes in its left subtree are less, and the values of all the nodes in its right subtree are more. This property allows efficient searching for a value, thus making BST a favorite choice for dictionary problems and database indexing.

// A basic implementation of Binary Search Tree in C++
#include <bits/stdc++.h>
using namespace std;

struct Node {
   int key;
   Node* left, *right;
};

// A utility function to create a new BST Node
Node *newNode(int item) {
   Node *temp = new Node;
   temp->key = item;
   temp->left = temp->right = NULL;
   return temp;
}

// Function to perform inorder traversal of BST
void inorder(Node *root) {
   if (root != NULL) {
      inorder(root->left);
      cout << root->key << " ";
      inorder(root->right);
   }
}

// Function to insert a new Node in BST
Node* insert(Node* Node, int key) {
   if (Node == NULL) return newNode(key);
   if (key < Node->key)
      Node->left = insert(Node->left, key);
   else if (key > Node->key)
      Node->right = insert(Node->right, key);
   return Node;
}
2.2.1 Operations in BST

In a BST, operations like searching, insertion, and deletion are performed based on the BST property. For searching an element, we start at the root, and depending on whether the element is larger or smaller than the root, we move to the right or left child, respectively, and repeat until we find the element or reach a NULL. Insertion is performed similarly; when we reach a NULL, we create a new node and assign it to the corresponding left or right child pointer. Deletion is a bit more complex; if the node to be deleted has no child or one child, we can simply delete the node and attach its child, if any, to its parent. However, if the node has two children, we find its in-order predecessor (maximum value node in its left subtree) or in-order successor (minimum value node in its right subtree), replace the node with the predecessor/successor and delete the predecessor/successor.

2.3 AVL Trees

AVL Trees, named after inventors Adelson-Velsky and Landis, are self-balancing Binary Search Trees. An AVL Tree maintains the 'Balance Factor' (the difference in heights of the left and right subtrees) for every node to be -1, 0, or 1. This ensures that the tree remains approximately balanced, leading to O(log n) search time in average and worst cases, making it highly efficient for lookup operations. However, maintaining the balance of the tree requires rotations during insertions and deletions which can be computationally expensive.

2.4 Red-Black Trees

Red-Black Trees are another form of self-balancing Binary Search Trees. Each node in a Red-Black Tree contains an extra bit for denoting the color of the node, either red or black. The tree follows some pre-defined 'Red-Black Properties' which ensure the tree remains approximately balanced during insertions and deletions, thus providing an efficient solution for many data structure problems.

2.4.1 Violations and Fixing in Red-Black Trees

In a Red-Black Tree, any update operation (insertion or deletion) can violate the Red-Black properties. We can resolve these violations by performing certain operations, mainly recoloring and rotations. In the recoloring operation, we change the color of the node, which might shift the violation upwards in the tree, while in the rotation operation, we change the structure of the tree to balance it. These operations ensure the black depth of the tree remains constant throughout the tree, making it a great choice for database problems where the number of queries is large and changes to the database are minimal.

2.5 B-Trees

A B-tree is a self-balancing, multiway search tree where each node can contain more than one key and can have more than two children. The B-tree data structure is ideal for systems with large amounts of data and offers efficient insertion, deletion, and search operations.

Specifically, a B-Tree of order m satisfies the following properties:

  1. Every node has a maximum of m children.
  2. Every non-leaf node (except root) has a minimum of ⌈m/2⌉ children.
  3. The root has a minimum of two children if it is not a leaf node.
  4. A non-leaf node with k children contains k-1 keys.
  5. All keys of a node are sorted in increasing order. The key ki separates the sub-tree of ki-1 from the sub-tree of ki.
  6. All leaves appear in the same level and carry no information.
// A simple CPP implementation to demonstrate insert operation in a B-Tree
#include <iostream>
using namespace std;

// A B-Tree node
class BTreeNode {
    int *keys; // An array of keys
    int t; // Minimum degree (defines the range for number of keys)
    BTreeNode **C; // An array of child pointers
    int n; // Current number of keys
    bool leaf; // Is true when node is leaf. Otherwise false
public:
    BTreeNode(int _t, bool _leaf); // Constructor
    void insertNonFull(int k); // Inserts a new key in this node
    void splitChild(int i, BTreeNode *y); // Splits child y of this node
    void traverse(); // Function to traverse all nodes in a subtree rooted with this node
};

// Constructor for BTreeNode class
BTreeNode::BTreeNode(int t1, bool leaf1) {
    t = t1;
    leaf = leaf1;
    keys = new int[2*t-1];
    C = new BTreeNode *[2*t];
    n = 0;
}

2.6 B+ Trees

A B+ Tree is a type of B-Tree that maintains sorted data sequences, offering efficient insertions, deletions, and range queries. Unlike B-trees, B+ trees have an additional level of linked leaf nodes for easy access to the data records. All data records are stored at the leaf level of the B+ Tree, and all the internal nodes work as an ordered index to the leaf nodes.

Here are some properties of B+ trees:

  1. Internal nodes are used for indexing only. They contain copies of keys.
  2. All records reside at the leaf level.
  3. Leaf nodes are linked together to provide ordered access to the records.
// A simple CPP implementation to demonstrate insert operation in a B+ Tree
#include <iostream>
using namespace std;

class BPlusNode {
    int *keys;
    int t;
    BPlusNode **children;
    bool leaf;
public:
    BPlusNode(int _t, bool _leaf);
    void insertNonFull(int k);
    void splitChild(int i, BPlusNode *y);
    void traverse();
};

BPlusNode::B

PlusNode(int t1, bool leaf1) {
    t = t1;
    leaf = leaf1;
    keys = new int[2*t];
    children = new BPlusNode *[2*t+1];
}
2.6.1 Comparing B-Trees and B+ Trees

B-Trees and B+ Trees are both self-balancing tree data structures that maintain sorted data and allow for efficient operations. However, B+ Trees are more suitable for systems that read and write large blocks of data, such as databases and file systems, because they store all the records at the leaf level and provide a linked list structure for easy full-scans. In contrast, B-Trees store records at every level, leading to a more efficient "point" search operation.

On The Horizon: Charting the Uncharted with Tries and Suffix Trees

It's been an exhilarating journey through the world of Trees in Data Structures, exploring the different types and their unique characteristics. However, our journey is far from over. In the next article, we will delve into the fascinating realm of Tries and Suffix Trees. With their powerful capabilities for handling strings, these data structures open up a new frontier in our understanding. It promises to be an enlightening voyage into an often overlooked corner of Data Structures. So, stay tuned for more!