Union-Find in Data Structures - CSU083 | Shoolini University

Union Find in Data Structures

1. Prerequisites

Understanding Union-Find requires familiarity with fundamental concepts from graph theory and data structures.

1.1 Graph Theory

1.2 Basic Data Structures

2. What is Union-Find?

Union-Find (Disjoint Set Union - DSU) is a data structure that maintains disjoint sets, allowing efficient union and find operations.

2.1 Internal Representation

2.2 Optimized Union-Find

2.3 Code Implementation


class UnionFind {
private:
    vector<int> parent, rank;
public:
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int Find(int x) {
        if (parent[x] != x)
            parent[x] = Find(parent[x]); // Path compression
        return parent[x];
    }

    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY])
                parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY])
                parent[rootX] = rootY;
            else {
                parent[rootY] = rootX;
                rank[rootX]++;
            }
        }
    }
};

Time Complexity: O(α(n)) for both Find(x) and Union(x, y).

3. Why Does This Algorithm Exist?

Union-Find is designed for problems requiring dynamic connectivity maintenance.

3.1 Graph Algorithms

3.2 Network & System Applications

4. When Should You Use It?

5. How Does It Compare to Alternatives?

5.1 Comparison with Other Methods

Feature Union-Find (DSU) DFS/BFS (Graph Traversal) Adjacency List/Matrix
Use Case Dynamic connectivity Static traversal Static connectivity
Find Complexity O(α(n)) O(n) O(1)
Union Complexity O(α(n)) O(n) O(1)
Space Complexity O(n) O(n) O(n²) (for matrix)
Best for? Dynamic merging & queries Single traversal problems Precomputed graph structures

5.2 Strengths

5.3 Weaknesses

6. Basic Implementation

The following implementation of Union-Find is written in C++ with support for path compression and union by rank.


#include <iostream>
#include <vector>

using namespace std;

class UnionFind {
private:
    vector<int> parent, rank;

public:
    // Constructor: Initializes n elements with each as its own parent
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    // Find operation with path compression
    int Find(int x) {
        if (parent[x] != x)
            parent[x] = Find(parent[x]); // Path compression
        return parent[x];
    }

    // Union operation with rank optimization
    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY])
                parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY])
                parent[rootX] = rootY;
            else {
                parent[rootY] = rootX;
                rank[rootX]++;
            }
        }
    }

    // Display parent array
    void printParent() {
        cout << "Parent Array: ";
        for (int i = 0; i < parent.size(); i++)
            cout << parent[i] << " ";
        cout << endl;
    }
};

int main() {
    UnionFind uf(6); // Create 6 elements (0 to 5)

    // Perform some union operations
    uf.Union(0, 1);
    uf.printParent(); // Show parent array after merging (0,1)

    uf.Union(1, 2);
    uf.printParent(); // Show parent array after merging (1,2)

    uf.Union(3, 4);
    uf.printParent(); // Show parent array after merging (3,4)

    uf.Union(2, 3);
    uf.printParent(); // Show parent array after merging (2,3)

    // Find representative elements
    cout << "Find(4) -> " << uf.Find(4) << endl;
    cout << "Find(3) -> " << uf.Find(3) << endl;
    uf.printParent(); // Parent array after path compression

    return 0;
}

7. Dry Run

Let's analyze the execution step by step with a small input set.

7.1 Initial State

7.2 Union(0, 1)

7.3 Union(1, 2)

7.4 Union(3, 4)

7.5 Union(2, 3)

7.6 Find(4) and Find(3)

Final Parent Array

parent = [0, 0, 0, 0, 0, 5] (all elements except 5 belong to one set)

8. Key Observations

9. Union-Find: Time & Space Complexity Analysis

Union-Find's efficiency depends on the techniques used: naive implementation vs. optimized with path compression and union by rank. We analyze its worst-case, best-case, and average-case time complexities.

10. Time Complexity Analysis

10.1 Naive Union-Find (Without Optimization)

10.2 Optimized Union-Find (With Path Compression & Union by Rank)

10.3 Worst-Case Complexity

In the worst case, without optimizations, operations take O(n) time. However, with path compression and union by rank:

The time complexity per operation is:

$$O(\alpha(n))$$

where $α(n)$ is the inverse Ackermann function, which grows extremely slowly. For practical input sizes, $α(n) \leq 5$, meaning operations run in near constant time.

10.4 Best-Case Complexity

10.5 Average-Case Complexity

Across multiple operations, the amortized time per operation remains:

$$O(\alpha(n))$$

11. Space Complexity Analysis

11.1 Space Complexity Breakdown

11.2 Space Consumption Growth

As input size n increases, the space requirement increases linearly due to array storage. However, compared to adjacency matrices (O(n²)), Union-Find is much more space-efficient.

12. Trade-offs

12.1 Strengths

12.2 Weaknesses

13. Union-Find: Optimizations & Variants

To make Union-Find efficient, we apply two key optimizations: path compression and union by rank/size. These optimizations reduce the time complexity of operations to nearly constant time, O(α(n)), where α(n) is the inverse Ackermann function.

14. Common Optimizations

14.1 Path Compression

Goal: Reduce tree height by making each node directly point to the root.

Effect: Speeds up future Find() operations.


int Find(int x) {
    if (parent[x] != x)
        parent[x] = Find(parent[x]); // Path compression
    return parent[x];
}

14.2 Union by Rank

Goal: Attach the smaller tree under the larger tree to keep tree depth minimal.

Effect: Prevents formation of deep trees, keeping operations efficient.


void Union(int x, int y) {
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) {
        if (rank[rootX] > rank[rootY])
            parent[rootY] = rootX;
        else if (rank[rootX] < rank[rootY])
            parent[rootX] = rootY;
        else {
            parent[rootY] = rootX;
            rank[rootX]++;
        }
    }
}

14.3 Union by Size

Alternative to Union by Rank: Instead of tracking rank (tree depth), track the size of each set.

Effect: Helps maintain smaller trees by merging smaller sets into larger ones.


void Union(int x, int y) {
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) {
        if (size[rootX] > size[rootY]) {
            parent[rootY] = rootX;
            size[rootX] += size[rootY];
        } else {
            parent[rootX] = rootY;
            size[rootY] += size[rootX];
        }
    }
}

15. Variants of Union-Find

15.1 Quick-Find (Eager Approach)

Each element directly stores its set ID, but merging two sets requires updating all elements, making it inefficient.

15.2 Quick-Union (Lazy Approach)

Uses a tree structure where each node points to its parent, but without path compression, it can form deep trees.

15.3 Optimized Union-Find (Path Compression + Union by Rank)

Combining path compression and union by rank gives the best performance, making all operations nearly constant time.

16. Iterative vs. Recursive Implementations

16.1 Recursive Find with Path Compression


int Find(int x) {
    if (parent[x] != x)
        parent[x] = Find(parent[x]); // Recursive compression
    return parent[x];
}

16.2 Iterative Find (Stackless Approach)


int Find(int x) {
    while (x != parent[x]) {
        parent[x] = parent[parent[x]]; // Path compression (iterative)
        x = parent[x];
    }
    return x;
}

16.3 Comparison

Implementation Pros Cons
Recursive Find Short, easy to implement Stack overflow for deep trees
Iterative Find No recursion overhead, better for large data Requires extra logic

17. Quick Recap

18. Union-Find: Edge Cases & Failure Handling

While Union-Find is efficient, improper usage can lead to incorrect results or inefficiencies. Understanding edge cases helps avoid common pitfalls.

19. Common Edge Cases & Pitfalls

19.1 Self-Union (Union of an Element with Itself)


void Union(int x, int y) {
    if (x == y) return; // No need to merge
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) parent[rootY] = rootX;
}

19.2 Union of Already Connected Elements

19.3 Handling Large Datasets

19.4 Non-Existent Elements


if (x < 0 || x >= parent.size() || y < 0 || y >= parent.size()) {
    throw invalid_argument("Invalid element index");
}

19.5 Uninitialized Union-Find Structure

20. Test Cases to Verify Correctness

20.1 Test Case: Basic Union & Find


UnionFind uf(5);
uf.Union(0, 1);
assert(uf.Find(0) == uf.Find(1));

20.2 Test Case: Multiple Unions


uf.Union(1, 2);
uf.Union(3, 4);
assert(uf.Find(0) == uf.Find(2));
assert(uf.Find(3) != uf.Find(0));

20.3 Test Case: Path Compression Efficiency


uf.Union(0, 3);
assert(uf.Find(3) == uf.Find(0)); // Ensures path compression worked

20.4 Test Case: Edge Cases

20.5 Test Case: Large Dataset Performance


UnionFind uf(1000000); // Large dataset
uf.Union(1, 500000);
uf.Union(500000, 999999);
assert(uf.Find(1) == uf.Find(999999)); // Ensures efficiency

21. Real-World Failure Scenarios

21.1 Network Partitioning Failure

21.2 Incorrect Cycle Detection

21.3 Data Corruption in Large Systems

🚀 Quick Recap: Edge cases and failure handling are crucial to making Union-Find robust. Proper testing ensures correctness and efficiency in real-world applications.

22. Union-Find: Real-World Applications & Industry Use Cases

Union-Find is widely used in applications requiring efficient merging and querying of disjoint sets. It plays a key role in various domains, including networking, distributed systems, and computational geometry.

23. Real-World Applications

23.1 Graph Algorithms

23.2 Networking & Distributed Systems

23.3 Image Processing

23.4 Social Networks & Clustering

23.5 Version Control Systems

23.6 Biology & Computational Geometry

24. Open-Source Implementations

Several open-source projects use Union-Find:

25. Practical Project: Finding Clusters in a Social Network

This Python script simulates a social network where people can form groups and check if they belong to the same group.


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.parent[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.parent[rootX] = rootY
            else:
                self.parent[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)

# Example usage
n = 6  # Number of users
uf = UnionFind(n)

# Establish friendships
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 2)  # Merges all three users into the same group

print(uf.connected(0, 3))  # True, since 0,1,2,3 are in the same set
print(uf.connected(4, 5))  # False, since 4 and 5 are not connected

26. Quick Recap

27. Union-Find: Competitive Programming & System Design Integration

Union-Find (Disjoint Set Union - DSU) is a key data structure used in competitive programming and system design. Practicing it under time constraints ensures quick recall and implementation efficiency.

28. Assignment 1: Solve 10 Problems Using Union-Find

Practice solving diverse problems to solidify your understanding of Union-Find.

28.1 Beginner Problems

28.2 Intermediate Problems

Bonus: Solve a problem under a 5-minute implementation constraint to simulate real contest conditions.

29. Assignment 2: Use Union-Find in a System Design Problem

29.1 Problem Statement

Design a large-scale social networking system that efficiently handles dynamic friend connections and groups.

29.2 Requirements

29.3 Solution Approach

Implement Union-Find to track friend groups:


class SocialNetwork:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, user):
        if self.parent[user] != user:
            self.parent[user] = self.find(self.parent[user])  # Path compression
        return self.parent[user]

    def add_friendship(self, user1, user2):
        root1 = self.find(user1)
        root2 = self.find(user2)
        if root1 != root2:
            if self.rank[root1] > self.rank[root2]:
                self.parent[root2] = root1
            elif self.rank[root1] < self.rank[root2]:
                self.parent[root1] = root2
            else:
                self.parent[root2] = root1
                self.rank[root1] += 1

    def are_friends(self, user1, user2):
        return self.find(user1) == self.find(user2)

# Example Usage:
network = SocialNetwork(1000)  # 1000 users
network.add_friendship(1, 2)
network.add_friendship(2, 3)
print(network.are_friends(1, 3))  # True

29.4 System Scaling Considerations

30. Assignment 3: Practice Implementing Union-Find Under Time Constraints

30.1 Goal

30.2 Timer-Based Challenge


#include <iostream>
#include <vector>
using namespace std;

class UnionFind {
    vector parent, rank;
public:
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int Find(int x) {
        if (parent[x] != x) parent[x] = Find(parent[x]);
        return parent[x];
    }

    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY]) parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY]) parent[rootX] = rootY;
            else { parent[rootY] = rootX; rank[rootX]++; }
        }
    }
};

int main() {
    UnionFind uf(10);
    uf.Union(1, 2);
    uf.Union(2, 3);
    cout << "1 and 3 connected? " << (uf.Find(1) == uf.Find(3)) << endl;
    return 0;
}

30.3 Performance Tracking

🚀 Key Takeaways: