Union Find in Data Structures

1. Prerequisites

Understanding Union-Find requires familiarity with fundamental concepts from graph theory and data structures.

1.1 Graph Theory

Connected Components: Understanding how nodes in a graph can be grouped together.
Spanning Trees: A subset of edges connecting all nodes without cycles.
Cycle Detection: Identifying cycles in undirected graphs.

1.2 Basic Data Structures

Arrays: Storing and accessing elements.
Trees: Union-Find operates like a forest of trees.
Recursion & Path Compression: Optimizing lookups by flattening tree structures.

2. What is Union-Find?

Union-Find (Disjoint Set Union - DSU) is a data structure that maintains disjoint sets, allowing efficient union and find operations.

2.1 Internal Representation

Parent Array: Stores the leader (root) of each element's set.
Find(x): Recursively finds the root representative of the set.
Union(x, y): Merges two sets by linking their roots.

2.2 Optimized Union-Find

Path Compression: Shortens the tree depth by pointing all nodes directly to the root.
Union by Rank: Balances tree depth by merging smaller trees under larger ones.

2.3 Code Implementation


class UnionFind {
private:
    vector<int> parent, rank;
public:
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int Find(int x) {
        if (parent[x] != x)
            parent[x] = Find(parent[x]); // Path compression
        return parent[x];
    }

    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY])
                parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY])
                parent[rootX] = rootY;
            else {
                parent[rootY] = rootX;
                rank[rootX]++;
            }
        }
    }
};

Time Complexity: O(α(n)) for both Find(x) and Union(x, y).

3. Why Does This Algorithm Exist?

Union-Find is designed for problems requiring dynamic connectivity maintenance.

3.1 Graph Algorithms

Kruskal’s Algorithm: Finds Minimum Spanning Trees (MST).
Connected Components: Identifies groups of connected nodes.
Cycle Detection: Detects cycles in an undirected graph.

3.2 Network & System Applications

Computer Networks: Checking if two computers belong to the same subnet.
Image Processing: Used in Connected Component Labeling (CCL).
Social Networks: Finding connected groups in a network.

4. When Should You Use It?

When checking whether two elements belong to the same group.
When merging sets dynamically.
For dynamic connectivity in graph problems.
When implementing Kruskal’s Algorithm for MST.
When frequent connected component queries are needed.

5. How Does It Compare to Alternatives?

5.1 Comparison with Other Methods

Feature	Union-Find (DSU)	DFS/BFS (Graph Traversal)	Adjacency List/Matrix
Use Case	Dynamic connectivity	Static traversal	Static connectivity
Find Complexity	O(α(n))	O(n)	O(1)
Union Complexity	O(α(n))	O(n)	O(1)
Space Complexity	O(n)	O(n)	O(n²) (for matrix)
Best for?	Dynamic merging & queries	Single traversal problems	Precomputed graph structures

5.2 Strengths

✅ Efficient for dynamic connectivity problems.
✅ Handles large datasets efficiently.
✅ Near constant time operations due to path compression.

5.3 Weaknesses

❌ Not suitable for shortest path or exhaustive searches.
❌ Requires extra space for parent and rank arrays.
❌ Does not maintain an explicit adjacency structure.

6. Basic Implementation

The following implementation of Union-Find is written in C++ with support for path compression and union by rank.


#include <iostream>
#include <vector>

using namespace std;

class UnionFind {
private:
    vector<int> parent, rank;

public:
    // Constructor: Initializes n elements with each as its own parent
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    // Find operation with path compression
    int Find(int x) {
        if (parent[x] != x)
            parent[x] = Find(parent[x]); // Path compression
        return parent[x];
    }

    // Union operation with rank optimization
    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY])
                parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY])
                parent[rootX] = rootY;
            else {
                parent[rootY] = rootX;
                rank[rootX]++;
            }
        }
    }

    // Display parent array
    void printParent() {
        cout << "Parent Array: ";
        for (int i = 0; i < parent.size(); i++)
            cout << parent[i] << " ";
        cout << endl;
    }
};

int main() {
    UnionFind uf(6); // Create 6 elements (0 to 5)

    // Perform some union operations
    uf.Union(0, 1);
    uf.printParent(); // Show parent array after merging (0,1)

    uf.Union(1, 2);
    uf.printParent(); // Show parent array after merging (1,2)

    uf.Union(3, 4);
    uf.printParent(); // Show parent array after merging (3,4)

    uf.Union(2, 3);
    uf.printParent(); // Show parent array after merging (2,3)

    // Find representative elements
    cout << "Find(4) -> " << uf.Find(4) << endl;
    cout << "Find(3) -> " << uf.Find(3) << endl;
    uf.printParent(); // Parent array after path compression

    return 0;
}

7. Dry Run

Let's analyze the execution step by step with a small input set.

7.1 Initial State

Each element is its own parent: parent = [0, 1, 2, 3, 4, 5]
Rank array (all zeros): rank = [0, 0, 0, 0, 0, 0]

7.2 Union(0, 1)

Find(0) returns 0 (root), Find(1) returns 1 (root).
Since ranks are equal, we make 0 the parent of 1 and increase its rank.
New state: parent = [0, 0, 2, 3, 4, 5], rank = [1, 0, 0, 0, 0, 0]

7.3 Union(1, 2)

Find(1) -> 0 (via path compression), Find(2) -> 2.
Union(0, 2): Since rank(0) > rank(2), make 0 the parent of 2.
New state: parent = [0, 0, 0, 3, 4, 5], rank = [1, 0, 0, 0, 0, 0]

7.4 Union(3, 4)

Find(3) -> 3, Find(4) -> 4.
Merge into one set: parent = [0, 0, 0, 3, 3, 5], rank = [1, 0, 0, 1, 0, 0]

7.5 Union(2, 3)

Find(2) -> 0, Find(3) -> 3.
Since rank(0) = rank(3), merge 3 under 0 and increase rank(0).
New state: parent = [0, 0, 0, 0, 3, 5], rank = [2, 0, 0, 1, 0, 0]

7.6 Find(4) and Find(3)

Find(4) -> Find(3) -> Find(0) -> Returns 0.
Path compression updates parent = [0, 0, 0, 0, 0, 5].
Find(3) now directly returns 0.

Final Parent Array

parent = [0, 0, 0, 0, 0, 5] (all elements except 5 belong to one set)

8. Key Observations

Path compression reduces tree height, leading to nearly constant-time Find operations.
Union by rank prevents deep trees, keeping operations efficient.
After a sequence of operations, most elements directly link to their root.

9. Union-Find: Time & Space Complexity Analysis

Union-Find's efficiency depends on the techniques used: naive implementation vs. optimized with path compression and union by rank. We analyze its worst-case, best-case, and average-case time complexities.

10. Time Complexity Analysis

10.1 Naive Union-Find (Without Optimization)

Find(x): O(n) → If the structure forms a linked list, Find requires traversing all elements.
Union(x, y): O(n) → The worst case happens when merging two deep trees.

10.2 Optimized Union-Find (With Path Compression & Union by Rank)

Find(x): O(α(n)) → Nearly constant time, as path compression flattens the tree.
Union(x, y): O(α(n)) → Almost constant time due to rank-based merging.

10.3 Worst-Case Complexity

In the worst case, without optimizations, operations take O(n) time. However, with path compression and union by rank:

The time complexity per operation is:

$$O(\alpha(n))$$

where $α(n)$ is the inverse Ackermann function, which grows extremely slowly. For practical input sizes, $α(n) \leq 5$, meaning operations run in near constant time.

10.4 Best-Case Complexity

When sets are already connected, Find(x) completes in O(1).
For a balanced tree with few operations, Union(x, y) runs in O(1).

10.5 Average-Case Complexity

Across multiple operations, the amortized time per operation remains:

$$O(\alpha(n))$$

11. Space Complexity Analysis

11.1 Space Complexity Breakdown

Parent Array: O(n) → Stores each node’s parent.
Rank Array: O(n) → Stores the rank (height) of each tree.
Total Space Complexity: O(n)

11.2 Space Consumption Growth

As input size n increases, the space requirement increases linearly due to array storage. However, compared to adjacency matrices (O(n²)), Union-Find is much more space-efficient.

12. Trade-offs

12.1 Strengths

✅ Near Constant Time with path compression and union by rank.
✅ Efficient for Dynamic Connectivity Problems like Kruskal’s algorithm.
✅ Low Space Overhead compared to adjacency matrices.

12.2 Weaknesses

❌ Not suitable for shortest paths (Dijkstra’s/BFS is better).
❌ Additional storage for rank/parent arrays (O(n) space).

13. Union-Find: Optimizations & Variants

To make Union-Find efficient, we apply two key optimizations: path compression and union by rank/size. These optimizations reduce the time complexity of operations to nearly constant time, O(α(n)), where α(n) is the inverse Ackermann function.

14. Common Optimizations

14.1 Path Compression

Goal: Reduce tree height by making each node directly point to the root.

Effect: Speeds up future Find() operations.


int Find(int x) {
    if (parent[x] != x)
        parent[x] = Find(parent[x]); // Path compression
    return parent[x];
}

✅ Ensures that subsequent queries return in O(1) for already visited elements.
✅ Flattens the structure dynamically over time.

14.2 Union by Rank

Goal: Attach the smaller tree under the larger tree to keep tree depth minimal.

Effect: Prevents formation of deep trees, keeping operations efficient.


void Union(int x, int y) {
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) {
        if (rank[rootX] > rank[rootY])
            parent[rootY] = rootX;
        else if (rank[rootX] < rank[rootY])
            parent[rootX] = rootY;
        else {
            parent[rootY] = rootX;
            rank[rootX]++;
        }
    }
}

✅ Prevents worst-case O(n) behavior.
✅ Ensures that the tree remains balanced.

14.3 Union by Size

Alternative to Union by Rank: Instead of tracking rank (tree depth), track the size of each set.

Effect: Helps maintain smaller trees by merging smaller sets into larger ones.


void Union(int x, int y) {
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) {
        if (size[rootX] > size[rootY]) {
            parent[rootY] = rootX;
            size[rootX] += size[rootY];
        } else {
            parent[rootX] = rootY;
            size[rootY] += size[rootX];
        }
    }
}

15. Variants of Union-Find

15.1 Quick-Find (Eager Approach)

Find(x): O(1) (Direct lookup)
Union(x, y): O(n) (Updating all connected elements)

Each element directly stores its set ID, but merging two sets requires updating all elements, making it inefficient.

15.2 Quick-Union (Lazy Approach)

Find(x): O(n) (Traverses up to the root)
Union(x, y): O(n) (Merges by linking root nodes)

Uses a tree structure where each node points to its parent, but without path compression, it can form deep trees.

15.3 Optimized Union-Find (Path Compression + Union by Rank)

Find(x): O(α(n)) (Flattened tree structure)
Union(x, y): O(α(n)) (Balanced merging)

Combining path compression and union by rank gives the best performance, making all operations nearly constant time.

16. Iterative vs. Recursive Implementations

16.1 Recursive Find with Path Compression


int Find(int x) {
    if (parent[x] != x)
        parent[x] = Find(parent[x]); // Recursive compression
    return parent[x];
}

✅ Simple and concise.
❌ May cause stack overflow for deep trees (not an issue with path compression).

16.2 Iterative Find (Stackless Approach)


int Find(int x) {
    while (x != parent[x]) {
        parent[x] = parent[parent[x]]; // Path compression (iterative)
        x = parent[x];
    }
    return x;
}

✅ Avoids recursion overhead.
✅ Works better for large datasets.

16.3 Comparison

Implementation	Pros	Cons
Recursive Find	Short, easy to implement	Stack overflow for deep trees
Iterative Find	No recursion overhead, better for large data	Requires extra logic

17. Quick Recap

🔹 Path Compression and Union by Rank make Union-Find nearly constant time.
🔹 Union by Size is an alternative with similar efficiency.
🔹 Recursive Find is simple but can cause stack overflow; iterative is better for large datasets.
🔹 Optimized Union-Find is widely used in graph problems like Kruskal’s MST and connectivity queries.

18. Union-Find: Edge Cases & Failure Handling

While Union-Find is efficient, improper usage can lead to incorrect results or inefficiencies. Understanding edge cases helps avoid common pitfalls.

19. Common Edge Cases & Pitfalls

19.1 Self-Union (Union of an Element with Itself)

Issue: Merging an element with itself is redundant and unnecessary.
Solution: Add a check before performing Union(x, x).


void Union(int x, int y) {
    if (x == y) return; // No need to merge
    int rootX = Find(x);
    int rootY = Find(y);
    if (rootX != rootY) parent[rootY] = rootX;
}

19.2 Union of Already Connected Elements

Issue: If x and y are already in the same set, unnecessary operations may degrade performance.
Solution: Check if Find(x) == Find(y) before performing Union().

19.3 Handling Large Datasets

Issue: Large inputs can cause stack overflow in recursive Find() implementations.
Solution: Use an iterative version of Find() to prevent excessive recursion depth.

19.4 Non-Existent Elements

Issue: Attempting to union or find elements outside the valid range.
Solution: Add boundary checks.


if (x < 0 || x >= parent.size() || y < 0 || y >= parent.size()) {
    throw invalid_argument("Invalid element index");
}

19.5 Uninitialized Union-Find Structure

Issue: If the Union-Find structure is not initialized, calling Find() leads to undefined behavior.
Solution: Ensure initialization before performing operations.

20. Test Cases to Verify Correctness

20.1 Test Case: Basic Union & Find


UnionFind uf(5);
uf.Union(0, 1);
assert(uf.Find(0) == uf.Find(1));

20.2 Test Case: Multiple Unions


uf.Union(1, 2);
uf.Union(3, 4);
assert(uf.Find(0) == uf.Find(2));
assert(uf.Find(3) != uf.Find(0));

20.3 Test Case: Path Compression Efficiency


uf.Union(0, 3);
assert(uf.Find(3) == uf.Find(0)); // Ensures path compression worked

20.4 Test Case: Edge Cases

Trying to find an element that is out of bounds.
Union of the same element.
Union of already connected components.

20.5 Test Case: Large Dataset Performance


UnionFind uf(1000000); // Large dataset
uf.Union(1, 500000);
uf.Union(500000, 999999);
assert(uf.Find(1) == uf.Find(999999)); // Ensures efficiency

21. Real-World Failure Scenarios

21.1 Network Partitioning Failure

Scenario: A network is partitioned incorrectly due to incomplete union operations.
Impact: Can cause incorrect routing, failing distributed systems.
Fix: Use redundancy and multiple rounds of merging.

21.2 Incorrect Cycle Detection

Scenario: A cycle detection algorithm in a graph fails due to improper Union-Find implementation.
Impact: Results in incorrect cycle identification in graphs.
Fix: Always check Find(x) == Find(y) before merging.

21.3 Data Corruption in Large Systems

Scenario: Path compression is not applied correctly, leading to incorrect component merging.
Impact: Inconsistent cluster memberships in distributed databases.
Fix: Regular integrity checks and revalidations.

🚀 Quick Recap: Edge cases and failure handling are crucial to making Union-Find robust. Proper testing ensures correctness and efficiency in real-world applications.

22. Union-Find: Real-World Applications & Industry Use Cases

Union-Find is widely used in applications requiring efficient merging and querying of disjoint sets. It plays a key role in various domains, including networking, distributed systems, and computational geometry.

23. Real-World Applications

23.1 Graph Algorithms

Minimum Spanning Tree (MST): Used in Kruskal’s algorithm to connect all nodes with minimal edge cost.
Connected Components: Determines whether two nodes belong to the same component in a graph.
Cycle Detection: Checks for cycles in undirected graphs, ensuring acyclic structures in network designs.

23.2 Networking & Distributed Systems

Dynamic Connectivity: Monitors real-time connectivity between nodes in a distributed system.
Union-Find in Routing Algorithms: Ensures efficient management of network connections in large-scale systems like the Internet.

23.3 Image Processing

Connected Component Labeling (CCL): Identifies and labels connected regions in binary images for object detection.

23.4 Social Networks & Clustering

Friendship Networks: Determines whether two users are in the same friend circle.
Community Detection: Identifies clusters in social networks.

23.5 Version Control Systems

Git Merge Algorithms: Uses Union-Find to track changes and detect conflicts in codebases.

23.6 Biology & Computational Geometry

DNA Sequence Clustering: Groups similar DNA sequences in bioinformatics.
Spatial Clustering: Used in computational geometry to find connected regions.

24. Open-Source Implementations

Several open-source projects use Union-Find:

Boost C++ Graph Library (BGL): Implements Union-Find for connected components and MST.
NetworkX (Python): Uses Union-Find in graph algorithms.
TensorFlow: Applies Union-Find in image segmentation and clustering tasks.

25. Practical Project: Finding Clusters in a Social Network

This Python script simulates a social network where people can form groups and check if they belong to the same group.


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.parent[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.parent[rootX] = rootY
            else:
                self.parent[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)

# Example usage
n = 6  # Number of users
uf = UnionFind(n)

# Establish friendships
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 2)  # Merges all three users into the same group

print(uf.connected(0, 3))  # True, since 0,1,2,3 are in the same set
print(uf.connected(4, 5))  # False, since 4 and 5 are not connected

26. Quick Recap

Union-Find is critical for applications requiring efficient dynamic connectivity queries.
It powers graph algorithms, social network clustering, and distributed systems.
Real-world projects like Git, TensorFlow, and network management leverage its efficiency.
Optimized implementations with path compression and union by rank make it scalable for large datasets.

27. Union-Find: Competitive Programming & System Design Integration

Union-Find (Disjoint Set Union - DSU) is a key data structure used in competitive programming and system design. Practicing it under time constraints ensures quick recall and implementation efficiency.

28. Assignment 1: Solve 10 Problems Using Union-Find

Practice solving diverse problems to solidify your understanding of Union-Find.

28.1 Beginner Problems

Find Connected Components in an undirected graph (Leetcode 323).
Cycle Detection in an undirected graph (GeeksforGeeks).
Friend Circle Problem: Find the number of friend circles in a matrix (Leetcode 547).
Connecting Cities with Minimum Cost: Use Kruskal’s MST (Leetcode 1135).
Checking if a Graph is a Tree (Union-Find approach).

28.2 Intermediate Problems

Number of Provinces: Find isolated groups in a connectivity matrix (Leetcode 547).
Accounts Merge: Merge accounts based on email ownership (Leetcode 721).
Largest Island: Find the largest connected component after flipping a zero (Leetcode 827).
Union-Find with Rollback: Implement a rollback mechanism for undo operations.
Dynamic Connectivity Queries: Maintain dynamic connected components efficiently.

Bonus: Solve a problem under a 5-minute implementation constraint to simulate real contest conditions.

29. Assignment 2: Use Union-Find in a System Design Problem

29.1 Problem Statement

Design a large-scale social networking system that efficiently handles dynamic friend connections and groups.

29.2 Requirements

Efficiently check if two users are in the same friend group.
Support dynamic friendship merging.
Scale to millions of users.

29.3 Solution Approach

Implement Union-Find to track friend groups:


class SocialNetwork:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, user):
        if self.parent[user] != user:
            self.parent[user] = self.find(self.parent[user])  # Path compression
        return self.parent[user]

    def add_friendship(self, user1, user2):
        root1 = self.find(user1)
        root2 = self.find(user2)
        if root1 != root2:
            if self.rank[root1] > self.rank[root2]:
                self.parent[root2] = root1
            elif self.rank[root1] < self.rank[root2]:
                self.parent[root1] = root2
            else:
                self.parent[root2] = root1
                self.rank[root1] += 1

    def are_friends(self, user1, user2):
        return self.find(user1) == self.find(user2)

# Example Usage:
network = SocialNetwork(1000)  # 1000 users
network.add_friendship(1, 2)
network.add_friendship(2, 3)
print(network.are_friends(1, 3))  # True

29.4 System Scaling Considerations

Use sharding: Partition users across multiple Union-Find instances.
Use caching: Store frequently accessed groups in memory.
Optimize persistence: Store parent relationships in a distributed database.

30. Assignment 3: Practice Implementing Union-Find Under Time Constraints

30.1 Goal

Implement Union-Find in under 5 minutes.
Write an optimized version with path compression and union by rank.
Use a timer to track performance.

30.2 Timer-Based Challenge


#include <iostream>
#include <vector>
using namespace std;

class UnionFind {
    vector parent, rank;
public:
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int Find(int x) {
        if (parent[x] != x) parent[x] = Find(parent[x]);
        return parent[x];
    }

    void Union(int x, int y) {
        int rootX = Find(x);
        int rootY = Find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY]) parent[rootY] = rootX;
            else if (rank[rootX] < rank[rootY]) parent[rootX] = rootY;
            else { parent[rootY] = rootX; rank[rootX]++; }
        }
    }
};

int main() {
    UnionFind uf(10);
    uf.Union(1, 2);
    uf.Union(2, 3);
    cout << "1 and 3 connected? " << (uf.Find(1) == uf.Find(3)) << endl;
    return 0;
}

30.3 Performance Tracking

Use an online IDE or a competitive programming platform.
Try implementing Union-Find in multiple languages (C++, Python, Java).
Repeat the implementation until it consistently takes less than 5 minutes.

🚀 Key Takeaways:

Solving diverse problems enhances your Union-Find skills.
Applying Union-Find in system design helps understand scalability.
Time-constrained implementation builds confidence for contests and interviews.