Minimum Spanning Tree (Prim’s & Kruskal’s) - CSU083 | Shoolini University

Minimum Spanning Tree (Prim’s & Kruskal’s)

1. Prerequisites

Before learning about Minimum Spanning Trees (MST), ensure you understand the following concepts:

  • Graph representations (adjacency list, adjacency matrix, edge list).
  • Weighted, undirected graphs and connectivity.
  • Greedy algorithms and why locally optimal choices can yield a global optimum.
  • Priority queues (binary heaps), used by Prim’s algorithm.
  • Disjoint-set (Union-Find) data structures, used by Kruskal’s algorithm.

2. What is a Minimum Spanning Tree (MST)?

A Minimum Spanning Tree (MST) of a connected, weighted, undirected graph is a subset of edges that:

  • Connects (spans) all vertices of the graph.
  • Contains no cycles, i.e. forms a tree with exactly \(V - 1\) edges.
  • Has the minimum possible total edge weight among all spanning trees.

MST ensures that the network remains connected with minimal cost.

2.1 Prim’s Algorithm

Prim’s Algorithm constructs the MST by growing a single tree:

  • Start from an arbitrary vertex and mark it as part of the tree.
  • Repeatedly pick the minimum-weight edge connecting a tree vertex to a vertex outside the tree.
  • Add that edge and vertex to the tree, and repeat until all vertices are included.


import heapq

def prims_mst(graph):
    # Lazy Prim's: graph is an adjacency list {vertex: [(neighbor, weight), ...]}
    # with vertices numbered 0..n-1; assumes the graph is connected.
    n = len(graph)
    visited = [False] * n
    pq = [(0, 0)]  # (cost, vertex); start from vertex 0
    mst_cost = 0

    while pq:
        cost, u = heapq.heappop(pq)
        if visited[u]:
            continue
        visited[u] = True
        mst_cost += cost

        for v, weight in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (weight, v))

    return mst_cost

2.2 Kruskal’s Algorithm

Kruskal’s Algorithm builds the MST by considering edges in ascending order of weight:

  • Sort all edges by weight.
  • Scan the sorted edges, adding an edge whenever it connects two different components (checked with Union-Find), i.e. whenever it does not create a cycle.
  • Stop once \(V - 1\) edges have been added.


class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])
        return self.parent[u]

    def union(self, u, v):
        root1, root2 = self.find(u), self.find(v)
        if root1 != root2:
            self.parent[root1] = root2

def kruskals_mst(edges, n):
    edges.sort(key=lambda x: x[2])  # Sort edges by weight
    ds = DisjointSet(n)
    mst_cost = 0
    mst_edges = 0

    for u, v, weight in edges:
        if ds.find(u) != ds.find(v):
            ds.union(u, v)
            mst_cost += weight
            mst_edges += 1
            if mst_edges == n - 1:
                break

    return mst_cost

3. Why Does MST Exist?

MST is used in real-world scenarios where minimal connection costs are required:

  • Laying cable, fiber, pipelines, or power lines so that every site is reachable at minimum total cost.
  • Designing road or rail networks that connect all locations without redundant links.
  • Building a cheap connected backbone of a dense graph before further analysis (e.g., clustering).

4. When Should You Use It?

  • The graph is undirected and weighted, and every vertex must be connected.
  • The goal is minimum total connection cost, not shortest paths between specific pairs of vertices.
  • Redundant links and cycles are unnecessary or too expensive.

5. Comparison with Alternatives

5.1 Prim’s vs. Kruskal’s

Criteria | Prim’s Algorithm | Kruskal’s Algorithm
Approach | Expands a tree from a single node. | Builds the MST edge by edge in increasing weight order.
Best for | Dense graphs (many edges). | Sparse graphs (few edges).
Data structure used | Priority queue (min-heap). | Disjoint Set (Union-Find).
Time complexity | \(O(E \log V)\) (with a binary heap). | \(O(E \log E)\) (sorting plus Union-Find).
Implementation complexity | More complex (heap operations). | Simpler to implement.

5.2 Alternative Approaches

  • Floyd-Warshall or Dijkstra’s Algorithm: Used for shortest path problems, not MST.
  • Bellman-Ford Algorithm: Detects negative-weight cycles but is not for MST.
  • Reverse-Delete Algorithm: Alternative to Kruskal’s but less efficient.

6. Basic Implementation

6.1 Prim’s Algorithm (Python Implementation)

Prim’s algorithm starts with an arbitrary node and grows the MST by adding the smallest edge at each step.


import heapq

def prims_mst(graph):
    n = len(graph)
    visited = [False] * n
    pq = [(0, 0, -1)]  # (cost, vertex, parent); parent -1 marks the start vertex
    mst_cost = 0
    mst_edges = []

    while pq:
        cost, u, parent = heapq.heappop(pq)
        if visited[u]:
            continue
        visited[u] = True
        mst_cost += cost
        if parent != -1:
            mst_edges.append((parent, u, cost))  # record the edge only once it is accepted

        for v, weight in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (weight, v, u))

    return mst_cost, mst_edges

# Graph represented as adjacency list (same graph as the edge list in 6.2 and the dry run in 7.1)
graph = {
    0: [(1, 2), (3, 6)],
    1: [(0, 2), (2, 3), (3, 8), (4, 5)],
    2: [(1, 3), (4, 7)],
    3: [(0, 6), (1, 8), (4, 9)],
    4: [(1, 5), (2, 7), (3, 9)]
}

mst_cost, mst_edges = prims_mst(graph)
print("MST Cost:", mst_cost)
print("MST Edges:", mst_edges)

6.2 Kruskal’s Algorithm (Python Implementation)

Kruskal’s algorithm sorts all edges by weight and then adds them while avoiding cycles.


class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])
        return self.parent[u]

    def union(self, u, v):
        root1, root2 = self.find(u), self.find(v)
        if root1 != root2:
            self.parent[root1] = root2

def kruskals_mst(edges, n):
    edges.sort(key=lambda x: x[2])  # Sort edges by weight
    ds = DisjointSet(n)
    mst_cost = 0
    mst_edges = []

    for u, v, weight in edges:
        if ds.find(u) != ds.find(v):
            ds.union(u, v)
            mst_cost += weight
            mst_edges.append((u, v, weight))
            if len(mst_edges) == n - 1:
                break

    return mst_cost, mst_edges

# Graph represented as edge list
edges = [
    (0, 1, 2), (0, 3, 6),
    (1, 2, 3), (1, 3, 8), (1, 4, 5),
    (2, 4, 7),
    (3, 4, 9)
]

mst_cost, mst_edges = kruskals_mst(edges, 5)
print("MST Cost:", mst_cost)
print("MST Edges:", mst_edges)

7. Dry Run of the Algorithm (Step-by-Step Execution)

7.1 Dry Run of Prim’s Algorithm

Graph:

Edge | Weight
(0,1) | 2
(0,3) | 6
(1,2) | 3
(1,3) | 8
(1,4) | 5
(2,4) | 7
(3,4) | 9

Step-by-step Execution:

Step | PQ (Priority Queue) | Selected Edge | Visited Nodes | MST Edges
Start | (0,0) | - | {0} | -
1 | (2,1), (6,3) | (0,1) | {0,1} | {(0,1,2)}
2 | (3,2), (5,4), (6,3), (8,3) | (1,2) | {0,1,2} | {(0,1,2), (1,2,3)}
3 | (5,4), (6,3), (7,4), (8,3) | (1,4) | {0,1,2,4} | {(0,1,2), (1,2,3), (1,4,5)}
4 | (6,3), (7,4), (8,3), (9,3) | (0,3) | {0,1,2,3,4} | {(0,1,2), (1,2,3), (1,4,5), (0,3,6)}

Final MST Cost: 16

7.2 Dry Run of Kruskal’s Algorithm

Step-by-step Execution:

Step | Remaining Sorted Edges | Selected Edge | Connected Components | MST Edges
1 | (0,1,2), (1,2,3), (1,4,5), (0,3,6), (2,4,7), (1,3,8), (3,4,9) | (0,1,2) | {0,1}, {2}, {3}, {4} | {(0,1,2)}
2 | (1,2,3), (1,4,5), (0,3,6), (2,4,7), (1,3,8), (3,4,9) | (1,2,3) | {0,1,2}, {3}, {4} | {(0,1,2), (1,2,3)}
3 | (1,4,5), (0,3,6), (2,4,7), (1,3,8), (3,4,9) | (1,4,5) | {0,1,2,4}, {3} | {(0,1,2), (1,2,3), (1,4,5)}
4 | (0,3,6), (2,4,7), (1,3,8), (3,4,9) | (0,3,6) | {0,1,2,3,4} | {(0,1,2), (1,2,3), (1,4,5), (0,3,6)}

Final MST Cost: 16

8. Time & Space Complexity Analysis

8.1 Prim’s Algorithm Complexity Analysis

  • Worst Case: \(O(E \log V)\), where \(E\) is the number of edges and \(V\) is the number of vertices.
  • Best Case: \(O(V \log V)\), when the graph is tree-like (only \(V - 1\) edges); heap operations still cost \(O(\log V)\) each.
  • Average Case: \(O(E \log V)\), since most graphs make the algorithm process many candidate edges.

Derivation:

  • Each edge is pushed onto the priority queue at most twice (once from each endpoint): \(O(E)\) insertions.
  • Each priority-queue push/pop costs \(O(\log E) = O(\log V)\).
  • Thus, total time complexity: \(O(E \log V)\).

8.2 Kruskal’s Algorithm Complexity Analysis

  • Worst Case: \(O(E \log E)\), since sorting the edges dominates the running time.
  • Best Case: \(O(E \, \alpha(V))\), if the edges are already sorted (or can be sorted in linear time).
  • Average Case: \(O(E \log E)\), since comparison-based sorting is generally required.

Derivation:

  • Sorting edges: \(O(E \log E)\).
  • Union-Find operations take amortized \(O(\alpha(V))\) time each, where \(\alpha(V)\) is the inverse Ackermann function, which grows extremely slowly.
  • Final complexity: \(O(E \log E)\), which equals \(O(E \log V)\) (see the short derivation below).
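
The bounds \(O(E \log E)\) and \(O(E \log V)\) are quoted interchangeably; the one-line justification:

\[
E \le V^2 \;\Rightarrow\; \log E \le 2 \log V \;\Rightarrow\; O(E \log E) = O(E \log V).
\]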

9. Space Complexity Analysis

9.1 Prim’s Algorithm Space Complexity

  • Adjacency list representation: \(O(V + E)\).
  • Priority queue: the lazy version shown here may hold up to \(E\) entries → \(O(E)\); an eager (indexed) version keeps this at \(O(V)\).
  • Visited array: \(O(V)\).
  • Total: \(O(V + E)\).

9.2 Kruskal’s Algorithm Space Complexity

  • Edge List Storage: \(O(E)\).
  • Disjoint Set Storage: \(O(V)\).
  • Total: \(O(V + E)\).

10. Trade-offs: When to Use Prim’s vs. Kruskal’s?

10.1 Comparison Based on Graph Type

Factor | Prim’s Algorithm | Kruskal’s Algorithm
Best for | Dense graphs (many edges). | Sparse graphs (fewer edges).
Implementation complexity | Priority queue (heap-based, more complex). | Sorting + Union-Find (simpler to implement).
Time complexity | \(O(E \log V)\) | \(O(E \log E)\)
Space complexity | \(O(V + E)\) | \(O(V + E)\)
Performance on large graphs | Better when many edges exist. | Better when fewer edges exist.

10.2 Key Trade-offs

  • Prim’s Algorithm:
    • Performs well on dense graphs; with an adjacency matrix, even the simple \(O(V^2)\) version is effectively optimal when \(E \approx V^2\).
    • More efficient when the graph is represented as an adjacency matrix or a dense adjacency list.
    • Uses a priority queue, making it slightly harder to implement.
  • Kruskal’s Algorithm:
    • Best for sparse graphs, since sorting a small edge list is cheap.
    • Uses disjoint-set union (Union-Find), making it simpler to implement.
    • May be slower than Prim’s on dense graphs because it must sort all \(E\) edges.

11. Optimizations & Variants

11.1 Optimizations in Prim’s Algorithm

Prim’s algorithm can be optimized using various data structures:

  • Binary Heap (Priority Queue) Optimization:
    • Using a binary heap reduces the complexity from \(O(V^2)\) (naïve array-based approach, sketched below) to \(O(E \log V)\).
  • Fibonacci Heap Optimization:
    • Using a Fibonacci heap reduces the time complexity to \(O(E + V \log V)\).
    • Useful for dense graphs with many edges.
  • Lazy vs. Eager Prim’s Algorithm:
    • Lazy Prim’s: pushes every candidate edge onto the priority queue and skips stale entries; simpler, but the queue can grow to \(O(E)\).
    • Eager Prim’s: keeps only the best known edge per vertex (indexed priority queue); faster and uses \(O(V)\) queue space.
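
For reference, a minimal sketch of the naïve \(O(V^2)\) array-based version mentioned above, assuming the graph is given as an adjacency matrix where None marks a missing edge and the graph is connected:


def prims_mst_matrix(adj):
    # Naïve O(V^2) Prim's on an adjacency matrix; adj[u][v] is the edge weight,
    # or None if there is no edge between u and v. Assumes a connected graph.
    INF = float('inf')
    n = len(adj)
    in_mst = [False] * n
    best = [INF] * n   # best[v] = cheapest known edge weight connecting v to the tree
    best[0] = 0
    mst_cost = 0

    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree (linear scan instead of a heap).
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: best[v])
        in_mst[u] = True
        mst_cost += best[u]

        # Relax the candidate edges of u's neighbours.
        for v in range(n):
            w = adj[u][v]
            if w is not None and not in_mst[v] and w < best[v]:
                best[v] = w

    return mst_cost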

11.2 Optimizations in Kruskal’s Algorithm

Kruskal’s algorithm can be improved with:

  • Union by Rank & Path Compression:
    • The optimized disjoint set keeps each find/union near constant time (amortized \(O(\alpha(V))\)), so the overall bound \(O(E \log E) = O(E \log V)\) is dominated by sorting; a sketch follows this list.
  • Using Radix/Counting Sort Instead of Comparison Sort:
    • If edge weights are small integers, the edges can be sorted in \(O(E)\) instead of \(O(E \log E)\).
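
A minimal sketch of a disjoint set with both optimizations (the DisjointSet class in Section 6.2 applies only path compression; the class name here is illustrative):


class DisjointSetOptimized:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])  # Path compression
        return self.parent[u]

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False                 # already in the same component
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru              # Union by rank: attach the shorter tree under the taller one
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        return True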

11.3 Variants of MST Algorithms

  • Reverse-Delete Algorithm:
    • Starts with all edges and repeatedly removes the heaviest edge whose removal keeps the graph connected.
    • Correct, but less efficient in practice than Prim’s/Kruskal’s because every deletion needs a connectivity check.
  • Borůvka’s Algorithm:
    • Naturally parallelizable: every component repeatedly adds the cheapest edge leaving it (a sequential sketch follows this list).
    • Runs in \(O(E \log V)\).
  • Dynamic/Approximate MST:
    • Maintains the MST under edge insertions and deletions.
    • Used in network maintenance applications.
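
A minimal sequential sketch of Borůvka’s algorithm, reusing the DisjointSet class from Section 6.2; the per-component scans are what a parallel implementation would distribute:


def boruvka_mst(edges, n):
    ds = DisjointSet(n)              # DisjointSet from Section 6.2
    mst_cost, components = 0, n

    while components > 1:
        cheapest = [None] * n        # cheapest outgoing edge per component root
        for u, v, w in edges:
            ru, rv = ds.find(u), ds.find(v)
            if ru == rv:
                continue             # edge lies inside one component
            if cheapest[ru] is None or w < cheapest[ru][2]:
                cheapest[ru] = (u, v, w)
            if cheapest[rv] is None or w < cheapest[rv][2]:
                cheapest[rv] = (u, v, w)

        merged = False
        for edge in cheapest:
            if edge is None:
                continue
            u, v, w = edge
            if ds.find(u) != ds.find(v):
                ds.union(u, v)
                mst_cost += w
                components -= 1
                merged = True
        if not merged:               # no component has an outgoing edge: graph is disconnected
            break

    return mst_cost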

12. Iterative vs. Recursive Implementations: Efficiency Comparison

12.1 Iterative Prim’s Algorithm

Prim’s algorithm is usually implemented iteratively using a priority queue.


import heapq

def prims_iterative(graph):
    n = len(graph)
    visited = [False] * n
    pq = [(0, 0)]
    mst_cost = 0

    while pq:
        cost, u = heapq.heappop(pq)
        if visited[u]:
            continue
        visited[u] = True
        mst_cost += cost

        for v, weight in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (weight, v))

    return mst_cost

12.2 Recursive Prim’s Algorithm

Prim’s algorithm can be written recursively, but it is not commonly done due to deep recursion limits.


def prims_recursive(graph, visited, pq):
    if not pq:
        return 0
    cost, u = heapq.heappop(pq)
    if visited[u]:
        return prims_recursive(graph, visited, pq)
    visited[u] = True
    for v, weight in graph[u]:
        if not visited[v]:
            heapq.heappush(pq, (weight, v))
    return cost + prims_recursive(graph, visited, pq)

def prims_driver(graph):
    n = len(graph)
    visited = [False] * n
    pq = [(0, 0)]
    return prims_recursive(graph, visited, pq)

12.3 Iterative vs. Recursive: Trade-offs

Factor | Iterative Prim’s | Recursive Prim’s
Time complexity | \(O(E \log V)\) | \(O(E \log V)\)
Space complexity | \(O(V + E)\) | \(O(V + E)\) plus call-stack space (one frame per heap pop, up to \(O(E)\))
Performance | More efficient in real-world use. | Limited by recursion depth.
Ease of implementation | More common and practical. | Less practical; recursion adds overhead.

Conclusion: Iterative Prim’s is preferred due to controlled space usage and better performance.

12.4 Iterative vs. Recursive Kruskal’s Algorithm

Iterative Kruskal’s Algorithm

Kruskal’s algorithm is inherently iterative, making recursion unnecessary.


class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])  # Path compression
        return self.parent[u]

    def union(self, u, v):
        root1, root2 = self.find(u), self.find(v)
        if root1 != root2:
            self.parent[root1] = root2

def kruskals_iterative(edges, n):
    edges.sort(key=lambda x: x[2])  # Sort edges by weight
    ds = DisjointSet(n)
    mst_cost = 0
    mst_edges = []

    for u, v, weight in edges:
        if ds.find(u) != ds.find(v):
            ds.union(u, v)
            mst_cost += weight
            mst_edges.append((u, v, weight))
            if len(mst_edges) == n - 1:
                break

    return mst_cost, mst_edges

Recursive Kruskal’s Algorithm

Kruskal’s algorithm can be made recursive but adds unnecessary complexity.


def kruskals_recursive(edges, ds, mst_edges, index, mst_cost, n):
    if len(mst_edges) == n - 1 or index >= len(edges):
        return mst_cost, mst_edges

    u, v, weight = edges[index]
    if ds.find(u) != ds.find(v):
        ds.union(u, v)
        mst_edges.append((u, v, weight))
        mst_cost += weight

    return kruskals_recursive(edges, ds, mst_edges, index + 1, mst_cost, n)

def kruskals_driver(edges, n):
    edges.sort(key=lambda x: x[2])
    ds = DisjointSet(n)
    return kruskals_recursive(edges, ds, [], 0, 0, n)

12.5 Iterative vs. Recursive Kruskal’s: Trade-offs

Factor | Iterative Kruskal’s | Recursive Kruskal’s
Time complexity | \(O(E \log E)\) | \(O(E \log E)\)
Space complexity | \(O(V + E)\) | \(O(V + E)\) plus \(O(E)\) call-stack space
Performance | Optimal and efficient. | Recursion depth limits efficiency.
Practicality | Standard implementation. | Not commonly used.

Conclusion: Iterative Kruskal’s is the preferred method.

13. Edge Cases & Failure Handling

13.1 Common Edge Cases

  • Disconnected Graph (multiple components): no spanning tree exists; the implementations above silently return a spanning forest of the reachable part, so verify that exactly \(V - 1\) edges were selected (see the sketch after this list).
  • Graph with Only One Node: MST cost should be zero, as there are no edges.
  • Graph with All Equal Edge Weights: both Prim’s and Kruskal’s may produce different but equally valid MSTs.
  • Graph with Negative Weights: both algorithms remain correct; neither assumes non-negative weights.
  • Graph with Large Edge Weights: handle large values without integer overflow (not an issue in Python, but relevant in C/C++/Java).
  • Dense Graphs: Kruskal’s may be inefficient because it sorts all edges, while Prim’s does better with an adjacency structure.
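
A minimal sketch of the disconnection check, assuming the kruskals_mst function from Section 6.2 (the wrapper name is illustrative):


def mst_or_none(edges, n):
    # Returns (cost, edges) if a spanning tree exists, otherwise None.
    cost, mst_edges = kruskals_mst(edges, n)
    if len(mst_edges) != n - 1:
        return None          # fewer than n - 1 edges selected: the graph is disconnected
    return cost, mst_edges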

13.2 Handling Failures in Implementation

  • Invalid Inputs: Handle cases where the input graph is empty or malformed (a small validation sketch follows this list).
  • Edge Cases in Data Structures: Ensure priority queue (Prim’s) and disjoint set (Kruskal’s) operations are correct.
  • Infinite Loops: Avoid infinite loops due to incorrect visited arrays or disjoint-set operations.
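
A minimal input-validation sketch for the edge-list representation (the function name and error messages are illustrative):


def validate_edge_list(edges, n):
    # Reject obviously malformed input before running an MST algorithm.
    if n <= 0:
        raise ValueError("graph must contain at least one vertex")
    for edge in edges:
        if len(edge) != 3:
            raise ValueError(f"malformed edge (expected (u, v, weight)): {edge}")
        u, v, _ = edge
        if not (0 <= u < n and 0 <= v < n):
            raise ValueError(f"vertex index out of range in edge: {edge}")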

14. Test Cases to Verify Correctness

14.1 Sample Test Cases


def test_mst_algorithms():
    # Test Case 1: Small Graph
    graph = {
        0: [(1, 1), (2, 3)],
        1: [(0, 1), (2, 2)],
        2: [(0, 3), (1, 2)]
    }
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
    assert prims_mst(graph)[0] == 3  # MST Cost
    assert kruskals_mst(edges, 3)[0] == 3

    # Test Case 2: Disconnected Graph (vertex 2 is unreachable)
    graph = {
        0: [(1, 2)],
        1: [(0, 2)],
        2: []
    }
    edges = [(0, 1, 2)]
    # Both return the cost of the reachable part (a spanning forest), not a full MST.
    assert prims_mst(graph)[0] == 2
    assert kruskals_mst(edges, 3)[0] == 2

    # Test Case 3: Single Node Graph
    graph = {0: []}
    edges = []
    assert prims_mst(graph)[0] == 0
    assert kruskals_mst(edges, 1)[0] == 0

    # Test Case 4: Graph with Large Weights
    graph = {
        0: [(1, 100000), (2, 200000)],
        1: [(0, 100000), (2, 300000)],
        2: [(0, 200000), (1, 300000)]
    }
    edges = [(0, 1, 100000), (1, 2, 300000), (0, 2, 200000)]
    assert prims_mst(graph)[0] == 300000
    assert kruskals_mst(edges, 3)[0] == 300000

    # Test Case 5: Large Graph Stress Test (random complete graph on 1000 vertices)
    import random
    large_graph = {i: [] for i in range(1000)}
    large_edges = []
    for i in range(1000):
        for j in range(i + 1, 1000):
            w = random.randint(1, 100)
            large_graph[i].append((j, w))
            large_graph[j].append((i, w))  # keep the adjacency list symmetric
            large_edges.append((i, j, w))
    assert prims_mst(large_graph)[0] >= 999   # 999 MST edges, each of weight >= 1
    assert kruskals_mst(large_edges, 1000)[0] >= 999

test_mst_algorithms()
print("All tests passed.")

15. Real-World Failure Scenarios

15.1 Examples of MST Failures

  • Network Outages: If MST is used in network design, a single edge failure may disconnect nodes.
  • Incorrect Graph Representation: Using an adjacency matrix for Kruskal’s in large graphs increases memory overhead.
  • Floating-Point Precision Errors: Large weight differences can cause precision errors in some implementations.
  • Graph Updates in Dynamic Networks: MST does not dynamically update when new edges/nodes are added.
  • Security Concerns: In routing protocols (e.g., spanning tree protocol in networking), MST can be manipulated by attackers.

15.2 Mitigating Failures

  • Redundant Paths: Introduce alternate routes in network topology to prevent single-point failures.
  • Efficient Graph Representation: Use adjacency lists for Prim’s and edge lists for Kruskal’s.
  • Precision Handling: Normalize large values or use appropriate data types.
  • Dynamic MST Updates: Use incremental algorithms that repair the MST when edges are inserted or removed (see the sketch after this list).
  • Security Measures: Prevent unauthorized modifications in network topology by verifying routing updates.
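
A minimal sketch of the incremental idea for edge insertions, based on the cycle property; the helper name is illustrative, and mst_edges is a list of (u, v, weight) tuples as returned in Section 6:


def mst_insert_edge(mst_edges, n, u, v, w):
    # When edge (u, v, w) is added to the graph, it enters the MST only if it is
    # cheaper than the heaviest edge on the unique u-v path in the current tree.
    adj = {i: [] for i in range(n)}
    for a, b, weight in mst_edges:
        adj[a].append((b, weight))
        adj[b].append((a, weight))

    def heaviest_on_path(node, target, seen, heaviest):
        # Depth-first search that remembers the heaviest edge on the current path.
        if node == target:
            return heaviest
        seen.add(node)
        for nxt, weight in adj[node]:
            if nxt not in seen:
                found = heaviest_on_path(nxt, target, seen, max(heaviest, (weight, node, nxt)))
                if found is not None:
                    return found
        return None

    heaviest = heaviest_on_path(u, v, set(), (float('-inf'), None, None))
    if heaviest is None:                       # u and v were in different components
        return mst_edges + [(u, v, w)]
    hw, a, b = heaviest
    if w < hw:                                 # swap out the heaviest path edge
        return [e for e in mst_edges if e not in ((a, b, hw), (b, a, hw))] + [(u, v, w)]
    return mst_edges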

16. Real-World Applications & Industry Use Cases

16.1 Network Design & Infrastructure

  • Telecommunication Networks: MST is used to lay down optical fiber networks with minimum cost.
  • Electric Grid Design: Power distribution networks minimize the total wiring cost using MST.
  • Computer Networks: The Spanning Tree Protocol (STP) prevents loops in Ethernet networks.

16.2 Transportation & Logistics

  • Road & Railway Planning: MST helps in designing the most efficient way to connect cities.
  • Pipeline Construction: Used in building minimum-cost oil, water, and gas pipelines.
  • Flight Route Optimization: Airlines use MST to minimize the total cost of routes while ensuring full connectivity.

16.3 Data Science & AI

  • Cluster Analysis: MST underlies single-linkage hierarchical clustering of large datasets (see the sketch after this list).
  • Image Segmentation: MST is used in edge detection and object segmentation.
  • Recommendation Systems: Helps in reducing complexity when clustering similar users.
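
A minimal sketch of MST-based (single-linkage-style) clustering, reusing the DisjointSet class from Section 6.2; it is simply Kruskal’s algorithm stopped once \(k\) components remain:


def mst_clusters(edges, n, k):
    # Merge components along the cheapest edges until only k components survive;
    # the surviving components are the clusters.
    ds = DisjointSet(n)
    components = n
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        if components == k:
            break
        if ds.find(u) != ds.find(v):
            ds.union(u, v)
            components -= 1

    # Group vertices by their component representative.
    clusters = {}
    for vertex in range(n):
        clusters.setdefault(ds.find(vertex), []).append(vertex)
    return list(clusters.values())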

16.4 Cybersecurity & Network Management

  • Intrusion Detection Systems (IDS): MST helps in mapping network traffic to detect anomalies.
  • Network Optimization: Used in monitoring and reducing latency in data transfer.

17. Open-Source Implementations

17.1 Popular Libraries Implementing MST

  • NetworkX (Python): A popular library for graph algorithms, including MST.
  • Boost Graph Library (C++): Provides highly optimized MST algorithms.
  • Graph-tool (Python): A performance-oriented graph library with MST support.

17.2 Example: Using NetworkX for MST


import networkx as nx

G = nx.Graph()
edges = [(0, 1, 2), (0, 3, 6), (1, 2, 3), (1, 3, 8), (1, 4, 5), (2, 4, 7)]
G.add_weighted_edges_from(edges)

mst = nx.minimum_spanning_tree(G)
print("MST Edges:", list(mst.edges(data=True)))

18. Project: MST-Based Optimal Road Network Planner

18.1 Project Overview

This project simulates an optimal road network planner using MST to minimize the total cost of road construction between cities.

18.2 Implementation (Python Script)


import networkx as nx
import matplotlib.pyplot as plt

# Define city connections with distances
city_edges = [
    ("A", "B", 4), ("A", "C", 8), ("B", "C", 2),
    ("B", "D", 6), ("C", "D", 3), ("C", "E", 5),
    ("D", "E", 7), ("E", "F", 6)
]

# Create a weighted graph
G = nx.Graph()
G.add_weighted_edges_from(city_edges)

# Compute MST using Kruskal’s algorithm
mst = nx.minimum_spanning_tree(G)

# Plot the MST
plt.figure(figsize=(8, 6))
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='lightblue', edge_color='gray', node_size=3000, font_size=12)
nx.draw_networkx_edges(mst, pos, edge_color='red', width=2)
plt.title("Minimum Spanning Tree for Road Network")
plt.show()

Usage: Run the script, and it will visualize the optimal road network using MST.

19. Competitive Programming & System Design Integration

19.1 MST in Competitive Programming

Minimum Spanning Tree (MST) is frequently used in competitive programming contests. Problems often involve:

  • Finding the MST for a weighted graph (direct application of Prim’s/Kruskal’s).
  • Handling dynamic graphs (edges being added/removed, requiring MST updates).
  • Finding the second-best MST (if one MST edge is excluded, what is the next best spanning tree? A brute-force sketch follows this list).
  • MST with constraints (e.g., certain edges must be included or excluded).
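
A minimal brute-force sketch of the second-best MST idea, assuming the kruskals_mst function from Section 6.2 (contest solutions usually replace the inner loop with path-maximum queries on the MST):


def second_best_mst(edges, n):
    # Exclude each MST edge in turn and re-run Kruskal's on the remaining edges;
    # the cheapest resulting spanning tree is the second-best MST.
    _, best_edges = kruskals_mst(list(edges), n)
    second = float('inf')
    for banned in best_edges:
        remaining = [e for e in edges if e != banned]
        cost, chosen = kruskals_mst(remaining, n)
        if len(chosen) == n - 1:         # still spans all vertices without the banned edge
            second = min(second, cost)
    return second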

Popular Platforms for MST Problems

  • Codeforces, AtCoder, and CodeChef graph problem sets.
  • LeetCode (e.g., “Min Cost to Connect All Points”).
  • CSES Problem Set (e.g., “Road Reparation”).
  • SPOJ and HackerRank classical MST problems.

19.2 MST in System Design

In real-world system design, MST is used to optimize infrastructure networks. Some examples:

  • Data Center Network Design: MST ensures minimal-cost connectivity between servers.
  • Distributed Computing: MST is used in message-passing algorithms to minimize network traffic.
  • Blockchain & Consensus Protocols: MST is useful in reducing redundant connections in distributed ledgers.
  • Cloud Computing: Optimal virtual network creation within cloud clusters.

20. Assignments

20.1 Solve at Least 10 MST Problems

Practice solving MST problems using both Prim’s and Kruskal’s algorithms.

  1. Basic MST Construction (Find the MST of a given graph).
  2. Find the Second-Best MST (If one edge is removed, what’s the next best MST?).
  3. MST with Specific Constraints (Some edges must/must not be included).
  4. Find the Maximum Weight Edge in MST (Useful for highway construction problems).
  5. Dynamic MST (Efficiently update MST as new edges are added).
  6. Reverse Delete Algorithm (Build MST using a different approach).
  7. Graph with Negative Weights (Check if MST algorithms work correctly).
  8. Handling Disconnected Graphs (Detect when MST is not possible).
  9. MST in Large Graphs (Test efficiency on 10,000+ nodes).
  10. Application of MST in Real-World Data (Use MST on a real dataset like city roads or social networks).

20.2 Use MST in a System Design Problem

Design a system where MST is a core part of the architecture.

Example: Optimal Cloud Service Connectivity

Given multiple data centers, determine the cheapest way to connect them with high-speed links using MST.

  • Input: A list of data centers and connection costs.
  • Output: The minimal-cost network ensuring all data centers are connected.
  • Implementation: Use Prim’s or Kruskal’s Algorithm.

20.3 Implement MST Under Time Constraints

Simulate a coding contest:

  • Set a timer for 30 minutes.
  • Implement Prim’s and Kruskal’s algorithms from scratch without looking at references.
  • Optimize your solution to handle large inputs (10⁵ nodes).

Measure your accuracy and speed, then analyze where you need improvements.