# Kruskal’s Algorithm - Union-Find Integration Pattern

## The Bottleneck Problem
Kruskal’s core operation is: “Would adding edge (u,v) create a cycle?”
Naive approaches:
- BFS/DFS from u: Check if v is reachable → O(V+E) per edge → O(E²) total ❌
- Maintain adjacency list: Check connectivity → O(E) per edge → O(E²) total ❌
These make Kruskal too slow! We need something better.
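For concreteness, the naive BFS cycle check might look like this sketch (the function name and adjacency-dict representation are illustrative, not from the text):

```python
from collections import deque

def creates_cycle(adj, u, v):
    """Naive check: would adding edge (u, v) create a cycle?

    BFS from u; if v is already reachable, (u, v) would close a cycle.
    O(V + E) per call, hence O(E * (V + E)) across all edges of Kruskal.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True  # v is reachable from u -> the edge closes a cycle
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Each edge considered by Kruskal pays for a full traversal, which is exactly the cost Union-Find eliminates.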
## Union-Find to the Rescue

Union-Find solves this with two operations:

- `find(x)`: Which component/set is x in? → O(α(V)) amortized
- `union(x, y)`: Merge x’s component with y’s component → O(α(V)) amortized
Where α(V) is the inverse Ackermann function, effectively constant for all practical V.
Key insight: u and v are in the same component ⟺ adding (u,v) creates a cycle!
## The Integration Pattern

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n

    def find(self, x):
        """Find with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Union by rank; returns True if a merge happened."""
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # Already connected - would create cycle!
        # Union by rank
        if self.rank[px] < self.rank[py]:
            self.parent[px] = py
        elif self.rank[px] > self.rank[py]:
            self.parent[py] = px
        else:
            self.parent[py] = px
            self.rank[px] += 1
        self.components -= 1
        return True
```
```python
def kruskal(edges, n):
    uf = UnionFind(n)
    mst = []
    for weight, u, v in sorted(edges):
        if uf.union(u, v):  # If not already connected
            mst.append((weight, u, v))
            if len(mst) == n - 1:  # Early termination
                break
    return mst, uf.components
```

## Why Path Compression Matters
Without path compression (but with union by rank), `find` is O(log V) in the worst case (tree height); without either optimization, a chain of unions can degrade it to O(V).
Before path compression:

```
      4
     /
    3
   /
  2
 /
1
```

`find(1)` traverses 3 levels.

After path compression:

```
  4
 /|\
3 2 1
```

`find(1)` now takes 1 step, and so do `find(2)` and `find(3)`!
Implementation detail:

```python
def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Key line!
    return self.parent[x]
```

Every node on the path now points directly to the root.
## Why Union by Rank Matters

Without union by rank, we might create tall trees:

```python
# BAD: No rank consideration
def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px != py:
        self.parent[px] = py  # Always make py the parent
```

This can create O(V)-height trees in adversarial cases.
With union by rank, we attach the shorter tree to the taller tree’s root:

```python
if self.rank[px] < self.rank[py]:
    self.parent[px] = py  # py's tree is taller; its root becomes the parent
elif self.rank[px] > self.rank[py]:
    self.parent[py] = px
else:
    self.parent[py] = px
    self.rank[px] += 1  # Both same height; new height = old + 1
```

This guarantees tree height ≤ log V, and combined with path compression we get O(α(V))!
## The Amortized Analysis Intuition

Why is it nearly constant time?

Key observation: path compression flattens trees as we go.

- The first `find(x)` might take O(log V) steps
- But it compresses the path, so subsequent finds on those nodes are O(1)
- Over many operations, the average cost becomes O(α(V))
The inverse Ackermann function α(V) < 5 for all practical V:
- α(2^65536) < 5
- It grows unbelievably slowly
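The flattening effect is easy to observe directly. A minimal sketch with a bare parent array (mirroring the `find` method above, repeated here so it runs standalone):

```python
def find(parent, x):
    """Find with path compression on a bare parent array."""
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

# Build the worst-case chain 1 -> 2 -> 3 -> 4 by hand (index 0 unused).
parent = [0, 2, 3, 4, 4]
find(parent, 1)  # Walks 3 levels, compressing as the recursion unwinds
# Every node on the traversed path now points straight at the root:
assert parent[1] == parent[2] == parent[3] == 4
```

The first call pays the full path length; every later `find` on nodes 1, 2, or 3 is a single step.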
## Alternative Implementations

### Union by Size

Track tree size instead of rank:
```python
def __init__(self, n):
    self.parent = list(range(n))
    self.size = [1] * n

def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px == py:
        return False
    # Attach the smaller tree to the larger one
    if self.size[px] < self.size[py]:
        self.parent[px] = py
        self.size[py] += self.size[px]
    else:
        self.parent[py] = px
        self.size[px] += self.size[py]
    return True
```

This also gives O(log V) tree height and O(α(V)) amortized with path compression.
### Without Path Compression

Still correct but slower - O(log V) per operation (with union by rank):

```python
def find(self, x):
    while self.parent[x] != x:
        x = self.parent[x]
    return x
```

## Tracking Number of Components
Union-Find naturally tracks component count:
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # Initially n separate components

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        # ... merge by rank as above ...
        self.components -= 1  # One less component after merge
        return True
```

Useful for:
- Detecting when graph becomes connected (components == 1)
- Early termination in Kruskal (once the MST has V - 1 edges)
- Finding minimum spanning forest size
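The first use case can be sketched as a small helper. A compact Union-Find is repeated inside so the example runs standalone (the `is_connected` name is illustrative):

```python
class UnionFind:
    """Compact version of the class above (no rank, for brevity)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        self.parent[py] = px
        self.components -= 1
        return True

def is_connected(n, edges):
    """Union every edge, then check whether one component remains."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.components == 1
```

For a minimum spanning forest, the final `components` value is exactly the number of trees in the forest.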
## Edge Cases and Testing

```python
def test_union_find():
    # Test 1: Cycle detection
    uf = UnionFind(3)
    assert uf.union(0, 1) == True   # Connect 0-1
    assert uf.union(1, 2) == True   # Connect 1-2 (now 0-1-2 chain)
    assert uf.union(0, 2) == False  # Would create cycle!

    # Test 2: Component counting
    uf = UnionFind(5)
    assert uf.components == 5
    uf.union(0, 1)
    assert uf.components == 4
    uf.union(2, 3)
    assert uf.components == 3
    uf.union(0, 2)  # Merge {0,1} with {2,3}
    assert uf.components == 2  # {0,1,2,3} and {4}

    # Test 3: Path compression verification
    uf = UnionFind(4)
    uf.union(0, 1)
    uf.union(1, 2)
    uf.union(2, 3)
    # After find(0), the path compresses
    root = uf.find(0)
    assert uf.parent[0] == root  # Now points directly to root
```

## Performance Comparison
| Implementation | Find | Union | Total for Kruskal |
|---|---|---|---|
| BFS/DFS cycle check | O(V) | O(V) | O(E²) |
| Union-Find (naive) | O(V) | O(V) | O(EV) |
| Union by rank | O(log V) | O(log V) | O(E log V) |
| Path compression + rank | O(α(V)) | O(α(V)) | O(E log E) (sorting dominates) |

The last row shows why Union-Find is essential - it makes Kruskal practical!
## Common Implementation Mistakes

- Forgetting path compression: lose the O(α(V)) guarantee
- Not checking union’s return value: miss cycle detection
- Wrong rank update: only increase rank when the two ranks are equal
- Modifying during find: path compression must update `self.parent`, not a local copy
- Off-by-one in initialization: must be `range(n)`, not `range(n - 1)`
## The Perfect Match
Kruskal + Union-Find is a textbook example of algorithm-data structure synergy:
Kruskal needs:
- Fast cycle detection
- Maintain connected components
- Process edges in any order
Union-Find provides:
- O(α(V)) component queries
- O(α(V)) component merging
- Order-independent operations
Neither is as elegant without the other. Kruskal with BFS/DFS is too slow. Union-Find alone is a solution looking for a problem. Together, they’re optimal.
## Beyond Kruskal: Other Union-Find Applications
- Network connectivity: Dynamic connectivity queries
- Image segmentation: Group similar pixels (like Kruskal clusters)
- Least common ancestor: With path compression
- Percolation: Does a path exist from top to bottom?
- Kruskal variants: Reverse delete, maximum spanning tree
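As one example, the network-connectivity use case handles incremental (add-only) edge updates with queries interleaved between them. A minimal sketch using closures and path halving (an iterative cousin of path compression; `make_dsu` is an illustrative name):

```python
def make_dsu(n):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path halving: skip a level
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    def connected(x, y):
        return find(x) == find(y)

    return union, connected

# Links arrive over time; queries are answered between updates.
union, connected = make_dsu(4)
union(0, 1)
assert connected(0, 1)
assert not connected(0, 3)
union(1, 3)
assert connected(0, 3)
```

Note this only supports adding edges; fully dynamic connectivity (with deletions) needs heavier machinery than Union-Find.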
## Key Takeaway
Union-Find isn’t just an optimization for Kruskal - it’s what makes Kruskal tractable. The O(E log E) sorting cost dominates only because Union-Find makes the cycle checking O(Eα(V)) ≈ O(E). This is a recurring pattern in algorithms: the right data structure transforms a polynomial algorithm into a near-linear one.
## Related Concepts
- Kruskal’s Algorithm - Greedy Edge Selection for MST
- Union Find
- Graph Algorithms Optimal Complexity
- Dijkstra’s Algorithm - Implementation Patterns (another data structure synergy)
Reconstruction: Union by rank keeps tree height logarithmic. Path compression flattens trees during traversal. Together, they give O(α(V)) amortized time, where α is the inverse Ackermann function (effectively constant). This makes Kruskal’s O(E log E) dominated by sorting, not cycle checks.