Kruskal’s Algorithm - Union-Find Integration Pattern

The Bottleneck Problem

Kruskal’s core operation is: “Would adding edge (u,v) create a cycle?”

Naive approaches:

  • BFS/DFS from u through the edges accepted so far (sketched below): check whether v is already reachable → O(V) per edge, since the partial MST has at most V - 1 edges → O(EV) total ❌
  • Maintain an adjacency list of the partial MST and re-traverse it for every connectivity query → the same O(V) traversal per edge → O(EV) total ❌

These make Kruskal too slow! We need something better.
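For concreteness, here is roughly what the per-edge check in the first bullet looks like (a sketch; mst_adj is an assumed adjacency dict holding only the edges accepted so far):

from collections import deque

def creates_cycle(mst_adj, u, v):
    """Naive check: adding (u, v) creates a cycle iff v is already reachable
    from u through previously accepted edges. One BFS per candidate edge."""
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in mst_adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

Calling this for every one of the E candidate edges is exactly the cost that blows up.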

Union-Find to the Rescue

Union-Find solves this with two operations:

  • find(x): Which component/set is x in? → O(α(V)) amortized
  • union(x,y): Merge x’s component with y’s component → O(α(V)) amortized

Where α(V) is the inverse Ackermann function, effectively constant for all practical V.

Key insight: u and v are in the same component ⟺ adding (u,v) creates a cycle!

The Integration Pattern

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n
    
    def find(self, x):
        """Find with path compression"""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        """Union by rank, returns True if union happened"""
        px, py = self.find(x), self.find(y)
        
        if px == py:
            return False  # Already connected - would create cycle!
        
        # Union by rank
        if self.rank[px] < self.rank[py]:
            self.parent[px] = py
        elif self.rank[px] > self.rank[py]:
            self.parent[py] = px
        else:
            self.parent[py] = px
            self.rank[px] += 1
        
        self.components -= 1
        return True
 
def kruskal(edges, n):
    """edges: iterable of (weight, u, v) tuples; n: number of vertices 0..n-1.
    Returns (list of MST edges, number of remaining components)."""
    uf = UnionFind(n)
    mst = []
    
    for weight, u, v in sorted(edges):
        if uf.union(u, v):  # If not already connected
            mst.append((weight, u, v))
            if len(mst) == n - 1:  # Early termination
                break
    
    return mst, uf.components
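A quick sanity check of the pattern, on a hypothetical 4-vertex graph with (weight, u, v) tuples as the code above expects:

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
mst, components = kruskal(edges, n=4)
print(mst)         # [(1, 1, 2), (2, 2, 3), (3, 0, 2)] - total weight 6
print(components)  # 1 - the MST spans all four vertices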

Why Path Compression Matters

Without path compression (but with union by rank), find is O(log V) in the worst case, since its cost is the height of the tree; with neither optimization, a degenerate chain can push it to O(V).

Before path compression:

    4
   /
  3
 /
2
/
1

find(1) traverses 3 levels

After path compression:

    4
   /|\
  3 2 1

find(1) now takes 1 step, and so do find(2) and find(3)!

Implementation detail:

def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Key line!
    return self.parent[x]

Every node on the path now points directly to root.
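One practical caveat: the recursive find can hit Python's default recursion limit on very deep trees (for example, the first find on a long chain that has never been compressed). A sketch of an equivalent iterative, two-pass version, assuming the same self.parent array:

def find(self, x):
    # Pass 1: walk up to the root.
    root = x
    while self.parent[root] != root:
        root = self.parent[root]
    # Pass 2: point every node on the path directly at the root.
    while self.parent[x] != root:
        self.parent[x], x = root, self.parent[x]
    return root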

Why Union by Rank Matters

Without union by rank, we might create tall trees:

# BAD: No rank consideration
def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px != py:
        self.parent[px] = py  # Always make py the parent

This can create O(V) height trees in adversarial cases.
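To make "adversarial" concrete, here is a small standalone sketch (a hypothetical demo, not part of the class above) showing that rank-less unions on a simple edge sequence build a chain of height n - 1:

def chain_height_demo(n=1000):
    """Union edges (0,1), (1,2), ... with "always attach px under py" and no
    path compression, then count how many links find(0) must follow."""
    parent = list(range(n))

    def find(x):                       # plain find, no compression
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(n - 1):
        px, py = find(i), find(i + 1)
        if px != py:
            parent[px] = py            # no rank: always attach px under py

    depth, x = 0, 0                    # measure the depth of node 0
    while parent[x] != x:
        x = parent[x]
        depth += 1
    return depth

assert chain_height_demo(1000) == 999  # node 0 sits 999 links below the root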

With union by rank, we attach the shorter tree under the taller tree's root:

if self.rank[px] < self.rank[py]:
    self.parent[px] = py  # Make taller tree the parent
elif self.rank[px] > self.rank[py]:
    self.parent[py] = px
else:
    self.parent[py] = px
    self.rank[px] += 1  # Both same height, new height = old + 1

This guarantees tree height ≤ log V, but with path compression, we get O(α(V))!

The Amortized Analysis Intuition

Why is it nearly constant time?

Key observation: Path compression flattens trees as we go.

  • First find(x) might take O(log V) steps
  • But it compresses the path, so subsequent finds are O(1)
  • Over many operations, the average cost becomes O(α(V))

The inverse Ackermann function α(V) < 5 for all practical V:

  • α(2^65536) < 5
  • It grows unbelievably slowly

Alternative Implementations

Union by Size

Track tree size instead of rank:

def __init__(self, n):
    self.parent = list(range(n))
    self.size = [1] * n
 
def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px == py:
        return False
    
    # Attach smaller to larger
    if self.size[px] < self.size[py]:
        self.parent[px] = py
        self.size[py] += self.size[px]
    else:
        self.parent[py] = px
        self.size[px] += self.size[py]
    return True

This also gives O(log V) tree height and O(α(V)) amortized with path compression.
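A small bonus of tracking sizes: the size of any node's component is available in near-constant time. A hypothetical extra method for the size-based variant:

def component_size(self, x):
    """Number of nodes in the component containing x."""
    return self.size[self.find(x)]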

Without Path Compression

Still correct, but slower at O(log V) per operation (assuming union by rank or size):

def find(self, x):
    while self.parent[x] != x:
        x = self.parent[x]
    return x

Tracking Number of Components

Union-Find naturally tracks component count:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # Initially n separate components
    
    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        # ... merge ...
        self.components -= 1  # One less component after merge
        return True

Useful for:

  • Detecting when the graph becomes connected (components == 1), as sketched after this list
  • Early termination in Kruskal (once the MST has V - 1 edges)
  • Finding minimum spanning forest size
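For example, connectivity detection (the first bullet above) becomes a one-pass check. A sketch using the component counter, where edge_list holds plain (u, v) pairs:

def is_connected(edge_list, n):
    """True if the n-vertex undirected graph is a single connected component."""
    uf = UnionFind(n)
    for u, v in edge_list:
        uf.union(u, v)
        if uf.components == 1:    # already connected; later edges can't change that
            return True
    return uf.components == 1     # covers n == 1 and disconnected graphs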

Edge Cases and Testing

def test_union_find():
    # Test 1: Cycle detection
    uf = UnionFind(3)
    assert uf.union(0, 1) == True   # Connect 0-1
    assert uf.union(1, 2) == True   # Connect 1-2 (now 0-1-2 chain)
    assert uf.union(0, 2) == False  # Would create cycle!
    
    # Test 2: Component counting
    uf = UnionFind(5)
    assert uf.components == 5
    uf.union(0, 1)
    assert uf.components == 4
    uf.union(2, 3)
    assert uf.components == 3
    uf.union(0, 2)  # Merge {0,1} with {2,3}
    assert uf.components == 2  # {0,1,2,3} and {4}
    
    # Test 3: Path compression verification
    uf = UnionFind(4)
    uf.union(0, 1)   # parent[1] = 0
    uf.union(2, 3)   # parent[3] = 2
    uf.union(1, 3)   # ranks tie, so parent[2] = 0; node 3 is now two levels deep
    root = uf.find(3)            # this find compresses the path 3 -> 2 -> 0
    assert uf.parent[3] == root  # 3 now points directly to the root

Performance Comparison

Implementation              Find        Union       Total for Kruskal
BFS/DFS cycle check         O(V)        O(V)        O(EV)
Union-Find (naive)          O(V)        O(V)        O(EV)
Union by rank               O(log V)    O(log V)    O(E log V)
Path compression + rank     O(α(V))     O(α(V))     O(E log E)

The last row shows why Union-Find is essential - it makes Kruskal practical!

Common Implementation Mistakes

  1. Forgetting path compression: Lose the O(α(V)) guarantee
  2. Not checking union return value: Miss cycle detection (see the sketch after this list)
  3. Wrong rank update: Only increase rank when ranks are equal
  4. Modifying during find: Path compression must update self.parent, not a local copy
  5. Off-by-one in initialization: Must be range(n) not range(n-1)
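Mistake 2 is worth seeing in code, because the result still looks plausible at a glance. A deliberately buggy sketch, reusing the UnionFind class from above:

def kruskal_buggy(edges, n):
    """BUG: union's return value is ignored, so cycle-forming edges are
    silently appended and the result is not a spanning tree."""
    uf = UnionFind(n)
    mst = []
    for weight, u, v in sorted(edges):
        uf.union(u, v)                # return value dropped - no cycle check!
        mst.append((weight, u, v))    # appended even when (u, v) closed a cycle
    return mst                        # ends up with all E edges, not V - 1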

The Perfect Match

Kruskal + Union-Find is a textbook example of algorithm-data structure synergy:

Kruskal needs:

  • Fast cycle detection
  • Maintain connected components
  • Process edges in any order

Union-Find provides:

  • O(α(V)) component queries
  • O(α(V)) component merging
  • Order-independent operations

Neither is as elegant without the other. Kruskal with BFS/DFS is too slow. Union-Find alone is a solution looking for a problem. Together, they’re optimal.

Beyond Kruskal: Other Union-Find Applications

  1. Network connectivity: Dynamic connectivity queries
  2. Image segmentation: Group similar pixels (like Kruskal clusters)
  3. Least common ancestor: Tarjan's offline LCA algorithm is built on Union-Find
  4. Percolation: Does a path exist from top to bottom? (sketched after this list)
  5. Kruskal variants: Reverse delete, maximum spanning tree
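As one illustration, the percolation question (item 4) maps cleanly onto Union-Find by adding two virtual nodes. A sketch assuming an n x n grid where grid[r][c] is truthy for open cells, reusing the UnionFind class from above:

def percolates(grid):
    """True if open cells connect the top row to the bottom row."""
    n = len(grid)
    top, bottom = n * n, n * n + 1                    # virtual source and sink
    uf = UnionFind(n * n + 2)

    def idx(r, c):
        return r * n + c

    for r in range(n):
        for c in range(n):
            if not grid[r][c]:                        # blocked cell
                continue
            if r == 0:
                uf.union(idx(r, c), top)              # top row touches the source
            if r == n - 1:
                uf.union(idx(r, c), bottom)           # bottom row touches the sink
            for nr, nc in ((r + 1, c), (r, c + 1)):   # down and right neighbors
                if nr < n and nc < n and grid[nr][nc]:
                    uf.union(idx(r, c), idx(nr, nc))

    return uf.find(top) == uf.find(bottom)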

Key Takeaway

Union-Find isn’t just an optimization for Kruskal - it’s what makes Kruskal tractable. The O(E log E) sorting cost dominates only because Union-Find makes the cycle checking O(Eα(V)) ≈ O(E). This is a recurring pattern in algorithms: the right data structure transforms a polynomial algorithm into a near-linear one.


In short: union by rank keeps tree height logarithmic, and path compression flattens trees during traversal. Together they give O(α(V)) amortized time per operation, where α is the inverse Ackermann function (effectively constant). This is why Kruskal's O(E log E) bound is dominated by sorting, not by cycle checks.