# Kruskal’s Algorithm - Union-Find Integration Pattern

## The Bottleneck Problem
Kruskal’s core operation is: “Would adding edge (u,v) create a cycle?”
Naive approaches:
- BFS/DFS from u: Check if v is reachable → O(V+E) per edge → O(E²) total ❌
- Maintain adjacency list: Check connectivity → O(E) per edge → O(E²) total ❌
These make Kruskal too slow! We need something better.
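For concreteness, the naive BFS cycle check might look like this sketch (the function name and adjacency-dict representation are illustrative, not from the text):

```python
from collections import deque

def creates_cycle(adj, u, v):
    """Naive check: would adding edge (u, v) create a cycle?

    BFS from u; if v is already reachable, (u, v) would close a cycle.
    O(V + E) per call, hence O(E * (V + E)) across all edges of Kruskal.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True  # v is reachable from u -> the edge closes a cycle
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Each edge considered by Kruskal pays for a full traversal, which is exactly the cost Union-Find eliminates.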
## Union-Find to the Rescue

Union-Find solves this with two operations:

- `find(x)`: Which component/set is x in? → O(α(V)) amortized
- `union(x, y)`: Merge x’s component with y’s component → O(α(V)) amortized
Where α(V) is the inverse Ackermann function, effectively constant for all practical V.
Key insight: u and v are in the same component ⟺ adding (u,v) creates a cycle!
## The Integration Pattern

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n

    def find(self, x):
        """Find with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Union by rank; returns True if a merge happened."""
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # Already connected - would create cycle!
        # Union by rank
        if self.rank[px] < self.rank[py]:
            self.parent[px] = py
        elif self.rank[px] > self.rank[py]:
            self.parent[py] = px
        else:
            self.parent[py] = px
            self.rank[px] += 1
        self.components -= 1
        return True
```
```python
def kruskal(edges, n):
    uf = UnionFind(n)
    mst = []
    for weight, u, v in sorted(edges):
        if uf.union(u, v):  # If not already connected
            mst.append((weight, u, v))
            if len(mst) == n - 1:  # Early termination
                break
    return mst, uf.components
```

## Why Path Compression Matters
Without path compression (but with union by rank), `find` is O(log V) in the worst case (tree height); without either optimization, a chain of unions can degrade it to O(V).
Before path compression:

```
      4
     /
    3
   /
  2
 /
1
```

`find(1)` traverses 3 levels.

After path compression:

```
  4
 /|\
3 2 1
```

`find(1)` now takes 1 step, and so do `find(2)` and `find(3)`!
Implementation detail:

```python
def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Key line!
    return self.parent[x]
```

Every node on the path now points directly to the root.
## Why Union by Rank Matters

Without union by rank, we might create tall trees:

```python
# BAD: No rank consideration
def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px != py:
        self.parent[px] = py  # Always make py the parent
```

This can create O(V)-height trees in adversarial cases.
With union by rank, we attach the shorter tree to the taller tree’s root:

```python
if self.rank[px] < self.rank[py]:
    self.parent[px] = py  # py's tree is taller; its root becomes the parent
elif self.rank[px] > self.rank[py]:
    self.parent[py] = px
else:
    self.parent[py] = px
    self.rank[px] += 1  # Both same height; new height = old + 1
```

This guarantees tree height ≤ log V, and combined with path compression we get O(α(V))!
## The Amortized Analysis Intuition

Why is it nearly constant time?

Key observation: path compression flattens trees as we go.

- The first `find(x)` might take O(log V) steps
- But it compresses the path, so subsequent finds on those nodes are O(1)
- Over many operations, the average cost becomes O(α(V))
The inverse Ackermann function α(V) < 5 for all practical V:
- α(2^65536) < 5
- It grows unbelievably slowly
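The flattening effect is easy to observe directly. A minimal sketch with a bare parent array (mirroring the `find` method above, repeated here so it runs standalone):

```python
def find(parent, x):
    """Find with path compression on a bare parent array."""
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

# Build the worst-case chain 1 -> 2 -> 3 -> 4 by hand (index 0 unused).
parent = [0, 2, 3, 4, 4]
find(parent, 1)  # Walks 3 levels, compressing as the recursion unwinds
# Every node on the traversed path now points straight at the root:
assert parent[1] == parent[2] == parent[3] == 4
```

The first call pays the full path length; every later `find` on nodes 1, 2, or 3 is a single step.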
## Alternative Implementations

### Union by Size

Track tree size instead of rank:
```python
def __init__(self, n):
    self.parent = list(range(n))
    self.size = [1] * n

def union(self, x, y):
    px, py = self.find(x), self.find(y)
    if px == py:
        return False
    # Attach the smaller tree to the larger one
    if self.size[px] < self.size[py]:
        self.parent[px] = py
        self.size[py] += self.size[px]
    else:
        self.parent[py] = px
        self.size[px] += self.size[py]
    return True
```

This also gives O(log V) tree height and O(α(V)) amortized with path compression.
### Without Path Compression

Still correct but slower - O(log V) per operation (with union by rank):

```python
def find(self, x):
    while self.parent[x] != x:
        x = self.parent[x]
    return x
```

## Tracking Number of Components
Union-Find naturally tracks component count:
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # Initially n separate components

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        # ... merge by rank as above ...
        self.components -= 1  # One less component after merge
        return True
```

Useful for:
- Detecting when graph becomes connected (components == 1)
- Early termination in Kruskal (once the MST has V - 1 edges)
- Finding minimum spanning forest size
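The first use case can be sketched as a small helper. A compact Union-Find is repeated inside so the example runs standalone (the `is_connected` name is illustrative):

```python
class UnionFind:
    """Compact version of the class above (no rank, for brevity)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        self.parent[py] = px
        self.components -= 1
        return True

def is_connected(n, edges):
    """Union every edge, then check whether one component remains."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.components == 1
```

For a minimum spanning forest, the final `components` value is exactly the number of trees in the forest.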
## Edge Cases and Testing

```python
def test_union_find():
    # Test 1: Cycle detection
    uf = UnionFind(3)
    assert uf.union(0, 1) == True   # Connect 0-1
    assert uf.union(1, 2) == True   # Connect 1-2 (now 0-1-2 chain)
    assert uf.union(0, 2) == False  # Would create cycle!

    # Test 2: Component counting
    uf = UnionFind(5)
    assert uf.components == 5
    uf.union(0, 1)
    assert uf.components == 4
    uf.union(2, 3)
    assert uf.components == 3
    uf.union(0, 2)  # Merge {0,1} with {2,3}
    assert uf.components == 2  # {0,1,2,3} and {4}

    # Test 3: Path compression verification
    uf = UnionFind(4)
    uf.union(0, 1)
    uf.union(1, 2)
    uf.union(2, 3)
    # After find(0), the path compresses
    root = uf.find(0)
    assert uf.parent[0] == root  # Now points directly to root
```

## Performance Comparison
| Implementation | Find | Union | Total for Kruskal |
|---|---|---|---|
| BFS/DFS cycle check | O(V) | O(V) | O(E²) |
| Union-Find (naive) | O(V) | O(V) | O(EV) |
| Union by rank | O(log V) | O(log V) | O(E log V) |
| Path compression + rank | O(α(V)) | O(α(V)) | O(E log E) (sorting dominates) |

The last row shows why Union-Find is essential - it makes Kruskal practical!
## Common Implementation Mistakes

- Forgetting path compression: lose the O(α(V)) guarantee
- Not checking union’s return value: miss cycle detection
- Wrong rank update: only increase rank when the two ranks are equal
- Modifying during find: path compression must update `self.parent`, not a local copy
- Off-by-one in initialization: must be `range(n)`, not `range(n - 1)`
## The Perfect Match
Kruskal + Union-Find is a textbook example of algorithm-data structure synergy:
Kruskal needs:
- Fast cycle detection
- Maintain connected components
- Process edges in any order
Union-Find provides:
- O(α(V)) component queries
- O(α(V)) component merging
- Order-independent operations
Neither is as elegant without the other. Kruskal with BFS/DFS is too slow. Union-Find alone is a solution looking for a problem. Together, they’re optimal.
## Beyond Kruskal: Other Union-Find Applications
- Network connectivity: Dynamic connectivity queries
- Image segmentation: Group similar pixels (like Kruskal clusters)
- Least common ancestor: With path compression
- Percolation: Does a path exist from top to bottom?
- Kruskal variants: Reverse delete, maximum spanning tree
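As one example, the network-connectivity use case handles incremental (add-only) edge updates with queries interleaved between them. A minimal sketch using closures and path halving (an iterative cousin of path compression; `make_dsu` is an illustrative name):

```python
def make_dsu(n):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path halving: skip a level
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    def connected(x, y):
        return find(x) == find(y)

    return union, connected

# Links arrive over time; queries are answered between updates.
union, connected = make_dsu(4)
union(0, 1)
assert connected(0, 1)
assert not connected(0, 3)
union(1, 3)
assert connected(0, 3)
```

Note this only supports adding edges; fully dynamic connectivity (with deletions) needs heavier machinery than Union-Find.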
## Key Takeaway
Union-Find isn’t just an optimization for Kruskal - it’s what makes Kruskal tractable. The O(E log E) sorting cost dominates only because Union-Find makes the cycle checking O(Eα(V)) ≈ O(E). This is a recurring pattern in algorithms: the right data structure transforms a polynomial algorithm into a near-linear one.
## Related Concepts
- Kruskal’s Algorithm - Greedy Edge Selection for MST
- Union Find
- Graph Algorithms Optimal Complexity
- Dijkstra’s Algorithm - Implementation Patterns (another data structure synergy)
Reconstruction: Union by rank keeps tree height logarithmic. Path compression flattens trees during traversal. Together, they give O(α(V)) amortized time, where α is the inverse Ackermann function (effectively constant). This makes Kruskal’s O(E log E) dominated by sorting, not cycle checks.