Kruskal’s Algorithm - Greedy Edge Selection for MST

The Minimum Spanning Tree Problem

Given a connected, weighted graph, find a tree that:

Spans all vertices (connects everything)
Minimizes total edge weight
Is acyclic (it’s a tree)

A spanning tree uses exactly V-1 edges (for V vertices). The question is: which V-1 edges?

Kruskal’s Greedy Strategy

The algorithm is remarkably simple:

1. Sort all edges by weight (ascending)
2. Initialize empty tree T
3. For each edge (u,v) in sorted order:
   - If adding (u,v) to T doesn't create a cycle:
     - Add (u,v) to T
4. Return T when we have V-1 edges

Why Greedy Works Here

The Cut Property: For any cut (partition of vertices into two sets), the minimum weight edge crossing the cut is in some MST.

Proof sketch:

Suppose edge e is the minimum crossing the cut
If e is not in MST T, then T has some other edge e’ crossing the cut
Since e has smaller weight, we can swap e’ for e
This creates a tree T’ with smaller total weight
Contradiction! T wasn’t minimal.

Kruskal exploits this: Each time we add edge (u,v), it’s the minimum edge connecting u’s component to v’s component (a cut!).

The Cycle Detection Problem

The key operation: “Does adding (u,v) create a cycle?”

Equivalently: “Are u and v already connected?”

This is exactly what Union Find solves efficiently!

Simple Implementation

def kruskal(edges, num_vertices):
    """
    edges: list of (weight, u, v) tuples
    returns: list of edges in MST
    """
    # Sort edges by weight
    edges.sort()
    
    # Union-Find for cycle detection
    parent = list(range(num_vertices))
    rank = [0] * num_vertices
    
    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  # Path compression
        return parent[x]
    
    def union(x, y):
        px, py = find(x), find(y)
        if px == py:
            return False  # Already connected (would create cycle)
        # Union by rank
        if rank[px] < rank[py]:
            parent[px] = py
        elif rank[px] > rank[py]:
            parent[py] = px
        else:
            parent[py] = px
            rank[px] += 1
        return True
    
    mst = []
    for weight, u, v in edges:
        if union(u, v):  # If u and v not connected
            mst.append((weight, u, v))
            if len(mst) == num_vertices - 1:
                break  # Early termination
    
    return mst

Example Execution

Graph:
    A---1---B
    |       |
    4       2
    |       |
    C---3---D

Edges sorted: [(1,A,B), (2,B,D), (3,C,D), (4,A,C)]

Step 1: Add (1,A,B) - connects A and B
Step 2: Add (2,B,D) - connects {A,B} to D
Step 3: Add (3,C,D) - connects C to {A,B,D}
Step 4: Skip (4,A,C) - would create cycle (A and C already connected)

MST: {(1,A,B), (2,B,D), (3,C,D)} with total weight 6

Time Complexity Analysis

Operation	Complexity	Count
Sort edges	O(E log E)	Once
Union-Find operations	O(α(V)) amortized	O(E) times
Total	O(E log E)

Note: O(E log E) = O(E log V) since E ≤ V² implies log E ≤ 2 log V.

The sorting dominates! Union-Find is nearly constant time with path compression and union by rank.

When Kruskal Beats Prim

Kruskal is better when:

Graph is sparse (E << V²)
Edges are already sorted or cheap to sort
You have edge list representation
Simple implementation is valued

Prim is better when:

Graph is dense (E ≈ V²)
You have adjacency matrix or list
You want to avoid sorting all edges

Both produce an MST; the choice is about efficiency for your graph structure.

Correctness: The Exchange Argument

Claim: Kruskal’s greedy choice is always safe.

Proof by exchange: Let A be edges chosen so far (acyclic, minimal weight so far). Let (u,v) be the next edge Kruskal adds.

If (u,v) is not in some optimal MST T*, then:

Adding (u,v) to T* creates a cycle
This cycle must contain another edge (x,y) connecting u’s component to v’s component
Since we chose edges in order, weight(u,v) ≤ weight(x,y)
We can swap (x,y) for (u,v) to get MST with weight ≤ T*
Contradiction! T* wasn’t optimal.

Therefore, (u,v) is in some optimal MST.

Disconnected Graphs

If the graph is disconnected, Kruskal produces a minimum spanning forest - one MST per connected component.

def kruskal(edges, num_vertices):
    # ... same algorithm ...
    
    if len(mst) < num_vertices - 1:
        # Graph is disconnected
        num_components = num_vertices - len(mst)
        return mst, num_components
    return mst, 1  # Single component

Variants and Applications

Maximum Spanning Tree

Sort edges in descending order instead:

edges.sort(reverse=True)  # Largest first

Kruskal’s Reverse Delete Algorithm

Start with all edges, remove heaviest while maintaining connectivity (slower but interesting).

Real-world Applications

Network design: Minimize cable length to connect all buildings
Clustering: Remove k-1 heaviest edges to get k clusters
Approximation algorithms: MST is used in traveling salesman approximations
Image segmentation: Pixels as vertices, similarity as edge weights

The “Greedy on Global, Local Checks” Pattern

Kruskal exhibits a common algorithmic pattern:

Sort globally (all edges by weight)
Decide locally (check each edge independently)

This works because:

Sorting ensures we consider minimum weight edges first
Union-Find makes local decisions (cycle check) efficient
The cut property guarantees global optimality

Common Mistakes

Forgetting to sort edges - breaks the greedy guarantee
Using BFS/DFS for cycle detection - too slow (O(V) per edge instead of O(α(V)))
Not checking edge count - could fail to terminate for disconnected graphs
Wrong comparison in union by rank - should use rank, not tree size

Key Insight

Kruskal’s elegance is in its simplicity: sort edges, greedily add safe ones. The cut property guarantees optimality, and Union-Find makes it efficient. It’s a perfect example of how the right data structure (Union Find) enables an elegant algorithm.

Kruskal’s Algorithm - Union-Find Integration Pattern
Union Find
Dijkstra’s Algorithm - The Greedy Invariant (another greedy graph algorithm)
Graph Algorithms Optimal Complexity

Reconstruction: The cut property guarantees that the minimum crossing edge is always safe. Since Kruskal always picks the globally minimum unchecked edge, and Union-Find determines if it crosses a cut (connects components), the greedy choice is optimal.

Quartz 4

Explorer

Kruskal's Algorithm - Greedy Edge Selection for MST

Kruskal’s Algorithm - Greedy Edge Selection for MST

The Minimum Spanning Tree Problem

Kruskal’s Greedy Strategy

Why Greedy Works Here

The Cycle Detection Problem

Simple Implementation

Example Execution

Time Complexity Analysis

When Kruskal Beats Prim

Correctness: The Exchange Argument

Disconnected Graphs

Variants and Applications

Maximum Spanning Tree

Kruskal’s Reverse Delete Algorithm

Real-world Applications

The “Greedy on Global, Local Checks” Pattern

Common Mistakes

Key Insight

Graph View

Table of Contents

Backlinks

Quartz 4

Explorer

Kruskal's Algorithm - Greedy Edge Selection for MST

Kruskal’s Algorithm - Greedy Edge Selection for MST

The Minimum Spanning Tree Problem

Kruskal’s Greedy Strategy

Why Greedy Works Here

The Cycle Detection Problem

Simple Implementation

Example Execution

Time Complexity Analysis

When Kruskal Beats Prim

Correctness: The Exchange Argument

Disconnected Graphs

Variants and Applications

Maximum Spanning Tree

Kruskal’s Reverse Delete Algorithm

Real-world Applications

The “Greedy on Global, Local Checks” Pattern

Common Mistakes

Key Insight

Related Concepts

Graph View

Table of Contents

Backlinks