Union-Find (a.k.a. disjoint-set) maintains a partition of elements into equivalence classes, supporting two operations: find(x) returns a canonical representative for the set containing x, and union(x, y) merges the two sets. Both run in nearly O(1) amortized time, specifically O(α(n)). α is the Inverse Ackermann Function, which grows so slowly it’s effectively constant for any n you’ll meet. The structural trick that buys this is two complementary optimizations: path compression flattens the tree during find() so future lookups are short, and union-by-rank keeps trees balanced by attaching the smaller-rank tree under the larger.
Reach for it whenever the question is “are x and y in the same group, given a stream of unions?” Cycle detection in undirected graphs (e.g. inside Kruskal’s MST: an edge creates a cycle iff its endpoints already share a root), connected-components analysis, percolation problems, and tracking equivalence relations as they grow. Crucially: undirected only. Union is symmetric (union(a,b) == union(b,a)), so it cannot encode edge direction. For directed cycle detection, use DFS with a recursion stack instead.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n # upper bound on tree height
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return
if self.rank[rx] >= self.rank[ry]:
self.parent[ry] = rx
else:
self.parent[rx] = ry
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1For non-integer keys, swap the parent/rank arrays for dicts indexed by node identity. An alternative balancing strategy is union-by-size (track element count per set instead of rank).