Floyd-Warshall - All-Pairs Dynamic Programming

The Core Problem

While Dijkstra’s Algorithm - The Greedy Invariant finds shortest paths from one source, Floyd-Warshall solves a different problem: find shortest paths between all pairs of vertices.

You could run Dijkstra V times (once per source), but Floyd-Warshall offers a cleaner, more elegant solution through dynamic programming.

The Dynamic Programming State

Define: dist[i][j][k] = length of the shortest path from i to j using only vertices {0, 1, …, k} as intermediates

Base case (k = -1, meaning no intermediates are allowed): dist[i][j][-1] = weight of edge (i,j) if it exists, 0 if i = j, else ∞

Recurrence:

dist[i][j][k] = min(
    dist[i][j][k-1],           # Don't use vertex k
    dist[i][k][k-1] + dist[k][j][k-1]  # Use vertex k
)

This asks: “Is going through vertex k better than the best path without k?”
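To make the three-dimensional state concrete, here is a minimal sketch that fills the full table on a tiny made-up graph (the layer index is shifted by one so that layer 0 holds the base case):

```python
INF = float('inf')

# Tiny example: edges 0 -> 1 (weight 4), 1 -> 2 (weight 2), 0 -> 2 (weight 11)
w = [[0, 4, 11],
     [INF, 0, 2],
     [INF, INF, 0]]
V = 3

# dist[k][i][j] = shortest i -> j path using intermediates {0, ..., k-1}
# (layer index shifted by one: dist[0] is the base case, no intermediates)
dist = [[[INF] * V for _ in range(V)] for _ in range(V + 1)]
dist[0] = [row[:] for row in w]

for k in range(V):
    for i in range(V):
        for j in range(V):
            dist[k + 1][i][j] = min(
                dist[k][i][j],                  # don't use vertex k
                dist[k][i][k] + dist[k][k][j],  # route through vertex k
            )

print(dist[V][0][2])  # 6: 0 -> 1 -> 2 beats the direct edge of weight 11
```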

Space Optimization

We can eliminate the third dimension by observing that we only need the previous k value:

def floyd_warshall(graph):
    # graph[i] is a dict mapping neighbor j -> weight of edge (i, j)
    V = len(graph)
    dist = [[float('inf')] * V for _ in range(V)]
    
    # Base case: direct edges
    for i in range(V):
        dist[i][i] = 0
        for j, weight in graph[i].items():
            dist[i][j] = weight
    
    # DP: try each vertex as intermediate
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j], 
                                dist[i][k] + dist[k][j])
    
    return dist

The space optimization works because the values we read, dist[i][k] and dist[k][j], do not change during iteration k: updating dist[i][k] compares it against dist[i][k] + dist[k][k], and dist[k][k] = 0 (absent negative cycles), so overwriting dist[i][j] in place reads the same values as keeping a separate k-1 layer would.
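A quick usage sketch (the function is repeated so the snippet runs standalone; the four-vertex graph is a made-up example with one negative edge but no negative cycle):

```python
INF = float('inf')

def floyd_warshall(graph):
    # graph[i] is a dict mapping neighbor j -> weight of edge (i, j)
    V = len(graph)
    dist = [[INF] * V for _ in range(V)]
    for i in range(V):
        dist[i][i] = 0
        for j, weight in graph[i].items():
            dist[i][j] = weight
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist

graph = [
    {1: 3, 2: 8},   # edges out of vertex 0
    {3: 1},         # edges out of vertex 1
    {1: 4},         # edges out of vertex 2
    {0: 2, 2: -5},  # edges out of vertex 3
]
dist = floyd_warshall(graph)
print(dist[0][2])  # -1, via the path 0 -> 1 -> 3 -> 2
```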

Why This Ordering Works

The triple loop order is critical: k must be the outer loop.

Why? When processing intermediate vertex k:

  • We need dist[i][k] and dist[k][j]
  • These can only use vertices {0, …, k-1} as intermediates
  • By making k outer, we ensure this property holds

If we made i or j outer instead:

# WRONG: i outer
for i in range(V):
    for k in range(V):
        for j in range(V):
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

This breaks because when row i is being processed, dist[k][j] for not-yet-processed rows still holds only direct edge weights - shortest paths whose intermediate vertices must be discovered in a different order than the inner k loop visits them are missed.
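A concrete counterexample (a made-up directed path whose intermediates appear "out of order") shows the failure:

```python
INF = float('inf')
V = 4

def fresh():
    # Directed path 0 -> 2 -> 1 -> 3: the shortest 0 -> 3 path needs
    # intermediate 2 to be applied *before* intermediate 1.
    d = [[INF] * V for _ in range(V)]
    for v in range(V):
        d[v][v] = 0
    d[0][2] = d[2][1] = d[1][3] = 1
    return d

# Correct: k outermost
good = fresh()
for k in range(V):
    for i in range(V):
        for j in range(V):
            good[i][j] = min(good[i][j], good[i][k] + good[k][j])

# Wrong: i outermost
bad = fresh()
for i in range(V):
    for k in range(V):
        for j in range(V):
            bad[i][j] = min(bad[i][j], bad[i][k] + bad[k][j])

print(good[0][3], bad[0][3])  # 3 inf
```

With i outer, row 0 finishes before dist[2][3] is ever relaxed, so the wrong-order version never finds the 0 -> 3 path at all.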

Detecting Negative Cycles

After the algorithm completes, check the diagonal:

# After Floyd-Warshall
for i in range(V):
    if dist[i][i] < 0:
        return "Negative cycle exists"

If any vertex has a negative path to itself, there’s a negative cycle.
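For instance, wrapping the check into a helper (the three-vertex graph below is a made-up example whose cycle 0 → 1 → 2 → 0 sums to -1):

```python
INF = float('inf')

def has_negative_cycle(w):
    # w: adjacency matrix with w[i][i] == 0 and INF for missing edges
    V = len(w)
    dist = [row[:] for row in w]
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    # A negative diagonal entry means some vertex reaches itself
    # with negative total weight: a negative cycle
    return any(dist[i][i] < 0 for i in range(V))

w = [[0, 1, INF],
     [INF, 0, 1],
     [-3, INF, 0]]
print(has_negative_cycle(w))  # True
```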

Path Reconstruction

Track predecessors during updates:

def floyd_warshall_with_paths(graph):
    V = len(graph)
    dist = [[float('inf')] * V for _ in range(V)]
    next_node = [[None] * V for _ in range(V)]
    
    # Initialize
    for i in range(V):
        dist[i][i] = 0
        next_node[i][i] = i  # so zero-length paths reconstruct as [i]
        for j, weight in graph[i].items():
            dist[i][j] = weight
            next_node[i][j] = j
    
    # DP
    for k in range(V):
        for i in range(V):
            for j in range(V):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    next_node[i][j] = next_node[i][k]
    
    return dist, next_node
 
def reconstruct_path(next_node, i, j):
    if next_node[i][j] is None:
        return []
    path = [i]
    while i != j:
        i = next_node[i][j]
        path.append(i)
    return path
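Putting the two functions together (repeated here so the example runs standalone, with the diagonal of next_node seeded so zero-length paths also reconstruct; the three-vertex graph is a made-up example):

```python
INF = float('inf')

def floyd_warshall_with_paths(graph):
    # graph[i] is a dict mapping neighbor j -> weight of edge (i, j)
    V = len(graph)
    dist = [[INF] * V for _ in range(V)]
    next_node = [[None] * V for _ in range(V)]
    for i in range(V):
        dist[i][i] = 0
        next_node[i][i] = i  # so zero-length paths reconstruct as [i]
        for j, weight in graph[i].items():
            dist[i][j] = weight
            next_node[i][j] = j
    for k in range(V):
        for i in range(V):
            for j in range(V):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    next_node[i][j] = next_node[i][k]
    return dist, next_node

def reconstruct_path(next_node, i, j):
    if next_node[i][j] is None:
        return []  # j is unreachable from i
    path = [i]
    while i != j:
        i = next_node[i][j]
        path.append(i)
    return path

# 0 -> 1 -> 2 (cost 10) beats the direct edge 0 -> 2 (cost 12)
graph = [{1: 5, 2: 12}, {2: 5}, {}]
dist, nxt = floyd_warshall_with_paths(graph)
print(reconstruct_path(nxt, 0, 2))  # [0, 1, 2]
```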

The Complexity Trade-off

Algorithm         | Time            | Space | Use Case
Dijkstra × V      | O(V(V+E) log V) | O(V)  | Sparse, no negative weights
Bellman-Ford × V  | O(V²E)          | O(V)  | Any, with negative weights
Floyd-Warshall    | O(V³)           | O(V²) | Dense graphs, all pairs needed

Floyd-Warshall shines when:

  1. Graph is dense (E ≈ V²)
  2. Need all pairs (not just from one source)
  3. Simplicity matters (remarkably simple code)

Matrix Interpretation

Floyd-Warshall is essentially matrix multiplication in the (min, +) semiring:

# Standard matrix multiplication: C = A × B
C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
 
# Floyd-Warshall: "multiply" in (min, +) semiring
dist[i][j] = min(dist[i][k] + dist[k][j] for k in range(V))

This connection reveals why it's O(V³): the triple loop performs O(V²) min-plus update work for each of the V intermediate vertices - exactly the cost of a single (min, +) matrix product. (Computing all-pairs distances by repeated squaring in this semiring would instead cost O(V³ log V).)
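A sketch of the product view: repeatedly multiplying the adjacency matrix by itself in the (min, +) semiring also yields all-pairs distances, just more slowly than Floyd-Warshall (the graph below is the same kind of small made-up example used earlier):

```python
INF = float('inf')

def min_plus_multiply(A, B):
    # One (min, +) matrix "product": O(V^3), the cost of the whole
    # Floyd-Warshall triple loop
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Adjacency matrix: w[i][i] = 0, INF for missing edges
w = [[0, 4, 11],
     [INF, 0, 2],
     [INF, INF, 0]]

# V-1 min-plus products give all-pairs shortest paths at O(V^4) total;
# Floyd-Warshall reaches the same answer in a single O(V^3) pass
D = w
for _ in range(len(w) - 1):
    D = min_plus_multiply(D, w)
print(D[0][2])  # 6
```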

Parallelization Potential

The two inner loops can be parallelized within each iteration of k:

for k in range(V):  # Must be sequential
    parallel_for i in range(V):  # Can parallelize
        parallel_for j in range(V):  # Can parallelize
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

Each k iteration creates a synchronization point, but within it, all updates are independent.
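The independence can be demonstrated even in plain Python by rebuilding each stage's matrix in one shot from the previous stage: every entry of the new matrix reads only values that stage k leaves unchanged, so a parallel runtime could compute all V² entries at once (a sketch, not a tuned implementation):

```python
INF = float('inf')

def floyd_warshall_staged(w):
    # Within a fixed k, each (i, j) update reads only dist[i][k] and
    # dist[k][j], which stage k does not change, so all updates for that
    # stage are independent of one another.
    V = len(w)
    dist = [row[:] for row in w]
    for k in range(V):  # stages must run sequentially
        dist = [[min(dist[i][j], dist[i][k] + dist[k][j])
                 for j in range(V)] for i in range(V)]
    return dist

# Made-up cycle: 0 -> 1 (3), 1 -> 2 (1), 2 -> 0 (2)
w = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(floyd_warshall_staged(w)[0][2])  # 4
```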

Relationship to Transitive Closure

Floyd-Warshall can compute reachability (transitive closure) by using Boolean operations:

def transitive_closure(graph):
    V = len(graph)
    reach = [[False] * V for _ in range(V)]
    
    # Initialize with direct edges
    for i in range(V):
        reach[i][i] = True
        for j in graph[i]:
            reach[i][j] = True
    
    # Floyd-Warshall style propagation
    for k in range(V):
        for i in range(V):
            for j in range(V):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    
    return reach

This is the (or, and) semiring instead of (min, +).
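A usage sketch on a made-up chain 0 → 1 → 2 with an isolated vertex 3 (the function is repeated so the snippet runs standalone):

```python
def transitive_closure(graph):
    # graph[i] is a dict (or any iterable of neighbors) of edges out of i
    V = len(graph)
    reach = [[False] * V for _ in range(V)]
    for i in range(V):
        reach[i][i] = True
        for j in graph[i]:
            reach[i][j] = True
    # (or, and) semiring version of the Floyd-Warshall triple loop
    for k in range(V):
        for i in range(V):
            for j in range(V):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

graph = [{1: 1}, {2: 1}, {}, {}]  # 0 -> 1 -> 2, vertex 3 isolated
reach = transitive_closure(graph)
print(reach[0][2], reach[2][0], reach[0][3])  # True False False
```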

Common Implementation Errors

  1. Wrong loop order → k must be outermost
  2. Forgetting diagonal initialization → dist[i][i] = 0
  3. Not handling missing edges → Must use ∞, not 0
  4. Checking for negative cycles before completion → Only valid after all k

When to Use Floyd-Warshall

Use Floyd-Warshall when:

  1. Need all-pairs shortest paths
  2. Graph is dense (E = Ω(V²))
  3. Negative weights might exist (but no negative cycles)
  4. Simple implementation is valued

Don’t use when:

  1. Sparse graph (E << V²) → Dijkstra × V is faster
  2. Only need single-source → Use Dijkstra’s Algorithm - The Greedy Invariant
  3. Graph is huge (V beyond a few thousand) → O(V³) becomes prohibitive

In summary: the DP recurrence follows from optimal substructure - either vertex k lies on the shortest path or it doesn't. The k-outer loop ordering ensures we build paths incrementally, using progressively larger sets of allowed intermediate vertices.