Floyd-Warshall - All-Pairs Dynamic Programming
The Core Problem
While Dijkstra’s Algorithm - The Greedy Invariant finds shortest paths from one source, Floyd-Warshall solves a different problem: find shortest paths between all pairs of vertices.
You could run Dijkstra V times (once per source), but Floyd-Warshall offers a cleaner, more elegant solution through dynamic programming.
The Dynamic Programming State
Define: dist[i][j][k] = shortest path from i to j using only vertices {1, …, k} as intermediates (numbering vertices 1 through V)
Base case: dist[i][j][0] = weight of edge (i, j) if it exists, else ∞ (no intermediates allowed)
Recurrence:
```
dist[i][j][k] = min(
    dist[i][j][k-1],                     # don't use vertex k
    dist[i][k][k-1] + dist[k][j][k-1]    # use vertex k
)
```
This asks: “Is going through vertex k better than the best path without k?”
Space Optimization
We can eliminate the third dimension by observing that we only need the previous k value:
```python
def floyd_warshall(graph):
    # graph[i] is a dict mapping neighbor j -> edge weight
    V = len(graph)
    dist = [[float('inf')] * V for _ in range(V)]

    # Base case: direct edges
    for i in range(V):
        dist[i][i] = 0
        for j, weight in graph[i].items():
            dist[i][j] = weight

    # DP: try each vertex as intermediate
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j],
                                 dist[i][k] + dist[k][j])
    return dist
```
The space optimization works because we can overwrite dist[i][j] in place: during iteration k, the entries dist[i][k] and dist[k][j] are never changed (a path from i to k gains nothing by routing through k, since dist[k][k] = 0), so the order of updates within an iteration doesn't matter for correctness.
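As a quick sanity check, here is the same triple loop run on a small graph (the 3-vertex example is an illustrative assumption, using the adjacency-dict representation the function expects):

```python
import itertools

INF = float('inf')
# Edges: 0→1 (3), 0→2 (8), 1→2 (1), 2→0 (4)
graph = [{1: 3, 2: 8}, {2: 1}, {0: 4}]

V = len(graph)
dist = [[INF] * V for _ in range(V)]
for i in range(V):
    dist[i][i] = 0
    for j, w in graph[i].items():
        dist[i][j] = w

# product(..., repeat=3) varies the last index fastest, so k stays outermost
for k, i, j in itertools.product(range(V), repeat=3):
    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

print(dist)  # the direct 0→2 edge (8) loses to 0→1→2 (3 + 1 = 4)
```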
Why This Ordering Works
The triple loop order is critical: k must be the outer loop.
Why? When processing intermediate vertex k:
- We need dist[i][k] and dist[k][j]
- These values must only use vertices {0, …, k-1} as intermediates
- Making k the outer loop ensures this property holds
If we made i or j outer instead:
```python
# WRONG: i outer
for i in range(V):
    for k in range(V):
        for j in range(V):
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
```
This breaks because when row i is processed, dist[k][j] for k > i has not been relaxed yet - and row i is never revisited, so improvements discovered in later rows cannot propagate back to it.
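A concrete failure case (the 4-vertex chain below is a hypothetical example, not from the note): with edges 0→3→2→1, the i-outer order never discovers the path from 0 to 1, because row 3 is relaxed after row 0 has already been finalized.

```python
INF = float('inf')
# Chain 0→3→2→1 with unit weights: the true 0→1 distance is 3
W = [[0, INF, INF, 1],
     [INF, 0, INF, INF],
     [INF, 1, 0, INF],
     [INF, INF, 1, 0]]
V = len(W)

correct = [row[:] for row in W]
for k in range(V):                 # k outermost: correct
    for i in range(V):
        for j in range(V):
            correct[i][j] = min(correct[i][j], correct[i][k] + correct[k][j])

wrong = [row[:] for row in W]
for i in range(V):                 # i outermost: broken
    for k in range(V):
        for j in range(V):
            wrong[i][j] = min(wrong[i][j], wrong[i][k] + wrong[k][j])

print(correct[0][1], wrong[0][1])  # 3 inf
```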
Detecting Negative Cycles
After the algorithm completes, check the diagonal:
```python
# After Floyd-Warshall completes
for i in range(V):
    if dist[i][i] < 0:
        return "Negative cycle exists"
```
If any vertex has a negative-weight path to itself, the graph contains a negative cycle.
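A minimal check of this test (the 3-vertex graph is a hypothetical example containing the cycle 0→1→2→0 with total weight 1 − 1 − 1 = −1):

```python
INF = float('inf')
# Edges: 0→1 (1), 1→2 (-1), 2→0 (-1) - a cycle of total weight -1
W = [[0, 1, INF],
     [INF, 0, -1],
     [-1, INF, 0]]
V = len(W)

dist = [row[:] for row in W]
for k in range(V):
    for i in range(V):
        for j in range(V):
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

# The diagonal exposes the negative cycle
print(any(dist[i][i] < 0 for i in range(V)))  # True
```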
Path Reconstruction
Track predecessors during updates:
```python
def floyd_warshall_with_paths(graph):
    V = len(graph)
    dist = [[float('inf')] * V for _ in range(V)]
    next_node = [[None] * V for _ in range(V)]

    # Initialize
    for i in range(V):
        dist[i][i] = 0
        next_node[i][i] = i          # so the trivial path reconstructs as [i]
        for j, weight in graph[i].items():
            dist[i][j] = weight
            next_node[i][j] = j

    # DP
    for k in range(V):
        for i in range(V):
            for j in range(V):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    next_node[i][j] = next_node[i][k]
    return dist, next_node

def reconstruct_path(next_node, i, j):
    if next_node[i][j] is None:      # j is unreachable from i
        return []
    path = [i]
    while i != j:
        i = next_node[i][j]
        path.append(i)
    return path
```
The Complexity Trade-off
| Algorithm | Time | Space | Use Case |
|---|---|---|---|
| Dijkstra × V | O(V(V+E) log V) | O(V) | Sparse, no negative weights |
| Bellman-Ford × V | O(V²E) | O(V) | Any, with negative weights |
| Floyd-Warshall | O(V³) | O(V²) | Dense graphs, all pairs needed |
Floyd-Warshall shines when:
- Graph is dense (E ≈ V²)
- Need all pairs (not just from one source)
- Simplicity matters (remarkably simple code)
Matrix Interpretation
Floyd-Warshall is essentially matrix multiplication in the (min, +) semiring:
```
# Standard matrix multiplication: C = A × B
C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))

# Floyd-Warshall: "multiply" in the (min, +) semiring
dist[i][j] = min(dist[i][k] + dist[k][j] for k in range(V))
```
This connection explains the O(V³) bound: a single (min, +) matrix product already costs O(V³), and Floyd-Warshall gets all-pairs shortest paths for the price of one product, whereas naively repeated squaring of the weight matrix would cost O(V³ log V).
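To make the connection concrete, a sketch (the 3-vertex weight matrix is an illustrative assumption): one (min, +) product function, with repeated squaring of the weight matrix converging to the all-pairs distances.

```python
INF = float('inf')

def min_plus(A, B):
    """One 'multiplication' in the (min, +) semiring - O(n^3)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# Weight matrix: edges 0→1 (3), 0→2 (8), 1→2 (1), 2→0 (4)
W = [[0, 3, 8],
     [INF, 0, 1],
     [4, INF, 0]]

# Squaring ceil(log2(V)) times yields all-pairs shortest paths:
# O(V^3 log V) total, versus Floyd-Warshall's single O(V^3) pass
D = W
for _ in range(2):
    D = min_plus(D, D)
print(D)
```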
Parallelization Potential
The innermost two loops can be parallelized within each k:
```
for k in range(V):               # must be sequential
    parallel_for i in range(V):  # can parallelize
        parallel_for j in range(V):
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
```
Each k iteration is a synchronization point, but within it all updates are independent: row k and column k do not change during iteration k, so every (i, j) update reads stable values.
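In pure Python we can at least verify the independence claim: for each k, computing every update from a frozen snapshot of the start-of-iteration state (as a parallel run would) yields exactly the same matrix as the sequential in-place loop. The random non-negative graph below is an illustrative assumption.

```python
import copy
import itertools
import random

random.seed(1)
INF = float('inf')
V = 6
# Random non-negative weighted digraph, ~50% edge density
dist = [[0 if i == j else (random.randint(1, 9) if random.random() < 0.5 else INF)
         for j in range(V)] for i in range(V)]
snap = copy.deepcopy(dist)

for k in range(V):
    # "Parallel" schedule: every (i, j) reads only the frozen state
    # from the start of this k-iteration
    old = copy.deepcopy(snap)
    for i, j in itertools.product(range(V), repeat=2):
        snap[i][j] = min(old[i][j], old[i][k] + old[k][j])
    # Sequential in-place schedule
    for i, j in itertools.product(range(V), repeat=2):
        dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

print(dist == snap)  # True: the two schedules agree
```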
Relationship to Transitive Closure
Floyd-Warshall can compute reachability (transitive closure) by using Boolean operations:
```python
def transitive_closure(graph):
    # graph[i] is an iterable of i's direct successors
    V = len(graph)
    reach = [[False] * V for _ in range(V)]

    # Initialize with direct edges
    for i in range(V):
        reach[i][i] = True
        for j in graph[i]:
            reach[i][j] = True

    # Floyd-Warshall-style propagation
    for k in range(V):
        for i in range(V):
            for j in range(V):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach
```
This is the (or, and) semiring instead of (min, +).
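A common practical refinement of the Boolean version (a sketch, not part of the note above): pack each row of the reachability matrix into one Python integer, so the inner j-loop collapses to a single bitwise OR.

```python
def transitive_closure_bits(adj_rows):
    """adj_rows[i] is an int whose bit j is set iff edge i→j exists."""
    V = len(adj_rows)
    reach = [row | (1 << i) for i, row in enumerate(adj_rows)]  # include self
    for k in range(V):
        for i in range(V):
            if reach[i] >> k & 1:    # k is reachable from i...
                reach[i] |= reach[k]  # ...so everything k reaches is too
    return reach

# Edges 0→1 and 1→2; vertices 2 and 3 have no outgoing edges
rows = [0b0010, 0b0100, 0b0000, 0b0000]
closure = transitive_closure_bits(rows)
print(bin(closure[0]))  # 0b111: vertex 0 reaches {0, 1, 2}
```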
Common Implementation Errors
- Wrong loop order → k must be outermost
- Forgetting diagonal initialization → dist[i][i] = 0
- Not handling missing edges → Must use ∞, not 0
- Checking for negative cycles before completion → Only valid after all k
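The missing-edge pitfall is easy to see on a minimal 2-vertex sketch (hypothetical graph): encoding an absent edge as 0 silently claims free travel along it.

```python
INF = float('inf')
# One edge 0→1 of weight 5; no edge 1→0
good = [[0, 5], [INF, 0]]
bad = [[0, 5], [0, 0]]  # missing edge 1→0 wrongly encoded as 0

for dist in (good, bad):
    for k in range(2):
        for i in range(2):
            for j in range(2):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

print(good)  # [[0, 5], [inf, 0]] - 0 stays unreachable from 1
print(bad)   # [[0, 5], [0, 0]]  - falsely claims 1 reaches 0 at cost 0
```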
When to Use Floyd-Warshall
Use Floyd-Warshall when:
- Need all-pairs shortest paths
- Graph is dense (E = Ω(V²))
- Negative weights might exist (but no negative cycles)
- Simple implementation is valued
Don’t use when:
- Sparse graph (E << V²) → Dijkstra × V is faster
- Only need single-source → Use Dijkstra’s Algorithm - The Greedy Invariant
- Graph is huge (V > 1000) → O(V³) becomes prohibitive
Related Concepts
- Floyd-Warshall - Why Intermediate Nodes Matter
- Shortest Path Relaxation Operation
- Dijkstra’s Algorithm - The Greedy Invariant
- Bellman-Ford Algorithm - Edge Relaxation Pattern
- Graph Algorithms Optimal Complexity
Reconstruction: The DP recurrence follows from optimal substructure - either k is on the path or it isn’t. The k-outer loop ordering ensures we build paths incrementally using increasing sets of intermediate vertices.