Core Principle
Infrastructure as Code (IaC) is the practice of expressing infrastructure (cloud resources, machines, services, app configuration) as version-controlled, machine-readable files that a tool reconciles against the running system. The win is not “scripts in git.” It’s the plan/diff/apply loop: a human-readable preview of what will change, produced by a tool that knows the current state and applied only after review.
The category is not one tool. It’s a stack of layers, each with different semantics, and most operational pain comes from using the wrong tool for a layer.
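The plan/diff/apply loop can be sketched in a few lines of Python. `plan` and `apply` here are hypothetical stand-ins for what Terraform-style tools do against real state files and real cloud APIs, not any tool's actual mechanism:

```python
# Sketch of the plan/diff/apply loop. Desired state comes from version-controlled
# files; current state from the running system. All names are illustrative.

def plan(current: dict, desired: dict) -> dict:
    """Compute a human-readable diff: what would be created, changed, destroyed."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "change": sorted(k for k in desired if k in current and current[k] != desired[k]),
        "destroy": sorted(k for k in current if k not in desired),
    }

def apply(current: dict, desired: dict, approved_plan: dict) -> dict:
    """Apply only what the reviewed plan says -- reconciliation, not a script."""
    new_state = dict(current)
    for k in approved_plan["destroy"]:
        del new_state[k]
    for k in approved_plan["create"] + approved_plan["change"]:
        new_state[k] = desired[k]
    return new_state

current = {"vpc": {"cidr": "10.0.0.0/16"}, "old_vm": {"size": "small"}}
desired = {"vpc": {"cidr": "10.0.0.0/8"}, "db": {"engine": "postgres"}}

p = plan(current, desired)
assert p == {"create": ["db"], "change": ["vpc"], "destroy": ["old_vm"]}
assert apply(current, desired, p) == desired  # converges on the declared state
```

The point of the split is the human checkpoint between `plan` and `apply`: the preview is reviewable in the same way a code diff is.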
Why This Matters
The natural instinct is “pick one tool for everything.” Every layer has constraints that break that instinct:
| Layer | What it manages | Right tools |
|---|---|---|
| Cloud provisioning | VMs, networks, IAM, managed services | Terraform, OpenTofu, Pulumi, CDK |
| Machine config | OS, packages, services, files | Ansible, NixOS, Chef, Puppet, Packer + golden images |
| App API objects | Dashboards, realms, roles, policies | Terraform providers, Ansible modules, app’s own config-as-code |
| DB schema | Tables, columns, indexes, constraints | Atlas (Schema as Code), Flyway, Liquibase, ORM migrations |
| Reference data | Lookup rows, seed admins | App migration tool, idempotent bootstrap job |
| Environment fixtures | Demo users, test tenants | One-shot job (CLI, k8s Job), guarded by env check |
| Cluster state (k8s) | Workloads, CRDs, secrets | GitOps reconcilers (Argo CD, Flux), Helm, Kustomize |
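The “environment fixtures” row deserves a concrete shape: a one-shot job that is idempotent and hard-refuses production. A minimal Python sketch, with `DEMO_USERS` and all names purely illustrative:

```python
# Sketch of an env-guarded, idempotent fixture job (hypothetical data and names).
DEMO_USERS = ["demo-alice", "demo-bob"]

def seed_fixtures(env: str, existing_users: set) -> list:
    """Create demo users once; never run against prod."""
    if env == "prod":
        raise RuntimeError("fixture job must never run against prod")
    # Idempotent: skip anything that already exists, so reruns are safe.
    return [u for u in DEMO_USERS if u not in existing_users]

# Typically wired up as a k8s Job or CLI reading the environment, e.g.:
#   created = seed_fixtures(os.environ["DEPLOY_ENV"], fetch_existing_users())
assert seed_fixtures("staging", {"demo-alice"}) == ["demo-bob"]
```

The env guard is the whole point: fixtures are the one layer where “refuse to run” is a feature, not a failure.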
The errors people make are predictable:
- Managing schema with Terraform (destroy-and-recreate is catastrophic for data)
- Managing reference data with `null_resource` + `local-exec` (no ordering, state confusion)
- Co-owning the same object with two tools (drift loops forever)
- Baking mutable app state into a NixOS closure (closures are for the machine, not for app data)
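The fix for the first error is the schema layer's native idiom: forward-only migrations, each applied exactly once and recorded in a history table, so data survives every change. A Flyway-style sketch using sqlite (illustrative only, not any tool's real mechanism):

```python
import sqlite3

# Ordered, append-only migrations -- the opposite of destroy-and-recreate.
MIGRATIONS = [
    ("V1", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("V2", "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn: sqlite3.Connection) -> None:
    """Apply any migration not yet recorded in the history table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    for version, ddl in MIGRATIONS:
        if version not in applied:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_history VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
conn.execute("INSERT INTO users (name, email) VALUES ('ada', 'ada@example.com')")
migrate(conn)  # rerunning is a no-op; existing rows are untouched
assert conn.execute("SELECT name, email FROM users").fetchall() == [("ada", "ada@example.com")]
```

A Terraform-style tool, by contrast, would model the `email` change as replace-the-resource, which for a table means dropping it.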
Evidence/Examples
- Clean stack: Terraform provisions an RDS instance + networking + IAM. NixOS or Ansible configures the app servers. The app’s migration tool (Atlas, Flyway, Alembic) runs schema changes on deploy. A k8s Job seeds env-specific fixtures once.
- GitOps variant: Same layering, but Argo CD or Flux reconciles k8s manifests continuously, with Crossplane exposing cloud APIs as CRDs and operators (Strimzi, Zalando Postgres Operator) for app-level objects.
- Where users still edit through a UI (Grafana dashboards, feature flags), config-as-code only works if the git repo is the source of truth and the UI is read-only. Splitting the difference creates permanent drift.
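“The git repo is the source of truth and the UI is read-only” has a simple operational meaning: the reconciler reports drift and then forces live state back to what git declares. A sketch with hypothetical dashboard data:

```python
# Sketch, assuming git-exported dashboard definitions are the only writer.

def detect_drift(git_state: dict, live_state: dict) -> list:
    """Dashboards that differ from git, or exist only outside it."""
    keys = set(git_state) | set(live_state)
    return sorted(k for k in keys if git_state.get(k) != live_state.get(k))

def reconcile(git_state: dict, live_state: dict) -> dict:
    """Git wins: UI edits and ad-hoc dashboards do not survive the next sync."""
    return dict(git_state)

git = {"latency": {"panels": 4}}
live = {"latency": {"panels": 7}, "adhoc": {"panels": 1}}  # UI edits = drift

assert detect_drift(git, live) == ["adhoc", "latency"]
assert reconcile(git, live) == git
```

Splitting the difference means `reconcile` sometimes preserves live edits, which is exactly how two sources of truth come into being.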
Implications
- The first design decision is layering, not tool choice. Once layers are clear, the tool per layer is mostly settled by the team’s existing shape (cloud vendor, k8s or not, language ecosystem).
- Declarative beats procedural at the steady-state layers (provisioning, config, schema). Procedural still wins for one-shot bootstraps and rolling operations across a fleet.
- “Drift detection” only works if there is exactly one writer. Two tools writing the same object is the most common reason IaC adoptions fail.
- The shift toward immutable infrastructure (build new images, replace hosts) collapses the “machine config” layer into the “provisioning” layer: there’s no machine to configure, only an image to roll out. NixOS, OCI images + Packer, and Talos Linux are different bets on this same direction.
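The two-writer failure mode is easy to simulate. The tool names below are just labels; any pair of reconcilers that disagree on a field behaves the same way:

```python
# Sketch of the drift loop: each tool "fixes" the other's change,
# so the object never converges. Names and data are hypothetical.

def reconcile(live: dict, desired: dict) -> dict:
    live.update(desired)
    return live

terraform_desired = {"tags": {"owner": "platform"}}
ansible_desired = {"tags": {"owner": "app-team"}}

live = {}
history = []
for _ in range(3):  # alternating reconcile runs
    reconcile(live, terraform_desired)
    history.append(live["tags"]["owner"])
    reconcile(live, ansible_desired)
    history.append(live["tags"]["owner"])

assert history == ["platform", "app-team"] * 3  # flips forever: a drift loop
```

With one writer, drift detection means “someone edited out of band.” With two, every detection is just the other tool's last run.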
Related Ideas
- Terraform — provisioning layer, declarative with state
- Ansible — machine config layer, procedural and stateless
- Atlas (Schema as Code) — schema layer, declarative with safety linting
- AI-Native Infrastructure: The Nix-LLM Virtuous Cycle — declarative environments and reproducibility
- Nix - Home Manager — same idea applied to personal machines
- DORA Capabilities and Metrics — IaC adoption is one of the technical capabilities DORA correlates with delivery performance
- Trunk-Based Development — IaC reaches its full value when changes flow through the same trunk-based pipeline as app code
Questions
- Is the layering above stable, or does GitOps + Crossplane + operators eventually collapse “provisioning” and “API objects” into one reconciliation loop?
- How should teams handle the political problem of “this dashboard is owned by git” when product managers expect to edit it in the UI?
- What is the right ownership split between platform team (provisioning, schema linting) and app teams (migrations, fixtures, dashboards)?