Express infrastructure (cloud resources, machines, services, app configuration) as version-controlled, machine-readable files that a tool reconciles against the running system. The win is not “scripts in git.” It is the plan/diff/apply loop: a human-readable preview of what changes, run by a tool that knows the current state.
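
The plan/diff/apply loop can be sketched in a few lines. This is a toy model, not how any particular tool is implemented: `State`, `Change`, `plan`, and `apply` are hypothetical names, and resources are reduced to plain dictionaries. It shows that the preview is computed from desired and current state without side effects, and that applying a reviewed plan converges to an empty plan.

```python
from dataclasses import dataclass
from typing import Dict, List

Resource = Dict[str, object]   # attribute name -> value
State = Dict[str, Resource]    # resource name -> attributes


@dataclass(frozen=True)
class Change:
    action: str   # "create", "update", or "delete"
    name: str
    detail: str   # human-readable preview text


def plan(desired: State, current: State) -> List[Change]:
    """Diff desired against current without touching anything."""
    changes: List[Change] = []
    for name in sorted(desired.keys() | current.keys()):
        want, have = desired.get(name), current.get(name)
        if have is None:
            changes.append(Change("create", name, f"{want}"))
        elif want is None:
            changes.append(Change("delete", name, f"{have}"))
        elif want != have:
            # Show only the attributes that differ, as (old, new) pairs.
            diff = {k: (have.get(k), want.get(k))
                    for k in sorted(want.keys() | have.keys())
                    if have.get(k) != want.get(k)}
            changes.append(Change("update", name, f"{diff}"))
    return changes


def apply(changes: List[Change], desired: State, current: State) -> State:
    """Return the state that results from executing a reviewed plan."""
    new_state = {name: dict(attrs) for name, attrs in current.items()}
    for change in changes:
        if change.action == "delete":
            new_state.pop(change.name)
        else:
            new_state[change.name] = dict(desired[change.name])
    return new_state


if __name__ == "__main__":
    desired = {"web": {"size": "m5.large", "count": 3},
               "db": {"engine": "postgres"}}
    current = {"web": {"size": "m5.large", "count": 2},
               "cache": {"engine": "redis"}}
    steps = plan(desired, current)
    for step in steps:
        print(f"{step.action:7} {step.name}: {step.detail}")
    converged = apply(steps, desired, current)
    print(plan(desired, converged))   # empty plan once converged
```

The part that matters is the split: `plan` is safe to run at any time and is what a reviewer reads, while `apply` only executes what the plan listed.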
The category is not one tool but a stack of layers, each with different semantics. Most operational pain comes from using the wrong tool for a layer.
| Layer | What it manages | Right tools |
|---|---|---|
| Cloud provisioning | VMs, networks, IAM, managed services | Terraform, OpenTofu, Pulumi, CDK |
| Machine config | OS, packages, services, files | Ansible, NixOS, Chef, Puppet, Packer + golden images |
| App API objects | Dashboards, realms, roles, policies | Terraform providers, Ansible modules, app’s own config-as-code |
| DB schema | Tables, columns, indexes, constraints | Atlas (Schema as Code), Flyway, Liquibase, ORM migrations |
| Reference data | Lookup rows, seed admins | App migration tool, idempotent bootstrap job |
| Environment fixtures | Demo users, test tenants | One-shot job (CLI, k8s Job), guarded by env check |
| Cluster state (k8s) | Workloads, CRDs, secrets | GitOps reconcilers (Argo CD, Flux), Helm, Kustomize |
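The errors people make are predictable: managing schema with Terraform (any change that forces replacement destroys and recreates the object, which is catastrophic for data), managing reference data with null_resource + local-exec (provisioners run only when the resource is created or replaced, and the state file records nothing about the rows they wrote), co-owning the same object with two tools (each run reverts the other’s change, so drift never settles), and baking mutable app state into a NixOS closure (closures are for the machine, not for app data).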
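
The co-ownership failure is easy to reproduce in miniature. The sketch below is illustrative only: `reconcile` and `count_drift` are hypothetical helpers, and the two "tools" are just dictionaries holding different desired values for the same attribute. With one writer the object settles; with two, every run finds drift.

```python
from typing import Dict, List

Obj = Dict[str, str]


def reconcile(live: Obj, desired: Obj) -> bool:
    """Overwrite live with desired. Returns True if drift was found."""
    if live == desired:
        return False
    live.clear()
    live.update(desired)
    return True


def count_drift(live: Obj, writers: List[Obj], rounds: int) -> int:
    """Run each writer in turn and count the runs that found drift."""
    drift_events = 0
    for _ in range(rounds):
        for desired in writers:
            if reconcile(live, desired):
                drift_events += 1
    return drift_events


if __name__ == "__main__":
    # Two tools that both believe they own the same attribute.
    tool_a = {"instance_class": "db.t3.medium"}
    tool_b = {"instance_class": "db.t3.large"}

    print(count_drift(dict(tool_a), [tool_a], rounds=5))          # 0
    print(count_drift(dict(tool_a), [tool_a, tool_b], rounds=5))  # 9
```

Neither tool is wrong by its own definition, which is why the loop never resolves until one of them gives up ownership.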
A clean stack: Terraform provisions an RDS instance plus networking and IAM, NixOS or Ansible configures the app servers, the app’s migration tool (Atlas, Flyway, Alembic) runs schema changes on deploy, and a k8s Job seeds env-specific fixtures once. The GitOps variant keeps the same layering but reconciles k8s manifests continuously through Argo CD or Flux, with Crossplane exposing cloud APIs as CRDs and operators like Strimzi for app-level objects. Where users still edit through a UI (Grafana dashboards, feature flags), config-as-code only works if the git repo is the source of truth and the UI is read-only; splitting the difference creates permanent drift.
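
The "seeds env-specific fixtures once" step in the stack above might look like the following. It is a minimal sketch under stated assumptions: SQLite stands in for the real database, the `users` table, the `ALLOWED_ENVS` names, and `seed_fixtures` are all hypothetical, and the table is assumed to have been created already by the migration tool. The job refuses to run outside allowed environments and is safe to re-run.

```python
import sqlite3

ALLOWED_ENVS = {"dev", "demo", "staging"}   # hypothetical environment names

FIXTURES = [("demo-admin", "admin"), ("demo-viewer", "viewer")]


def seed_fixtures(conn: sqlite3.Connection, env: str) -> int:
    """Insert demo users if env allows it. Returns the number of rows added."""
    if env not in ALLOWED_ENVS:
        raise RuntimeError(f"refusing to seed fixtures in env {env!r}")

    def row_count() -> int:
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    before = row_count()
    with conn:  # commits on success, rolls back on error
        # The primary key makes re-runs a no-op instead of a duplicate insert.
        conn.executemany(
            "INSERT OR IGNORE INTO users (name, role) VALUES (?, ?)", FIXTURES
        )
    return row_count() - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in for the schema layer: in the layered stack, the migration
    # tool owns this statement, not the seed job.
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, role TEXT NOT NULL)")
    print(seed_fixtures(conn, "demo"))   # first run inserts the fixtures
    print(seed_fixtures(conn, "demo"))   # second run adds nothing
```

In a real job the environment name would come from configuration rather than a literal. What matters is that the check happens before any write.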
The first design decision is layering, not tool choice. Once layers are clear, the tool per layer is mostly settled by the team’s existing shape (cloud vendor, k8s or not, language ecosystem).
Declarative beats procedural at the steady-state layers (provisioning, config, schema); procedural still wins for one-shot bootstraps and rolling operations across a fleet. “Drift detection” only works if there is exactly one writer; two tools writing the same object is the most common reason IaC adoptions fail.
The shift toward immutable infrastructure (build new images, replace hosts) collapses the “machine config” layer into the “provisioning” layer: there is no machine to configure, only an image to roll out. NixOS, OCI images plus Packer, and Talos Linux are different bets on this same direction.
See DORA Capabilities and Metrics for IaC adoption as a delivery-performance correlate, and Trunk-Based Development for the workflow that maximizes its value.
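
A rolling replacement is one of the cases where a procedural script is the natural fit, and it is also what "replace hosts" means under immutable infrastructure. The sketch below assumes hypothetical `replace` and `healthy` callbacks supplied by the caller (for example, wrappers around a cloud API and a health endpoint); it does not model any specific orchestrator.

```python
from typing import Callable, List


def rolling_replace(
    hosts: List[str],
    new_image: str,
    batch_size: int,
    replace: Callable[[str, str], str],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Replace hosts batch by batch; halt at the first unhealthy batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(hosts)
    for start in range(0, len(fleet), batch_size):
        end = min(start + batch_size, len(fleet))
        # Swap every host in this batch for one built from the new image.
        for i in range(start, end):
            fleet[i] = replace(fleet[i], new_image)
        bad = [h for h in fleet[start:end] if not healthy(h)]
        if bad:
            # Stop before touching the remaining batches.
            raise RuntimeError(f"halting rollout, unhealthy hosts: {bad}")
    return fleet


if __name__ == "__main__":
    # Toy callbacks: a "replacement" is just a renamed host, always healthy.
    result = rolling_replace(
        ["web-1", "web-2", "web-3"],
        new_image="v2",
        batch_size=2,
        replace=lambda old, image: f"{old}@{image}",
        healthy=lambda host: True,
    )
    print(result)
```

The ordering and the early stop are the point: they are steps in time, which a declarative end-state description does not express on its own.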