Core Principle

Infrastructure as Code (IaC) is the practice of expressing infrastructure (cloud resources, machines, services, app configuration) as version-controlled, machine-readable files that a tool reconciles against the running system. The win is not “scripts in git.” It’s the plan/diff/apply loop: a human-readable preview of what will change, produced by a tool that knows the current state.
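The plan/diff/apply loop can be sketched in a few lines. This is a toy model of the semantics, not any real tool’s API; the resource names and attributes are made up:

```python
# Minimal sketch of the plan/diff/apply loop: desired state (files in git) is
# diffed against current state (what the tool knows is running), the plan is
# shown for review, and only then applied.

def plan(desired: dict, current: dict) -> list[tuple[str, str]]:
    """Compute a human-readable preview as (action, resource) pairs."""
    actions = []
    for name in desired:
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("destroy", name))
    return actions

def apply(plan_actions, desired: dict, current: dict) -> dict:
    """Reconcile current state toward desired state."""
    new_state = dict(current)
    for action, name in plan_actions:
        if action == "destroy":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state

desired = {"vpc": {"cidr": "10.0.0.0/16"}, "db": {"size": "small"}}
current = {"db": {"size": "medium"}, "old_vm": {"type": "t2"}}
actions = plan(desired, current)
# Preview: create vpc, update db, destroy old_vm — reviewed before apply.
assert apply(actions, desired, current) == desired
```

The key property is that `plan` is side-effect-free: a reviewer sees exactly what `apply` will do before anything runs.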

The category is not one tool. It’s a stack of layers, each with different semantics, and most operational pain comes from using the wrong tool for a layer.

Why This Matters

The natural instinct is “pick one tool for everything.” Every layer has constraints that break that instinct:

| Layer | What it manages | Right tools |
| --- | --- | --- |
| Cloud provisioning | VMs, networks, IAM, managed services | Terraform, OpenTofu, Pulumi, CDK |
| Machine config | OS, packages, services, files | Ansible, NixOS, Chef, Puppet, Packer + golden images |
| App API objects | Dashboards, realms, roles, policies | Terraform providers, Ansible modules, app’s own config-as-code |
| DB schema | Tables, columns, indexes, constraints | Atlas (Schema as Code), Flyway, Liquibase, ORM migrations |
| Reference data | Lookup rows, seed admins | App migration tool, idempotent bootstrap job |
| Environment fixtures | Demo users, test tenants | One-shot job (CLI, k8s Job), guarded by env check |
| Cluster state (k8s) | Workloads, CRDs, secrets | GitOps reconcilers (Argo CD, Flux), Helm, Kustomize |
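What makes a reference-data bootstrap job safe is idempotence: re-running it converges to the same rows instead of erroring or duplicating. A minimal sketch using Python’s stdlib `sqlite3`; the table and seed values are illustrative, and the upsert syntax requires SQLite ≥ 3.24:

```python
import sqlite3

# Idempotent reference-data bootstrap: safe to run on every deploy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (name TEXT PRIMARY KEY, description TEXT)")

SEED_ROLES = [("admin", "full access"), ("viewer", "read-only")]

def seed(conn):
    # Upsert: insert if missing, refresh the description if already present.
    conn.executemany(
        "INSERT INTO roles (name, description) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET description = excluded.description",
        SEED_ROLES,
    )
    conn.commit()

seed(conn)
seed(conn)  # second run is a no-op, not an error or a duplicate row
rows = conn.execute("SELECT name, description FROM roles ORDER BY name").fetchall()
```

The same property is what `null_resource` + `local-exec` lacks: the script runs, but no tool knows whether the rows it created still match what the script would create today.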

The errors people make are predictable:

  • Managing schema with Terraform (destroy-and-recreate is catastrophic for data)
  • Managing reference data with null_resource + local-exec (no ordering, state confusion)
  • Co-owning the same object with two tools (drift loops forever)
  • Baking mutable app state into a NixOS closure (closures are for the machine, not for app data)

Evidence/Examples

  • Clean stack: Terraform provisions an RDS instance + networking + IAM. NixOS or Ansible configures the app servers. The app’s migration tool (Atlas, Flyway, Alembic) runs schema changes on deploy. A k8s Job seeds env-specific fixtures once.
  • GitOps variant: Same layering, but Argo CD or Flux reconciles k8s manifests continuously, with Crossplane exposing cloud APIs as CRDs and operators (Strimzi, Zalando Postgres Operator) for app-level objects.
  • Where users still edit through a UI (Grafana dashboards, feature flags), config-as-code only works if the git repo is the source of truth and the UI is read-only. Splitting the difference creates permanent drift.
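The env-guarded fixture job from the clean stack can be sketched as follows. The variable name `DEPLOY_ENV` and the fixture contents are assumptions for illustration, not a convention from any particular tool:

```python
import os

# One-shot fixture seeder guarded by an environment check: demo users belong
# only in non-production environments.

DEMO_USERS = ["demo@example.com", "qa-tenant@example.com"]

def seed_fixtures(env: str, create_user) -> int:
    if env == "production":
        # Refuse loudly rather than skipping silently: a fixture job reaching
        # prod config is a deployment wiring error worth surfacing.
        raise RuntimeError("fixture seeding is not allowed in production")
    created = 0
    for email in DEMO_USERS:
        create_user(email)
        created += 1
    return created

created_users = []
n = seed_fixtures(os.environ.get("DEPLOY_ENV", "dev"), created_users.append)
```

Running this as a k8s Job (rather than at app startup) keeps the one-shot, procedural step out of the declarative layers around it.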

Implications

  • The first design decision is layering, not tool choice. Once layers are clear, the tool per layer is mostly settled by the team’s existing shape (cloud vendor, k8s or not, language ecosystem).
  • Declarative beats procedural at the steady-state layers (provisioning, config, schema). Procedural still wins for one-shot bootstraps and rolling operations across a fleet.
  • “Drift detection” only works if there is exactly one writer. Two tools writing the same object is the most common reason IaC adoptions fail.
  • The shift toward immutable infrastructure (build new images, replace hosts) collapses the “machine config” layer into the “provisioning” layer: there’s no machine to configure, only an image to roll out. NixOS, OCI images + Packer, and Talos Linux are different bets on this same direction.
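The single-writer rule is easy to see in a toy simulation: two reconcilers with different desired values for the same field each “correct” the other’s write, so the object oscillates forever. The field and values are illustrative:

```python
# Toy simulation of the two-writer drift loop: each tool reconciles the same
# field toward its own desired value, undoing the other's write every cycle.

live = {"replicas": 3}
writer_a = {"replicas": 3}   # e.g. a value managed by one tool
writer_b = {"replicas": 5}   # e.g. the same field patched by a second tool

history = []
for cycle in range(6):
    # Each reconciler in turn sees drift (live != its desired state) and "fixes" it.
    desired = writer_a if cycle % 2 == 0 else writer_b
    if live != desired:
        live = dict(desired)
    history.append(live["replicas"])

# history oscillates between the two desired values; neither state is stable.
```

With one writer the loop converges in a single cycle and every subsequent diff is genuine drift; with two, every diff is just the other tool.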

Questions

  • Is the layering above stable, or does GitOps + Crossplane + operators eventually collapse “provisioning” and “API objects” into one reconciliation loop?
  • How should teams handle the political problem of “this dashboard is owned by git” when product managers expect to edit it in the UI?
  • What is the right ownership split between platform team (provisioning, schema linting) and app teams (migrations, fixtures, dashboards)?