Infrastructure as Code: Beyond the Basics

cloud-fundamentals - This article is part of a series.

Part : This Article

Part : Getting Started with Cloud Infrastructure

Most teams that adopt infrastructure as code follow the same trajectory: they start by translating their existing setup into configuration files, feel good about the reproducibility, and then gradually accumulate a codebase that becomes harder to change than the manual processes it replaced.

This post is about avoiding that outcome.

The Problem with “It Works”

When infrastructure as code is done poorly, it exhibits specific failure modes:

Drift between what the code describes and what actually exists in the cloud, usually caused by manual changes made during incidents
Sprawl as the codebase grows without consistent structure, making it hard to find things or understand dependencies
Fear of change because nobody is confident what a modification will actually do
Long feedback loops because running a plan or apply takes ten minutes and touches too many resources at once

These aren’t tool problems. They’re process and structure problems, and they can develop in a codebase built on any IaC tool.

State is the Hard Part

The central challenge in infrastructure as code is state management. Your IaC tool maintains a record of what it believes exists in the world. When that record diverges from reality, things break in confusing ways.
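To make “the record diverges from reality” concrete, here is a minimal, tool-agnostic sketch of a drift check. It assumes both the recorded state and the provider’s view can be reduced to a mapping of resource IDs to attributes; the names `DriftReport` and `detect_drift` and the sample resources are hypothetical, not part of any real IaC tool.

```python
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    """Differences between recorded state and what the provider reports."""
    missing: list = field(default_factory=list)    # in state, gone from the cloud
    untracked: list = field(default_factory=list)  # in the cloud, absent from state
    changed: dict = field(default_factory=dict)    # id -> {attr: (recorded, actual)}

    def is_clean(self) -> bool:
        return not (self.missing or self.untracked or self.changed)


def detect_drift(recorded: dict, actual: dict) -> DriftReport:
    """Compare two {resource_id: {attribute: value}} mappings."""
    report = DriftReport()
    report.missing = sorted(recorded.keys() - actual.keys())
    report.untracked = sorted(actual.keys() - recorded.keys())
    for rid in sorted(recorded.keys() & actual.keys()):
        rec, act = recorded[rid], actual[rid]
        diffs = {
            attr: (rec.get(attr), act.get(attr))
            for attr in sorted(rec.keys() | act.keys())
            if rec.get(attr) != act.get(attr)
        }
        if diffs:
            report.changed[rid] = diffs
    return report


# Hypothetical data: an instance was resized by hand during an incident,
# a database was deleted, and a bucket was created outside the codebase.
recorded = {
    "vm-web": {"size": "small", "region": "eu-west"},
    "db-main": {"size": "large", "region": "eu-west"},
}
actual = {
    "vm-web": {"size": "medium", "region": "eu-west"},
    "bucket-tmp": {"region": "eu-west"},
}
report = detect_drift(recorded, actual)
print(report)
```

Real tools do this comparison during a refresh or plan; the point of the sketch is that drift comes in three kinds (missing, untracked, changed), and each usually calls for a different fix.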

A few practices that help:

Lock state during applies. If two engineers run applies simultaneously against the same state, the results are unpredictable. Remote state backends with locking prevent this.
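The locking idea can be sketched with a plain lock file. This is only a local-filesystem analogy under stated assumptions: real remote backends implement locking with their own mechanisms, and `state_lock` and `StateLockedError` are hypothetical names chosen for illustration.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another apply already holds the lock."""


@contextmanager
def state_lock(lock_path: str, owner: str):
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists,
    # so only one process can ever hold the lock at a time.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            with open(lock_path) as f:
                holder = f.read().strip() or "unknown"
        except OSError:
            holder = "unknown"  # lock vanished or is unreadable
        raise StateLockedError(f"state is locked by {holder}") from None
    try:
        with os.fdopen(fd, "w") as f:
            f.write(owner)  # record who holds the lock for error messages
        yield
    finally:
        os.remove(lock_path)  # always release, even if the apply fails


# Usage: a second apply is refused while the first holds the lock.
lock_file = os.path.join(tempfile.mkdtemp(), "state.lock")
blocked = None
with state_lock(lock_file, "alice"):
    try:
        with state_lock(lock_file, "bob"):
            pass
    except StateLockedError as err:
        blocked = str(err)
print(blocked)
```

Recording the holder’s identity in the lock is a deliberate choice: when an apply is refused, the engineer knows whom to ask rather than being tempted to force-unlock.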

Store state remotely, not locally. Local state files get lost, go stale, and can’t be shared. Remote backends with versioning give you a safety net.

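Import before you write. If you’re codifying existing infrastructure, import resources into state before writing the configuration. If the config is written and applied first, the tool has no record of the existing resources and will try to create them again, which either fails on a conflict or leaves you with duplicates.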

Treat state corruption seriously. If your state file is corrupted or severely drifted, stop and fix it before making any other changes. Applying on top of bad state compounds the problem.

Module Design

Modules are the primary unit of reuse in most IaC tools. They’re also where most complexity lives, for better and worse.

A well-designed module:

Has a single, clear purpose
Exposes inputs for everything that legitimately varies between uses
Hides implementation details that callers shouldn’t need to know about
Is versioned so that callers can upgrade deliberately

A poorly designed module:

Does too many things and accumulates unrelated resources over time
Exposes every input, making callers deal with details they don’t care about
Has implicit dependencies that aren’t expressed in the interface
Is modified in place, breaking callers unexpectedly

The right granularity is usually “one module per logical component” — a database, a service, a network — rather than “one module per resource type” or “one module for everything.”
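As a rough illustration of those properties, the sketch below models a “database” module as a plain Python function. Everything in it is hypothetical (the input names, the resource dictionaries, the version string, the endpoint format); it only shows the shape of a narrow interface that hides its implementation.

```python
from dataclasses import dataclass

MODULE_VERSION = "1.2.0"  # callers pin a version and upgrade deliberately


@dataclass(frozen=True)
class DatabaseInputs:
    """Only what legitimately varies between callers."""
    name: str
    environment: str  # "staging" or "production"
    storage_gb: int = 20

    def __post_init__(self):
        if self.environment not in {"staging", "production"}:
            raise ValueError(f"unknown environment: {self.environment!r}")
        if self.storage_gb <= 0:
            raise ValueError("storage_gb must be positive")


@dataclass(frozen=True)
class DatabaseOutputs:
    """What callers are allowed to depend on."""
    endpoint: str
    port: int


def database_module(inputs: DatabaseInputs) -> tuple:
    """Expand a small interface into resources the caller never sees directly."""
    prod = inputs.environment == "production"
    resources = [
        {
            "type": "db_instance",
            "name": inputs.name,
            "storage_gb": inputs.storage_gb,
            # Implementation details derived from inputs, not exposed as inputs.
            "multi_zone": prod,
            "backup_retention_days": 30 if prod else 7,
        },
        {"type": "db_firewall", "name": f"{inputs.name}-fw", "allow_port": 5432},
    ]
    outputs = DatabaseOutputs(
        endpoint=f"{inputs.name}.{inputs.environment}.db.internal", port=5432
    )
    return resources, outputs


resources, outputs = database_module(DatabaseInputs("orders", "production"))
print(outputs)
```

The caller chooses three values and gets two back. Backup retention and firewall rules can change inside the module without touching any call site, which is what makes a versioned upgrade safe.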

Testing Infrastructure Code

Infrastructure code is harder to test than application code because side effects are the point. You can’t meaningfully test a VPC configuration without creating a VPC.

That said, several layers of testing are worth implementing:

Static analysis catches formatting issues, deprecated syntax, and common mistakes without touching the cloud. This should run on every commit and be fast.

Plan validation runs a plan against a long-lived or ephemeral environment and asserts on the output. You can check that expected resources will be created, that certain tags are present, and that no unexpected destructive changes will occur.
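A plan check of this kind might look like the sketch below. It assumes the plan is available as JSON shaped loosely like the `resource_changes` list that Terraform emits from `terraform show -json`; other tools will need a different parser. The required tags, resource addresses, and the `check_plan` helper are hypothetical, and the tag check assumes every resource in the plan supports tags.

```python
import json

REQUIRED_TAGS = {"owner", "environment"}  # hypothetical policy


def check_plan(plan: dict, allow_destroy: frozenset = frozenset()) -> list:
    """Return human-readable violations; an empty list means the plan passes."""
    violations = []
    for rc in plan.get("resource_changes", []):
        address = rc["address"]
        change = rc["change"]
        actions = change["actions"]
        # A replacement shows up as delete + create, so it is caught here too.
        if "delete" in actions and address not in allow_destroy:
            violations.append(f"{address}: unexpected destructive change {actions}")
        if "create" in actions or "update" in actions:
            tags = (change.get("after") or {}).get("tags") or {}
            missing = REQUIRED_TAGS - tags.keys()
            if missing:
                violations.append(f"{address}: missing tags {sorted(missing)}")
    return violations


# Hypothetical plan output with one clean create, one replacement, one no-op.
PLAN_JSON = """
{
  "resource_changes": [
    {"address": "network.main",
     "change": {"actions": ["create"],
                "after": {"tags": {"owner": "platform", "environment": "prod"}}}},
    {"address": "database.orders",
     "change": {"actions": ["delete", "create"],
                "after": {"tags": {"owner": "payments"}}}},
    {"address": "bucket.logs",
     "change": {"actions": ["no-op"], "after": {"tags": {}}}}
  ]
}
"""
violations = check_plan(json.loads(PLAN_JSON))
for v in violations:
    print(v)
```

The `allow_destroy` parameter keeps the default strict while giving a pull request an explicit, reviewable way to say that a particular deletion is intended.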

Integration testing actually applies infrastructure and verifies it behaves correctly. This is slow and expensive, so it’s usually reserved for CI on the main branch rather than every pull request.

Compliance scanning checks that resources will be configured according to your security and governance policies before they’re created.

The Pipeline Question

Infrastructure changes should go through a pipeline, not be applied from an engineer’s laptop. This isn’t just about consistency — it’s about auditability. When something goes wrong, you want to know exactly what was applied, by whom, and when.

A reasonable pipeline:

Validate and lint on every commit
Plan on every pull request, with the plan output posted as a comment
Require approval before applying to production
Apply in CI, not locally
Notify on success or failure
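The steps above can be sketched as a small function that decides which stages run for a given event. The event names, stage names, and `stages_for` are hypothetical; a real pipeline would express this in its CI system’s own configuration format.

```python
from enum import Enum


class Event(Enum):
    COMMIT = "commit"
    PULL_REQUEST = "pull_request"
    MERGE_TO_MAIN = "merge_to_main"


def stages_for(event: Event, target: str, approved: bool) -> list:
    """Return the ordered stages the pipeline should run."""
    stages = ["validate", "lint"]  # cheap checks run on everything
    if event in (Event.PULL_REQUEST, Event.MERGE_TO_MAIN):
        stages.append("plan")
    if event is Event.PULL_REQUEST:
        stages.append("comment_plan")  # reviewers see the actual changes
    if event is Event.MERGE_TO_MAIN:
        if target == "production" and not approved:
            stages.append("await_approval")  # stop here until someone signs off
        else:
            stages += ["apply", "notify"]
    return stages


print(stages_for(Event.PULL_REQUEST, "production", approved=False))
print(stages_for(Event.MERGE_TO_MAIN, "production", approved=True))
```

Note that there is no branch in which `apply` runs without a preceding `plan`, and no way to reach `apply` for production without approval.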

The plan-then-apply workflow is particularly important. Engineers reviewing a pull request should see exactly what infrastructure changes will result from merging. Reviewing a diff of configuration files is not the same as reviewing a diff of what will actually change.

Organizational Considerations

As infrastructure codebases grow, ownership becomes a challenge. A few patterns that help:

Separate repositories by blast radius. Core networking and shared services go in one repository, individual application infrastructure in another. Changes to the network shouldn’t require touching the same codebase as changes to a specific service’s database.

Establish conventions early. Naming conventions, tagging requirements, and module structure are much harder to retrofit than to establish from the start. Write them down and enforce them in CI.
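Enforcing conventions in CI can start as a very small check. The sketch below assumes a hypothetical naming scheme of `<env>-<component>` and two required tags; the pattern, tag names, and `lint_resource` are illustrative, not a recommendation for any particular scheme.

```python
import re

# Hypothetical convention: environment prefix, then lowercase dash-separated words.
NAME_PATTERN = re.compile(r"^(dev|staging|prod)-[a-z0-9]+(-[a-z0-9]+)*$")
REQUIRED_TAGS = ("owner", "cost-center")


def lint_resource(resource: dict) -> list:
    """Return a list of convention violations for one resource definition."""
    problems = []
    name = resource.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} does not match <env>-<component>")
    tags = resource.get("tags", {})
    for tag in REQUIRED_TAGS:
        if not tags.get(tag):  # absent or empty both count as missing
            problems.append(f"missing tag {tag!r}")
    return problems


good = {"name": "prod-orders-db", "tags": {"owner": "payments", "cost-center": "cc-42"}}
bad = {"name": "OrdersDB", "tags": {"owner": "payments"}}
print(lint_resource(good))
print(lint_resource(bad))
```

In CI, a wrapper would run this over every resource and exit non-zero if any list is non-empty, so that the written convention and the enforced convention cannot drift apart.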

Make the safe path the easy path. If following best practices requires significantly more work than ignoring them, they won’t be followed. Invest in tooling and templates that make the right thing the easy thing.

Infrastructure as code is one of those practices that’s worth doing well. Done poorly, it’s overhead. Done well, it becomes the foundation that everything else is built on.
