
Infrastructure Drift

The infrastructure drift fuzzer detects when deployed cloud resources have diverged from the Terraform code that defines them. It runs terraform plan against every discovered module and reports as drift any module whose plan has a non-zero change count. CFO invokes it on a schedule across the production, shared, and non-production branches.

Running

Discover all modules:

./zig/zig build fuzz -- infra_drift --discover

Run a single module:

./zig/zig build fuzz -- infra_drift --infra=hb-infra --branch=production --module=vault

Dry run (print the plan command without executing):

./zig/zig build fuzz -- infra_drift --infra=hb-infra --branch=production --module=vault --dry-run

How It Works

CFO invokes the infra_drift binary in two stages. First, discovery mode enumerates all modules across all tracked branches and emits them as a JSON array. Second, CFO fans out, invoking infra_drift once per module in single-module mode and running the plans concurrently across workers. Each single-module invocation runs the full detection pipeline and exits 0 (no drift) or 1 (drift found or execution error).
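The fan-out stage can be pictured as a small worker pool. The sketch below is illustrative only and is not CFO's actual scheduler: the Module and Result types, the fanOut function, and the stubbed run callback are all hypothetical names. In a real run, the callback would invoke infra_drift in single-module mode and return its exit code.

```go
package main

import (
	"fmt"
	"sync"
)

// Module is a hypothetical record for one discovered module.
type Module struct {
	Project string
	Branch  string
	Name    string
}

// Result pairs a module with the exit code of its single-module run:
// 0 means no drift, 1 means drift found or execution error.
type Result struct {
	Module   Module
	ExitCode int
}

// fanOut runs `run` once per module using a fixed number of workers.
// Results are returned in the same order as the input modules.
func fanOut(modules []Module, workers int, run func(Module) int) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(modules))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes never collide.
			for i := range jobs {
				results[i] = Result{Module: modules[i], ExitCode: run(modules[i])}
			}
		}()
	}
	for i := range modules {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	modules := []Module{
		{Project: "hb-infra", Branch: "production", Name: "vault"},
		{Project: "hb-infra", Branch: "production", Name: "network"}, // hypothetical module name
	}
	// Stub runner: pretend only the vault module has drifted.
	stub := func(m Module) int {
		if m.Name == "vault" {
			return 1
		}
		return 0
	}
	for _, r := range fanOut(modules, 2, stub) {
		fmt.Printf("%s/%s/%s exit=%d\n", r.Module.Project, r.Module.Branch, r.Module.Name, r.ExitCode)
	}
}
```

Because exit code 1 covers both drift and execution errors, a caller that needs to tell them apart would have to inspect the module's output rather than the exit code alone.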

The development branch is excluded. Only production, shared, and non-production are checked.

Module Discovery

Discovery scans infra/ for directories with a -infra suffix, then walks each project directory looking for backend.tf files. Every directory that contains a backend.tf is treated as a Terraform module. The path to that file encodes the project, business unit, branch, and module name:

infra/<project>/<business_unit>/<branch>/<module>/backend.tf

Discovery extracts the project, branch, and module name from the path, filters to the target branch, and returns the full list.
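As a rough illustration of the path parsing described above, the following sketch splits a backend.tf path into project, branch, and module name. It assumes each module sits exactly one directory below the branch, which the path template suggests but the source does not state outright. The function name parseModulePath, the Module type, and the example-bu business unit are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Module holds the three fields discovery extracts from a path.
type Module struct {
	Project string
	Branch  string
	Name    string
}

// trackedBranches lists the branches that are checked; development is excluded.
var trackedBranches = map[string]bool{
	"production":     true,
	"shared":         true,
	"non-production": true,
}

// parseModulePath splits
// infra/<project>/<business_unit>/<branch>/<module>/backend.tf
// into its parts. ok is false when the path does not match that layout.
func parseModulePath(p string) (m Module, ok bool) {
	parts := strings.Split(filepath.ToSlash(p), "/")
	if len(parts) != 6 || parts[0] != "infra" || parts[5] != "backend.tf" {
		return Module{}, false
	}
	if !strings.HasSuffix(parts[1], "-infra") {
		return Module{}, false
	}
	// parts[2] is the business unit, which discovery does not keep.
	return Module{Project: parts[1], Branch: parts[3], Name: parts[4]}, true
}

func main() {
	paths := []string{
		"infra/hb-infra/example-bu/production/vault/backend.tf",
		"infra/hb-infra/example-bu/development/vault/backend.tf", // skipped: untracked branch
	}
	for _, p := range paths {
		m, ok := parseModulePath(p)
		if !ok || !trackedBranches[m.Branch] {
			continue
		}
		fmt.Printf("project=%s branch=%s module=%s\n", m.Project, m.Branch, m.Name)
	}
}
```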

Across all five infrastructure projects and three environment branches, discovery yields approximately 96 modules.

Infrastructure Projects

Project        Description
bees-infra     Bees infrastructure
common-infra   Shared/common infrastructure
ha-infra       HA (Azure-based) infrastructure
hb-infra       HB (GCP-based) infrastructure
pd-infra       PD infrastructure

The Detection Pipeline

Each module runs through a three-stage pipeline:

flowchart LR
    A["tfo plan\n<project> <branch> <module>"] --> B["tfparse"]
    B --> C["jq '.[0].change_count'"]
    C --> D{"> 0?"}
    D -->|yes| E["DRIFT"]
    D -->|no| F["OK"]

tfo plan runs terraform plan through TFO with service account impersonation. tfparse parses the plan output into structured JSON. jq extracts the change_count field. Any value greater than zero means the deployed infrastructure no longer matches the code.
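The drift decision itself is a single comparison on the parsed output. The sketch below reproduces what the jq step and the "> 0" check do together, assuming only what the jq filter implies: that tfparse emits a JSON array whose first element has a numeric change_count field. The planSummary type and hasDrift function are hypothetical names, and any other fields tfparse may emit are ignored.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// planSummary mirrors only the field the pipeline reads.
type planSummary struct {
	ChangeCount int `json:"change_count"`
}

// hasDrift reports whether the first plan summary has a change count above zero.
func hasDrift(tfparseJSON []byte) (bool, error) {
	var summaries []planSummary
	if err := json.Unmarshal(tfparseJSON, &summaries); err != nil {
		return false, err
	}
	if len(summaries) == 0 {
		return false, errors.New("tfparse output contained no plan summary")
	}
	return summaries[0].ChangeCount > 0, nil
}

func main() {
	drift, err := hasDrift([]byte(`[{"change_count": 2}]`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if drift {
		fmt.Println("DRIFT")
	} else {
		fmt.Println("OK")
	}
}
```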

The pipeline runs as a bash one-liner with pipefail set so any stage failure propagates as an error:

tfo plan <project> <branch> <module> | tfparse | jq '.[0].change_count'
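To show why pipefail matters here, the sketch below runs a script under bash with pipefail enabled and returns its exit code. Without pipefail, a pipeline's status is that of its last command, so a failed tfo plan could be masked by a successful jq. The runPipeline helper is a hypothetical name and is not taken from runner.go. The demo uses false and cat as stand-ins for the real tools.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runPipeline runs script under bash with pipefail set and returns its exit code.
// A non-nil error means bash itself could not be started or waited on.
func runPipeline(script string) (int, error) {
	cmd := exec.Command("bash", "-o", "pipefail", "-c", script)
	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The script ran but finished with a non-zero status.
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

func main() {
	// The first stage fails and the last succeeds; pipefail surfaces the failure.
	code, err := runPipeline("false | cat")
	if err != nil {
		fmt.Println("could not run bash:", err)
		return
	}
	fmt.Println("exit code:", code)
}
```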

Transient API failures (gateway timeouts, TLS errors, provider download failures) are retried up to three times with exponential backoff before the module is recorded as an error.

Source

  • src/fuzz_tests/infra_drift/main.go — entry point, CLI flags, discovery mode
  • src/fuzz_tests/infra_drift/runner.go — single-plan execution and retry logic
  • src/fuzz_tests/infra_drift/discovery.go — module discovery and path parsing