App Service Accounts

Each application has its own GCP service account (SA) with only the roles it actually uses. The SA key is fetched at container startup so the app authenticates as itself rather than as the VM it happens to be running on.

Why per-app SAs

In single-app VMs, an app could safely fall back to Application Default Credentials (ADC) via the GCE metadata server — the VM’s compute SA was effectively the app’s SA.

With the swarm consolidation, multiple apps share the same VM (e.g., hb-p-vm2 runs HBCRM, BEA-HBCRM, DHA, and Eligibility). They all hit the same metadata server, so they all inherit the same compute SA, which holds the union of every app’s required roles. DHA ends up with HBCRM’s bucket admin, Eligibility ends up with BigQuery access, and so on.

Per-app SAs break that inheritance. Each container authenticates with its own credential file and gets only the roles its app actually calls.

How they’re provisioned

One Terraform-managed SA per app, named <app>-app (e.g., hbcrm-app, pdp10-app, copo3-app), with scoped IAM grants for the GCS buckets, BigQuery datasets, Pub/Sub topics, and secrets the app uses. See:

Keys are not created by Terraform — that would put private keys in the state file. After terraform apply, run the helper script:

./zig/zig build scripts -- create_sa_key --infra hb-infra --env n

It generates a JSON key with gcloud iam service-accounts keys create, writes it to the matching <app>-app-sa-key secret in the vault-keys project, and deletes the local copy. See src/scripts/create_sa_key/main.py.
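The three steps above can be sketched roughly as follows. This is not the contents of src/scripts/create_sa_key/main.py, only a minimal illustration of the flow. The function name, the `sa_project` parameter, and the injectable `run` argument are hypothetical, and the sketch assumes the `<app>-app-sa-key` secret already exists in the vault-keys project.

```python
import os
import subprocess
import tempfile


def create_and_store_key(app, sa_project, vault_project="vault-keys",
                         run=subprocess.run):
    """Create a key for <app>-app, add it as a secret version, drop the local copy."""
    # Assumption: the SA lives in `sa_project` and follows the <app>-app naming.
    sa_email = f"{app}-app@{sa_project}.iam.gserviceaccount.com"
    secret_name = f"{app}-app-sa-key"

    # The temporary directory (and the key file inside it) is removed when the
    # `with` block exits, even if one of the gcloud calls fails.
    with tempfile.TemporaryDirectory() as tmp:
        key_path = os.path.join(tmp, "key.json")
        run(["gcloud", "iam", "service-accounts", "keys", "create", key_path,
             f"--iam-account={sa_email}"], check=True)
        run(["gcloud", "secrets", "versions", "add", secret_name,
             f"--data-file={key_path}", f"--project={vault_project}"], check=True)
    return secret_name
```

Keeping the key inside a temporary directory means the private key never outlives the call on the operator's machine, which matches the intent of not storing it in Terraform state.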

How apps consume them

Two env vars opt an app into per-app SA auth:

APP_SA_KEY_SECRET=hbcrm-app-sa-key
GOOGLE_APPLICATION_CREDENTIALS=/app/secrets/gcp_sa_key.json

At container startup, fetch_secrets.sh (see Secrets Management) does an extra step when APP_SA_KEY_SECRET is set: it fetches the SA key from the vault-keys project (using the VM SA’s secretAccessor grant) and writes it to /app/secrets/gcp_sa_key.json with mode 400. Google client libraries pick up GOOGLE_APPLICATION_CREDENTIALS automatically — no code changes inside the app.

If the fetch fails, fetch_secrets.sh exits non-zero. There is no silent fallback to the metadata server, so a misconfigured key surfaces as a startup failure rather than as the app quietly running with compute-SA privileges.
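The fail-closed behavior described above can be illustrated with a short sketch. The real step lives in fetch_secrets.sh; this Python version only mirrors the control flow described here, and the function names, the `dest` and `project` defaults, and the injectable `run` argument are assumptions made for illustration.

```python
import os
import subprocess
import sys


def fetch_app_sa_key(env, dest="/app/secrets/gcp_sa_key.json",
                     project="vault-keys", run=subprocess.run):
    """Return True if a key was written, False if the app has not opted in."""
    secret = env.get("APP_SA_KEY_SECRET")
    if not secret:
        return False  # not opted in, so there is nothing to fetch

    # check=True raises CalledProcessError when gcloud exits non-zero.
    result = run(["gcloud", "secrets", "versions", "access", "latest",
                  f"--secret={secret}", f"--project={project}"],
                 check=True, capture_output=True)

    # A key left over from a previous start is read-only, so replace it.
    if os.path.exists(dest):
        os.remove(dest)
    # Create the file with owner-read-only permissions (mode 400).
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o400)
    with os.fdopen(fd, "wb") as f:
        f.write(result.stdout)
    return True


def main(env, **kwargs):
    """Return a process exit code: non-zero when an opted-in fetch fails."""
    try:
        fetch_app_sa_key(env, **kwargs)
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"SA key fetch failed: {exc}", file=sys.stderr)
        return 1  # fail startup rather than continue without the key
    return 0
```

An entrypoint would call `sys.exit(main(os.environ))` before starting the app, so a failed fetch stops the container instead of letting it run with whatever credentials the metadata server offers.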

Security posture

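This is currently a parallel deployment: per-app SAs hold the roles each app needs, but the compute SAs still hold the union of every app’s roles too. That keeps the rollout safe: an app that has not yet been opted in (GOOGLE_APPLICATION_CREDENTIALS unset) still authenticates as the (broad) compute SA via the metadata server and keeps working. An app that has opted in but cannot fetch its key still fails at startup, as described above.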

The follow-up cleanup is to strip the compute SAs down to just the roles the VM itself needs (ops agent, artifact registry, cloudsql client), so that an app running without GOOGLE_APPLICATION_CREDENTIALS would lose access instead of silently inheriting it. That ticket lands once every app on a given VM has been validated end-to-end with its per-app SA.
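One possible piece of that validation is confirming which identity a container will actually load. The sketch below is an assumption about how such a check could look, not an existing tool in this repo: both function names are hypothetical, and it relies only on the standard `client_email` field of a service account key file and on the `<app>-app` naming convention.

```python
import json


def credential_identity(env):
    """Return the SA email in the configured key file, or None if none is set."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        # With the variable unset, ADC moves on to other sources,
        # which on a GCE VM ends at the metadata server.
        return None
    with open(path) as f:
        info = json.load(f)
    return info.get("client_email")


def uses_per_app_sa(env, app):
    """True only when the key file belongs to the <app>-app service account."""
    email = credential_identity(env)
    return email is not None and email.startswith(f"{app}-app@")
```

Running `uses_per_app_sa(os.environ, "hbcrm")` inside each container before the compute SA is stripped down would flag any app that is still relying on the metadata server.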
