Terraform IaC control plane
Phase 7 of the multi-tenant rollout introduces the sixth sibling
spec-runtime, TerraformRuntime, which joins AgentRuntime,
BotRuntime, RLRuntime, AnalysisRuntime, and WorkflowRuntime.
The runtime is the only sanctioned executor for terraform plan/apply/destroy/refresh operations.
Routes, Celery tasks, and MCP tools wrap it;
nothing calls subprocess.run(["terraform", ...]) directly outside
alphaswarm/terraform/runner.py::TerraformExecutor.
Architecture
Spec → version → run lifecycle
- Author a TerraformStackSpec (Pydantic). Hash is SHA-256 of canonical JSON.
- persist_spec(spec) creates a new terraform_stack_spec_versions row only when the hash changes (AGENTS rule 43).
- TerraformRuntime(spec).plan(workspace_id=...) opens a TerraformRun row (rule 34: carries experiment_id + test_id FKs) and enqueues the plan task on the terraform Celery queue.
- Runner pod executes terraform init && terraform plan -out tfplan.binary, captures stdout/stderr to files in the workspace dir, parses terraform show -json tfplan.binary into a structured plan summary, and optionally runs OPA Rego policies.
- Plan run lands in awaiting_approval. The frontend /infra/terraform/workspaces/[id] page renders an "Apply this plan" button.
- TerraformRuntime(spec).apply(plan_run_id=...) opens a child TerraformRun, executes terraform apply tfplan.binary, and snapshots the resulting state into a TerraformStateVersion row.
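The hash-gated versioning step in the lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the project's persist_spec: the spec is a plain dict rather than a Pydantic model, the versions list is an in-memory stand-in for the terraform_stack_spec_versions table, and the names spec_hash and persist_spec_sketch are hypothetical. "Canonical JSON" is assumed here to mean sorted keys with compact separators.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """SHA-256 over an assumed canonical encoding: sorted keys, compact separators."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def persist_spec_sketch(versions: list, spec: dict) -> bool:
    """Append a version row only when the hash differs from the latest row.

    `versions` is a hypothetical in-memory stand-in for the
    terraform_stack_spec_versions table. Returns True when a row was written.
    """
    digest = spec_hash(spec)
    if versions and versions[-1]["hash"] == digest:
        return False  # unchanged spec: no new version row
    versions.append({"version": len(versions) + 1, "hash": digest, "spec": spec})
    return True
```

Because keys are sorted before hashing, two specs that differ only in field order produce the same digest and do not create a new version.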
Code generation
CDKTF was deprecated by HashiCorp on 2025-12-10. Python-side HCL
generation uses Jinja2 templates under
alphaswarm/terraform/codegen/templates/:
- storage_{aws,gcp,azure,local}.tf.j2
- faas_local.tf.j2 (KEDA + per-queue ScaledObjects)
- agents_local.tf.j2 (bot pods with alphaswarm-data-mcp sidecar)
- secrets_local.tf.j2 (ESO + ClusterSecretStore + ExternalSecret per secret_mappings)
- generic.tf.j2 (fallback for module_source references)
Operator-authored stacks live under alphaswarm_platform/terraform/modules/
and are reachable via spec.module_source = "../../modules/storage".
State backends
Five backends are supported (ALPHASWARM_TERRAFORM_STATE_BACKEND):
| Kind | Backend block |
|---|---|
| local | terraform { backend "local" { ... } } |
| s3 | backend "s3" { bucket / key / dynamodb } |
| azurerm | backend "azurerm" { storage_account_name } |
| gcs | backend "gcs" { bucket / prefix } |
| hcp | HCP Terraform via HcpClient |
The HCP path uses alphaswarm/terraform/hcp_client.py, a thin
httpx wrapper around app.terraform.io/api/v2. There is no
python-terrasnek dependency, so cold installs without HCP credentials
still boot cleanly.
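As a rough illustration of how a backend kind could map to a backend block, the sketch below renders the four file-style backends from the table and rejects hcp, which the document routes through HcpClient instead. The function name render_backend_block and the settings dict are hypothetical; the actual logic in alphaswarm/terraform/codegen/wrapper.py may look quite different.

```python
import json

# Kinds from the table above that map to a plain `backend "<kind>"` block.
BLOCK_BACKENDS = frozenset({"local", "s3", "azurerm", "gcs"})


def render_backend_block(kind: str, settings: dict) -> str:
    """Render a terraform { backend "<kind>" { ... } } block (illustrative only)."""
    if kind == "hcp":
        # Assumption: the HCP path is driven through HcpClient, not a rendered block.
        raise ValueError("hcp is handled via HcpClient, not a backend block")
    if kind not in BLOCK_BACKENDS:
        raise ValueError(f"unknown backend kind: {kind!r}")
    # json.dumps gives double-quoted, escaped strings, which suits simple HCL values.
    body = "\n".join(
        f"    {key} = {json.dumps(str(value))}" for key, value in sorted(settings.items())
    )
    return f'terraform {{\n  backend "{kind}" {{\n{body}\n  }}\n}}\n'
```

For example, render_backend_block("gcs", {"bucket": "tf-state", "prefix": "alphaswarm"}) yields a block containing bucket and prefix assignments; the bucket and prefix values here are made up.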
Bootstrap and reliability notes
- During cold-start deployments, prefer CLI-first terraform init/plan/apply until API + Celery + Redis + Postgres are all healthy.
- Control-plane-triggered Terraform actions require broker + worker availability to enqueue and stream progress.
- TerraformExecutor retries transient terraform init provider/network failures with bounded exponential backoff.
- Use ALPHASWARM_TERRAFORM_CLI_CONFIG_FILE to point at a Terraform CLI config that defines provider_installation mirror rules when registry access is unreliable.
- Provider cache is shared through ALPHASWARM_TERRAFORM_PLUGIN_CACHE_DIR.
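The retry and environment notes above might look something like the following. This is a sketch under stated assumptions: run_with_backoff and terraform_env are hypothetical helpers, every non-zero exit is treated as retryable (the real TerraformExecutor presumably inspects the failure to decide what is transient), and the mapping of the ALPHASWARM_* settings onto Terraform's own TF_CLI_CONFIG_FILE and TF_PLUGIN_CACHE_DIR variables is assumed rather than confirmed.

```python
import subprocess
import time


def run_with_backoff(cmd, *, attempts=4, base_delay=1.0, max_delay=30.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd up to `attempts` times, doubling the delay between tries (capped)."""
    result = None
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            # Bounded exponential backoff: base, 2*base, 4*base, ... up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    return result  # last failed attempt; the caller decides how to report it


def terraform_env(environ):
    """Copy environ and map the ALPHASWARM_* settings onto Terraform's variables."""
    mapping = {
        "ALPHASWARM_TERRAFORM_CLI_CONFIG_FILE": "TF_CLI_CONFIG_FILE",
        "ALPHASWARM_TERRAFORM_PLUGIN_CACHE_DIR": "TF_PLUGIN_CACHE_DIR",
    }
    env = dict(environ)
    for source, target in mapping.items():
        if environ.get(source):
            env[target] = environ[source]
    return env
```

The run and sleep parameters exist only so the retry loop can be exercised without invoking terraform or actually waiting.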
Kill switch
POST /terraform/halt is the 7th endpoint fanned out by the topbar
KillSwitch (alongside /agents/halt, /quant-agents/halt,
/paper/stop-all, /bots/halt-all, /rl/halt-all, and
/workflows/halt). On halt, every queued | running | awaiting_approval
TerraformRun is marked cancelled + halted=True.
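The halt rule can be expressed as a small state transition. The sketch below uses a hypothetical in-memory RunRecord in place of the TerraformRun ORM row; only the three haltable statuses, the cancelled status, and the halted flag come from the text above.

```python
from dataclasses import dataclass

# Statuses the halt endpoint is described as cancelling.
HALTABLE_STATUSES = frozenset({"queued", "running", "awaiting_approval"})


@dataclass
class RunRecord:
    """Hypothetical stand-in for a TerraformRun row."""
    id: int
    status: str
    halted: bool = False


def halt_runs(runs) -> int:
    """Mark every haltable run cancelled + halted; return how many were changed."""
    changed = 0
    for run in runs:
        if run.status in HALTABLE_STATUSES:
            run.status = "cancelled"
            run.halted = True
            changed += 1
    return changed
```

Runs in any other status are left untouched, so repeating the halt is harmless.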
Policy gate (OPA)
TerraformPolicyAttachment rows bind a workspace to one or more OPA
Rego policy files. The runtime calls
PolicyChecker.check after every plan;
hard_mandatory=True attachments block the corresponding apply on
violation. When opa is not on PATH the checker no-ops (so dev / CI
without OPA installed still works).
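A minimal sketch of the gate's behaviour, assuming the policy exposes its violations as a list or set under a data.terraform.deny rule (that query, and the function names check_plan and apply_allowed, are illustrative and not taken from PolicyChecker). It shells out to the opa eval CLI and returns no violations when opa is missing from PATH.

```python
import json
import shutil
import subprocess


def check_plan(policy_path, plan_json_path, query="data.terraform.deny",
               which=shutil.which, run=subprocess.run):
    """Return the list of violations reported by OPA, or [] when opa is absent."""
    if which("opa") is None:
        return []  # no-op so dev / CI without OPA still works
    proc = run(
        ["opa", "eval", "--format", "json",
         "--data", policy_path, "--input", plan_json_path, query],
        capture_output=True, text=True, check=True,
    )
    payload = json.loads(proc.stdout)
    violations = []
    # opa eval JSON output nests values under result[].expressions[].value.
    for result in payload.get("result", []):
        for expression in result.get("expressions", []):
            value = expression.get("value")
            if isinstance(value, list):
                violations.extend(value)
    return violations


def apply_allowed(violations, hard_mandatory: bool) -> bool:
    """Only hard_mandatory attachments block the apply when violations exist."""
    return not (hard_mandatory and violations)
```

With this shape, violations from an attachment that is not hard_mandatory are still returned and can be surfaced to the operator without blocking the apply.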
Frontend
Vite/React surfaces under alphaswarm_client/src/routes/infra/:
- /infra: 7 tabbed panes (overview / bots / queues / pipeline / secrets / k8s / canary) + a Terraform inline summary.
- /infra/terraform: workspace list with per-row Plan / Apply / Destroy (friction-gated).
- /infra/terraform/workspaces/[id]: workspace detail + run history + latest state outputs.
- /infra/terraform/runs/[id]: run detail with live WS progress stream (/terraform/ws/runs/{id}).
- /infra/terraform/stacks: stack spec catalog.
Where to look for X
| Task | Path |
|---|---|
| Add a new module kind | alphaswarm/terraform/codegen/templates/ + alphaswarm/persistence/models_terraform.py::TERRAFORM_MODULE_KINDS |
| Add an MCP tool | alphaswarm/data/mcp/tools/terraform.py |
| Add a REST route | alphaswarm/api/routes/terraform.py |
| Add a Celery task | alphaswarm/tasks/terraform_tasks.py |
| Edit the runner pod | alphaswarm_platform/terraform/modules/terraform_runner/main.tf |
| Add a state backend | alphaswarm/terraform/codegen/wrapper.py |
| Add an OPA policy | Reference the file URI via TerraformPolicyAttachment.policy_set_uri |