Terraform IaC control plane

Phase 7 of the multi-tenant rollout introduces a new sibling spec-runtime, TerraformRuntime, which joins the five existing ones: AgentRuntime, BotRuntime, RLRuntime, AnalysisRuntime, and WorkflowRuntime.

The runtime is the only sanctioned executor for terraform plan/apply/destroy/refresh operations. Routes, Celery tasks, and MCP tools wrap it; outside alphaswarm/terraform/runner.py::TerraformExecutor, nothing calls subprocess.run(["terraform", ...]) directly.

Architecture

Spec → version → run lifecycle

  1. Author a TerraformStackSpec (Pydantic). Hash is SHA-256 of canonical JSON.
  2. persist_spec(spec) creates a new terraform_stack_spec_versions row only when the hash changes (AGENTS rule 43).
  3. TerraformRuntime(spec).plan(workspace_id=...) opens a TerraformRun row (rule 34: carries experiment_id + test_id FKs) and enqueues the plan task on the terraform Celery queue.
  4. The runner pod executes terraform init && terraform plan -out tfplan.binary, captures stdout/stderr to files in the workspace dir, parses the output of terraform show -json tfplan.binary into a structured plan summary, and optionally runs OPA Rego policies.
  5. Plan run lands in awaiting_approval. The frontend /infra/terraform/workspaces/[id] page renders an "Apply this plan" button.
  6. TerraformRuntime(spec).apply(plan_run_id=...) opens a child TerraformRun, executes terraform apply tfplan.binary, and snapshots the resulting state into a TerraformStateVersion row.
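
Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the project's persist_spec: it assumes "canonical JSON" means sorted keys with compact separators, treats the spec as a plain dict rather than a Pydantic model, and uses a hypothetical in-memory store in place of the terraform_stack_spec_versions table.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """SHA-256 over a canonical JSON encoding (assumed: sorted keys, compact separators)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryVersionStore:
    """Hypothetical stand-in for the terraform_stack_spec_versions table."""

    def __init__(self) -> None:
        # Each entry is (version number, spec hash, spec payload).
        self.versions: list[tuple[int, str, dict]] = []

    def persist_spec(self, spec: dict) -> int:
        """Return the current version, adding a new row only when the hash changes."""
        digest = spec_hash(spec)
        if self.versions and self.versions[-1][1] == digest:
            return self.versions[-1][0]  # unchanged spec: reuse the latest version
        version = len(self.versions) + 1
        self.versions.append((version, digest, spec))
        return version
```

Because keys are sorted before hashing, two specs that differ only in key order map to the same version, so re-submitting an unchanged spec does not create a new row.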

Code generation

CDKTF was deprecated by HashiCorp on 2025-12-10. Python-side HCL generation uses Jinja2 templates under alphaswarm/terraform/codegen/templates/:

  • storage_{aws,gcp,azure,local}.tf.j2
  • faas_local.tf.j2 (KEDA + per-queue ScaledObjects)
  • agents_local.tf.j2 (bot pods with alphaswarm-data-mcp sidecar)
  • secrets_local.tf.j2 (ESO + ClusterSecretStore + ExternalSecret per secret_mappings)
  • generic.tf.j2 (fallback for module_source references)

Operator-authored stacks live under alphaswarm_platform/terraform/modules/ and are reachable via spec.module_source = "../../modules/storage".

State backends

Five backends are supported (ALPHASWARM_TERRAFORM_STATE_BACKEND):

| Kind | Backend block |
| --- | --- |
| local | terraform { backend "local" { ... } } |
| s3 | backend "s3" { bucket / key / dynamodb } |
| azurerm | backend "azurerm" { storage_account_name } |
| gcs | backend "gcs" { bucket / prefix } |
| hcp | HCP Terraform via HcpClient |

The HCP path uses alphaswarm/terraform/hcp_client.py, a thin httpx wrapper around app.terraform.io/api/v2. There is no python-terrasnek dependency, so cold installs without HCP credentials still boot cleanly.
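
As an illustration of how the backend kind might translate into HCL, the sketch below reads ALPHASWARM_TERRAFORM_STATE_BACKEND and renders a backend block. The function names, the "local" default, and the choice to emit no block for hcp are assumptions for this example; the real logic lives in alphaswarm/terraform/codegen/wrapper.py and may differ.

```python
import os

BACKEND_KINDS = ("local", "s3", "azurerm", "gcs", "hcp")


def selected_backend(env=None) -> str:
    """Read the backend kind from the environment (default "local" is an assumption)."""
    env = os.environ if env is None else env
    kind = env.get("ALPHASWARM_TERRAFORM_STATE_BACKEND", "local")
    if kind not in BACKEND_KINDS:
        raise ValueError(f"unsupported state backend: {kind!r}")
    return kind


def render_backend_block(kind: str, settings: dict) -> str:
    """Render a terraform { backend "<kind>" { ... } } block from string settings.

    Values are quoted naively; real codegen would need proper HCL escaping.
    """
    if kind == "hcp":
        # Assumption: HCP runs go through the API client, so no backend block is emitted.
        return ""
    body = "\n".join(f'    {key} = "{value}"' for key, value in sorted(settings.items()))
    return f'terraform {{\n  backend "{kind}" {{\n{body}\n  }}\n}}\n'


if __name__ == "__main__":
    print(render_backend_block("gcs", {"bucket": "tf-state", "prefix": "stacks/storage"}))
```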

Bootstrap and reliability notes

  • During cold-start deployments, prefer CLI-first terraform init/plan/apply until API + Celery + Redis + Postgres are all healthy.
  • Control-plane-triggered Terraform actions require broker + worker availability to enqueue and stream progress.
  • TerraformExecutor retries transient terraform init provider/network failures with bounded exponential backoff.
  • Use ALPHASWARM_TERRAFORM_CLI_CONFIG_FILE to point at a Terraform CLI config that defines provider_installation mirror rules when registry access is unreliable.
  • Provider cache is shared through ALPHASWARM_TERRAFORM_PLUGIN_CACHE_DIR.
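
The retry and environment notes above could look roughly like this. It is a sketch under assumptions, not TerraformExecutor itself: the mapping of the ALPHASWARM_* variables onto Terraform's own TF_CLI_CONFIG_FILE and TF_PLUGIN_CACHE_DIR is a guess at the wiring, the attempt count and delays are illustrative, and the loop retries on any non-zero exit rather than classifying failures as transient.

```python
import os
import subprocess
import time


def terraform_env(base=None) -> dict:
    """Copy the environment and forward the ALPHASWARM_* settings to Terraform's variables."""
    env = dict(os.environ if base is None else base)
    mapping = {  # assumed wiring; the TF_* names are Terraform's documented variables
        "ALPHASWARM_TERRAFORM_CLI_CONFIG_FILE": "TF_CLI_CONFIG_FILE",
        "ALPHASWARM_TERRAFORM_PLUGIN_CACHE_DIR": "TF_PLUGIN_CACHE_DIR",
    }
    for source, target in mapping.items():
        if env.get(source):
            env[target] = env[source]
    return env


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Delays slept between attempts: base * 2**n, bounded by cap."""
    return [min(cap, base * 2 ** n) for n in range(max(attempts - 1, 0))]


def init_with_retry(workdir, attempts=4, run=subprocess.run, sleep=time.sleep):
    """Run terraform init up to `attempts` times, sleeping with bounded backoff between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        result = run(
            ["terraform", "init", "-input=false"],
            cwd=workdir, env=terraform_env(), capture_output=True, text=True,
        )
        if result.returncode == 0:
            return result
        if attempt < len(delays):
            sleep(delays[attempt])  # no sleep after the final attempt
    return result  # last failing result; the caller decides how to surface it
```

Injecting run and sleep keeps the retry loop testable without a terraform binary or real waiting.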

Kill switch

POST /terraform/halt is one of the endpoints fanned out by the topbar KillSwitch, alongside /agents/halt, /quant-agents/halt, /paper/stop-all, /bots/halt-all, /rl/halt-all, and /workflows/halt. On halt, every TerraformRun in the queued, running, or awaiting_approval state is marked cancelled with halted=True.
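
The halt semantics can be sketched as below. The dataclass is a hypothetical stand-in for the TerraformRun ORM row, and halt_runs is an illustrative name rather than the project's handler.

```python
from dataclasses import dataclass

# Statuses the halt endpoint acts on, as listed above.
HALTABLE_STATUSES = frozenset({"queued", "running", "awaiting_approval"})


@dataclass
class TerraformRun:
    """Hypothetical in-memory stand-in for the TerraformRun row."""
    id: int
    status: str
    halted: bool = False


def halt_runs(runs) -> list:
    """Mark every in-flight run cancelled and halted; return the ids that were touched."""
    halted_ids = []
    for run in runs:
        if run.status in HALTABLE_STATUSES:
            run.status = "cancelled"
            run.halted = True
            halted_ids.append(run.id)
    return halted_ids
```

Runs in any other status are left untouched, so calling the function repeatedly has no further effect after the first pass.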

Policy gate (OPA)

TerraformPolicyAttachment rows bind a workspace to one or more OPA Rego policy files. The runtime calls PolicyChecker.check after every plan; hard_mandatory=True attachments block the corresponding apply on violation. When opa is not on PATH the checker no-ops (so dev / CI without OPA installed still works).
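
A rough sketch of the gate logic follows. It is not the real PolicyChecker: the Attachment dataclass and the evaluate callable are hypothetical, and in practice evaluate would shell out to the opa binary against the plan JSON.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical stand-in for a TerraformPolicyAttachment row."""
    policy_set_uri: str
    hard_mandatory: bool


def check(attachments, evaluate, which=shutil.which):
    """Return (blocked, violations) for one plan.

    `evaluate` is a hypothetical callable that maps a policy URI to a list of
    violation messages. If the opa binary is not on PATH, the check no-ops.
    """
    if which("opa") is None:
        return False, []
    blocked = False
    violations = []
    for attachment in attachments:
        found = list(evaluate(attachment.policy_set_uri))
        violations.extend(found)
        if found and attachment.hard_mandatory:
            blocked = True  # only hard_mandatory attachments block the apply
    return blocked, violations
```

Violations from attachments that are not hard_mandatory are still reported, but they do not set the blocked flag.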

Frontend

Vite/React surfaces under alphaswarm_client/src/routes/infra/:

  • /infra: 7 tabbed panes (overview / bots / queues / pipeline / secrets / k8s / canary) + a Terraform inline summary.
  • /infra/terraform: workspace list with per-row Plan / Apply / Destroy (friction-gated).
  • /infra/terraform/workspaces/[id]: workspace detail + run history + latest state outputs.
  • /infra/terraform/runs/[id]: run detail with live WS progress stream (/terraform/ws/runs/{id}).
  • /infra/terraform/stacks: stack spec catalog.

Where to look for X

| Task | Path |
| --- | --- |
| Add a new module kind | alphaswarm/terraform/codegen/templates/ + alphaswarm/persistence/models_terraform.py::TERRAFORM_MODULE_KINDS |
| Add an MCP tool | alphaswarm/data/mcp/tools/terraform.py |
| Add a REST route | alphaswarm/api/routes/terraform.py |
| Add a Celery task | alphaswarm/tasks/terraform_tasks.py |
| Edit the runner pod | alphaswarm_platform/terraform/modules/terraform_runner/main.tf |
| Add a state backend | alphaswarm/terraform/codegen/wrapper.py |
| Add an OPA policy | Reference the file URI via TerraformPolicyAttachment.policy_set_uri |