IaC runbook
"I want to provision X" recipes for the Terraform IaC control plane.
Quick reference
| Task | Recipe |
|---|---|
| Stand up local AlphaSwarm on a laptop | Local environment |
| Stand up AlphaSwarm on rpi_kubernetes | rpi Kubernetes environment |
| Stand up paper-trading on GCP | Paper environment |
| Stand up production on AWS | Live environment |
| Stand up the seeded Wiley Tech home on Azure | Wiley Tech environment |
| Add a new module kind to the codegen | Add a module kind |
| Add a Terraform stack via the API | Create a stack via API |
| Plan / apply / destroy from the UI | Lifecycle from the frontend |
| Configure HCP Terraform as state backend | HCP Terraform |
| Wire OPA policy enforcement | Policy enforcement |
Local environment
cd alphaswarm_platform/terraform/environments/local
terraform init
terraform plan
terraform apply
What this provisions:
- Postgres / MinIO / Redis containers via kreuzwerker/docker.
- Minikube / kind cluster + namespaces (alphaswarm-local / alphaswarm-paper / alphaswarm-live / alphaswarm-backtest / alphaswarm-system / alphaswarm-terraform).
- Helm baseline: cert-manager / ESO / KEDA / ingress-nginx / kube-prometheus / otel-operator / istio.
- KEDA ScaledObject per Celery queue (including the new terraform queue).
- Per-bot Deployment with alphaswarm-data-mcp sidecar (zero-egress NetworkPolicy on the agent container).
- Local Docker registry on :5000.
State is local (alphaswarm_platform/terraform/environments/local/terraform.tfstate).
rpi Kubernetes environment
alphaswarm-cli deploy publish-rpi --registry ghcr.io/<org> --tag <immutable-tag>
terraform -chdir=alphaswarm_platform/terraform/environments/rpi init
terraform -chdir=alphaswarm_platform/terraform/environments/rpi plan
terraform -chdir=alphaswarm_platform/terraform/environments/rpi apply
Recommended bootstrap sequence for first-time bring-up:
- Run terraform apply from the CLI first, until base services are healthy.
- Verify that the API, Celery, Redis, and Postgres are reachable.
- Only then move to control-plane actions (/control-plane/kubernetes/targets/rpi/*).
This avoids confusing enqueue and stream failures during cold start, while the broker and database are still bootstrapping.
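The "verify reachable" step can be scripted. The sketch below is a minimal TCP probe, not part of AlphaSwarm; the host names and ports in BASE_SERVICES are assumptions and should be replaced with whatever your cluster exposes. A TCP probe only shows that a port accepts connections, so Celery worker health still needs a broker-level check.

```python
import socket

# Hypothetical endpoints; replace with the addresses your cluster exposes.
BASE_SERVICES = {
    "api": ("localhost", 8000),
    "redis": ("localhost", 6379),
    "postgres": ("localhost", 5432),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_services(services: dict) -> list:
    """Names of services that did not accept a TCP connection."""
    return [
        name
        for name, (host, port) in services.items()
        if not is_reachable(host, port)
    ]


if __name__ == "__main__":
    down = unreachable_services(BASE_SERVICES)
    print("all base services reachable" if not down else f"still waiting on: {down}")
```

An empty result from unreachable_services is a reasonable signal that it is safe to switch from the CLI to control-plane actions.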
Provider mirror + init retries
When provider downloads are unstable, define a Terraform CLI config file
with provider_installation mirror rules and point AlphaSwarm at it:
export ALPHASWARM_TERRAFORM_CLI_CONFIG_FILE=/absolute/path/to/terraform.tfrc
export ALPHASWARM_TERRAFORM_INIT_RETRY_ATTEMPTS=5
export ALPHASWARM_TERRAFORM_INIT_RETRY_BACKOFF_SECONDS=2
export ALPHASWARM_TERRAFORM_INIT_RETRY_MAX_BACKOFF_SECONDS=30
TerraformExecutor applies bounded retries for transient terraform init
failures and reuses ALPHASWARM_TERRAFORM_PLUGIN_CACHE_DIR between runs.
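To see what the three retry variables imply, the sketch below computes the waits between attempts. It assumes exponential doubling from the base backoff, capped at the maximum; the source only states that retries are bounded, so the exact schedule inside TerraformExecutor may differ, and the defaults used here are simply the values from the export lines above.

```python
import os


def backoff_delays(attempts: int, base: float, cap: float) -> list:
    """Seconds slept between attempts: base doubles each time, capped at cap.

    N attempts need N - 1 sleeps. Doubling is an assumption, not a
    documented property of TerraformExecutor.
    """
    return [min(base * (2 ** i), cap) for i in range(max(attempts - 1, 0))]


def delays_from_env(env=os.environ) -> list:
    """Read the retry variables; the fallback values are assumptions."""
    return backoff_delays(
        attempts=int(env.get("ALPHASWARM_TERRAFORM_INIT_RETRY_ATTEMPTS", "5")),
        base=float(env.get("ALPHASWARM_TERRAFORM_INIT_RETRY_BACKOFF_SECONDS", "2")),
        cap=float(env.get("ALPHASWARM_TERRAFORM_INIT_RETRY_MAX_BACKOFF_SECONDS", "30")),
    )
```

Under these assumptions, the values shown above give waits of 2, 4, 8, and 16 seconds, so a fully failed init takes at least 30 seconds of backoff before it is reported.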
Paper environment
cd alphaswarm_platform/terraform/environments/paper
export TF_VAR_gcp_project_id=<your-gcp-project>
export TF_VAR_primary_domain=paper.alphaswarm.example
terraform init -backend-config="bucket=alphaswarm-terraform-state-paper"
terraform plan
terraform apply
What this provisions:
- GKE cluster (auto-promoted from ALPHASWARM_DEFAULT_CLOUD_PROVIDER=gcp).
- Cloud SQL Postgres (single zone, cost-optimised for paper).
- GCS bucket + Memorystore Redis.
- GCP Secret Manager ClusterSecretStore (ESO).
- Bot Deployments with dry_run=true for paper trading.
- 100% traffic to the Vite frontend (no canary split in paper).
Live environment
cd alphaswarm_platform/terraform/environments/live
export TF_VAR_aws_subnet_ids='["subnet-aaaa", "subnet-bbbb", "subnet-cccc"]'
export TF_VAR_primary_domain=app.wiley.tech
terraform init # picks up backend.tf with S3 + DynamoDB locking
terraform plan
terraform apply
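TF_VAR_aws_subnet_ids is a list, so its value has to be written as a list literal rather than a plain string. When the commands above are driven from a script, encoding the list as JSON avoids shell quoting mistakes. The helper below is a hypothetical sketch, not an AlphaSwarm function.

```python
import json
import os
import subprocess


def tf_var_env(variables: dict) -> dict:
    """Encode values as TF_VAR_* entries.

    Strings pass through unchanged; lists and other types are JSON-encoded,
    which Terraform accepts for list-typed variables.
    """
    return {
        f"TF_VAR_{name}": value if isinstance(value, str) else json.dumps(value)
        for name, value in variables.items()
    }


def run_plan(env_vars: dict, workdir: str) -> None:
    """Run terraform plan with the extra variables (requires terraform on PATH)."""
    subprocess.run(
        ["terraform", "plan"],
        cwd=workdir,
        env={**os.environ, **env_vars},
        check=True,
    )


env_vars = tf_var_env({
    "aws_subnet_ids": ["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
    "primary_domain": "app.wiley.tech",
})
```

Calling run_plan(env_vars, "alphaswarm_platform/terraform/environments/live") would then be equivalent to the exports followed by terraform plan.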
What this provisions:
- EKS cluster Multi-AZ.
- RDS Multi-AZ Postgres + S3 versioning + ElastiCache 7+ cluster mode.
- AWS Secrets Manager ClusterSecretStore.
- Bot Deployments live (dry_run=false); live_control=true on the actor's Membership is required to trigger orders.
- Full prod sizing for KEDA maxReplicaCount (50 default / 100 ML / 200 backtest / 30 agents / 10 terraform).
Wiley Tech environment
This is the seeded production home for the org provisioned by Alembic migration 0051. It is pinned to the Wiley Tech Entra tenant.
cd alphaswarm_platform/terraform/environments/wiley-tech
export TF_VAR_azure_tenant_id=<wiley tenant id>
export TF_VAR_azure_subscription_id=<sub id>
export TF_VAR_azure_resource_group=alphaswarm-wiley-tech
export TF_VAR_azure_keyvault_url=https://alphaswarm-wiley-tech-kv.vault.azure.net/
terraform init # picks up backend.tf with Azure Blob state
terraform plan
terraform apply
What this provisions:
- AKS cluster + Azure Workload Identity for ESO.
- Azure PostgreSQL Flexible Server (Zone-Redundant HA).
- ADLS Gen2 storage account (HNS enabled).
- Azure Cache for Redis (Standard, TLS-only).
- Azure Key Vault ClusterSecretStore synced via ESO Workload Identity.
- ACR registry for AlphaSwarm images.
Add a module kind
- Add the kind to TERRAFORM_MODULE_KINDS in alphaswarm/persistence/models_terraform.py.
- Create the Jinja2 template at alphaswarm/terraform/codegen/templates/<kind>_<cloud>.tf.j2 (and a _local fallback).
- (Optional) Mirror as a native HCL module under alphaswarm_platform/terraform/modules/<kind>/.
- Operators create a stack via POST /terraform/stacks with module_kind: "<kind>".
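The template naming convention above suggests a lookup order. The sketch below illustrates one plausible resolution rule: try the cloud-specific template first, then the _local fallback. How the codegen actually resolves templates is not stated here, so treat both function names and the fallback order as assumptions.

```python
def template_candidates(kind: str, cloud: str) -> list:
    """Template file names to try, most specific first."""
    names = [f"{kind}_{cloud}.tf.j2", f"{kind}_local.tf.j2"]
    # dict.fromkeys keeps order and drops the duplicate when cloud == "local".
    return list(dict.fromkeys(names))


def resolve_template(kind: str, cloud: str, available: set) -> str:
    """Return the first candidate present in available, else raise LookupError."""
    for name in template_candidates(kind, cloud):
        if name in available:
            return name
    raise LookupError(f"no template for kind={kind!r} cloud={cloud!r}")
```

For example, with only storage_aws.tf.j2 and storage_local.tf.j2 on disk, a request for storage on gcp would resolve to the local fallback under this rule.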
Create a stack via API
curl -X POST http://localhost:8000/terraform/stacks \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "Bronze tier storage",
"slug": "bronze-storage",
"module_kind": "storage",
"cloud_provider": "aws",
"environment": "live",
"variables": {
"aws_region": "us-east-1",
"aws_subnet_ids": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
"bucket_name": "alphaswarm-bronze",
"db_storage_gb": 500
},
"backend": { "kind": "s3", "config": { "bucket": "alphaswarm-tf-state", "key": "bronze-storage.tfstate" } },
"tags": { "tier": "bronze" }
}'
The response includes spec_version_id (immutable, hash-locked).
Then create a workspace + plan:
# Workspace
curl -X POST http://localhost:8000/terraform/workspaces \
-H "Content-Type: application/json" -H "Authorization: Bearer <token>" \
-d '{ "slug": "bronze-live", "name": "Bronze (live)", "stack_spec_id": "<id>", "environment": "live", "state_backend": "s3" }'
# Plan
curl -X POST http://localhost:8000/terraform/workspaces/<workspace_id>/plan \
-H "Authorization: Bearer <token>"
Subscribe to live progress at wss://<host>/terraform/ws/runs/<run_id>.
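The stack, workspace, and plan calls can also be chained from a script. The following is a minimal sketch using only the Python standard library. The endpoints and payload fields come from the curl examples above; the helper names are hypothetical, and the "id" response keys in the commented usage are assumptions, since the response schema is not shown here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # same host as the curl examples


def build_post(path: str, token: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) an authenticated POST; the JSON body is optional."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def workspace_body(stack_spec_id: str) -> dict:
    """Workspace payload from the curl example, with the stack id filled in."""
    return {
        "slug": "bronze-live",
        "name": "Bronze (live)",
        "stack_spec_id": stack_spec_id,
        "environment": "live",
        "state_backend": "s3",
    }


# Chaining the calls needs a running API, so it is left commented out.
# The "id" keys are assumptions; check the actual response schema.
# stack = send(build_post("/terraform/stacks", token, stack_payload))
# workspace = send(build_post("/terraform/workspaces", token, workspace_body(stack["id"])))
# run = send(build_post(f"/terraform/workspaces/{workspace['id']}/plan", token))
```

Separating request construction from sending keeps the payloads easy to inspect or log before anything touches the live environment.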
Lifecycle from the frontend
Navigate to /infra/terraform, click a workspace row → land on
/infra/terraform/workspaces/[id]:
- Click Plan → enqueues plan task; result lands in awaiting_approval.
- Review the plan summary on the run detail page (live WS stream).
- Click Apply this plan on the plan run row.
- Apply executes → state version snapshotted → outputs visible in the "Latest state outputs" card.
- Destroy is friction-gated: type the workspace slug to confirm.
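When reviewing a plan outside the UI, the same kind of summary can be derived from the JSON that terraform show -json produces. The sketch below counts planned actions from the resource_changes array of that format. How the run detail page computes its own summary is not documented here, so this is an independent illustration.

```python
from collections import Counter


def summarize_plan(plan: dict) -> Counter:
    """Count planned actions across resource_changes, skipping no-ops."""
    counts = Counter()
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if not actions or actions == ["no-op"]:
            continue
        # A replacement is reported as two actions: delete and create.
        if set(actions) == {"create", "delete"}:
            counts["replace"] += 1
        else:
            counts[actions[0]] += 1
    return counts


def is_destructive(plan: dict) -> bool:
    """True if the plan deletes or replaces at least one resource."""
    counts = summarize_plan(plan)
    return counts["delete"] > 0 or counts["replace"] > 0
```

A check like is_destructive is a cheap extra guard to run before clicking Apply on a live workspace.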
HCP Terraform
- Create an HCP Terraform organization + workspaces in the HCP UI.
- Set ALPHASWARM_HCP_TOKEN (preferred: via CredentialResolver), ALPHASWARM_HCP_ORGANIZATION, ALPHASWARM_TERRAFORM_STATE_BACKEND=hcp.
- Set the stack spec's backend.kind="hcp" and the workspace's hcp_workspace_id.
- The runtime now drives runs through HcpClient instead of the local subprocess (no terraform binary required on the runner pod).
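A small preflight check can catch a half-configured HCP setup before the first run. The sketch below inspects only the environment variables named above; the function is hypothetical, and the token_via_resolver flag is an assumption covering the case where the token comes from CredentialResolver rather than the environment.

```python
import os


def hcp_config_problems(env=os.environ, token_via_resolver: bool = False) -> list:
    """Return human-readable problems with the HCP settings; empty means OK."""
    required = ["ALPHASWARM_HCP_ORGANIZATION"]
    if not token_via_resolver:
        # Only demand the env token when it is not supplied by a resolver.
        required.insert(0, "ALPHASWARM_HCP_TOKEN")
    problems = [f"{name} is not set" for name in required if not env.get(name)]
    backend = env.get("ALPHASWARM_TERRAFORM_STATE_BACKEND")
    if backend != "hcp":
        problems.append(
            f"ALPHASWARM_TERRAFORM_STATE_BACKEND is {backend!r}, expected 'hcp'"
        )
    return problems
```

This does not validate the stack spec's backend.kind or the workspace's hcp_workspace_id, which live in the database rather than the environment.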
Policy enforcement
- Author OPA Rego policies that target Terraform plan JSON (the runtime emits tfplan.binary.json via terraform show -json).
- Insert a TerraformPolicyAttachment row binding the policy file URI to a workspace.
- Set hard_mandatory=True to block apply on violation; hard_mandatory=False emits a warning.
- When opa is on PATH the runtime invokes opa eval -i tfplan.json -d policy.rego "data.alphaswarm.terraform.deny". Without OPA installed the check no-ops cleanly.
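The enforcement flow above can be reproduced locally to test a policy before attaching it. The sketch below is an illustration, not the runtime's code: it runs the same opa eval query, treats a missing opa binary as a no-op, and maps violations to block or warn based on hard_mandatory. The function names and the allow/warn/block labels are assumptions, and --format=json is added only to make the default output format explicit.

```python
import json
import shutil
import subprocess

QUERY = "data.alphaswarm.terraform.deny"


def parse_deny(opa_stdout: str) -> list:
    """Extract deny messages from opa eval JSON output; empty if undefined."""
    doc = json.loads(opa_stdout)
    messages = []
    for result in doc.get("result", []):
        for expression in result.get("expressions", []):
            value = expression.get("value")
            if isinstance(value, list):
                messages.extend(value)
    return messages


def evaluate(plan_path: str, policy_path: str):
    """Return deny messages, or None when opa is not installed (the no-op case)."""
    if shutil.which("opa") is None:
        return None
    proc = subprocess.run(
        ["opa", "eval", "--format=json", "-i", plan_path, "-d", policy_path, QUERY],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_deny(proc.stdout)


def decide(violations, hard_mandatory: bool) -> str:
    """Map violations to an outcome: allow, warn, or block."""
    if not violations:
        return "allow"
    return "block" if hard_mandatory else "warn"
```

For instance, decide(evaluate("tfplan.json", "policy.rego"), hard_mandatory=True) returns "allow" on a machine without OPA, which mirrors the no-op behaviour described above.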