
Go-live: minimum deployment

Deep-dive companions: aws-deploy.md (full hybrid bootstrap), aws-runbook.md (day-2 pipeline ops), tenant-router auth rollout (edge enforcement, when the k8s edge goes live).

This is the shortest path to four live surfaces, in dependency order. Current state (verified 2026-06-09): all application code is merged to main, but no deploy has ever succeeded. Every AWS-touching workflow fails at configure-aws-credentials because the one-time bootstrap (OIDC deployer roles + GitHub repo variables) has not been run. The docs repo also had no content-deploy workflow at all; one was added as deploy-pages.yml alongside this page.

| Order | Surface | Mechanism | Needs |
|---|---|---|---|
| 1 | Docs + marketing/landing | alphaswarm_docs deploy-pages.yml → Cloudflare Pages | 2 Cloudflare secrets, no AWS |
| 2 | AWS bootstrap | local terraform apply ×1 | AWS admin profile |
| 3 | Platform minimum (VPC/ECR/RDS/Redis/ALB/ECS) | local apply of infrastructure/envs/minimum then terraform/environments/minimum | step 2 |
| 4 | Admin UI | alphaswarm_admin build-publish.yml → ECR → app-tier redeploy | steps 2–3 + repo vars |
| 5 | Custom domains / apex (optional) | deploy-edge.yml Terraform stacks | steps 2–3 + Vault CF token |
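The steps on this page call the gh, terraform, and aws CLIs from a local terminal. A minimal preflight sketch follows; `require_tools` is a hypothetical helper written for this page, not something shipped in any of the repos.

```shell
# Report every given command that is not on PATH.
# Returns 0 when all are present, 1 otherwise.
# NOTE: require_tools is a hypothetical helper, not part of the repos.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_tools gh terraform aws || exit 1
```

Running it before step 1 surfaces a missing CLI up front instead of halfway through an apply.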

1. Docs + landing site (Cloudflare Pages — independent of AWS)

The Docusaurus build is one artifact serving the landing page at / and the docs tree beneath it. deploy-pages.yml builds and ships it to the alphaswarm-docs Pages project, creating the project on first run (Terraform later adopts it — see step 5).

# One-time: create a Cloudflare API token scoped
# Account > Cloudflare Pages > Edit
# at https://dash.cloudflare.com/profile/api-tokens, then:
gh secret set CLOUDFLARE_API_TOKEN --repo Alpha-Swarm-ai/alphaswarm_docs
gh secret set CLOUDFLARE_ACCOUNT_ID --repo Alpha-Swarm-ai/alphaswarm_docs

# Deploy (also auto-runs on every push to main):
gh workflow run deploy-pages.yml --repo Alpha-Swarm-ai/alphaswarm_docs --ref main
gh run watch --repo Alpha-Swarm-ai/alphaswarm_docs

# Live at https://alphaswarm-docs.pages.dev until step 5 attaches
# the alpha-swarm.ai apex + www domains.
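Once the workflow run is green, a quick smoke check can confirm the site answers. This is a sketch only: `site_is_live` is a hypothetical helper, it assumes curl is installed (curl is not otherwise used on this page), and it assumes the landing page at / returns HTTP 200 rather than a redirect.

```shell
# Succeeds when the URL answers with HTTP 200.
# -s silences progress, -o /dev/null discards the body,
# -w '%{http_code}' prints only the status code.
# NOTE: site_is_live is a hypothetical helper.
site_is_live() {
  status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  [ "$status" = "200" ]
}

# Usage:
#   site_is_live https://alphaswarm-docs.pages.dev && echo "docs are up"
```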

2. AWS bootstrap (one-time, local terminal, ~10 min)

Mints the GitHub OIDC provider, deployer roles, and the state bucket/lock/KMS that every other stack depends on. This stack keeps its own Terraform state local by design.

cd alphaswarm_platform/infrastructure/bootstrap
export AWS_PROFILE=alphaswarm-shared-platform-admin # your admin profile
terraform init
terraform apply -var=account_alias=shared
# (Repeat with -var=account_alias={dev,qa,prod} only when you adopt the
# multi-account split; the minimum tier lives in the single account.)
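To confirm the bootstrap took effect, one option is to look for the GitHub OIDC provider in IAM. The sketch below assumes the bootstrap registers the standard GitHub.com issuer (token.actions.githubusercontent.com); `has_github_oidc_provider` is a hypothetical helper name.

```shell
# Succeeds when the account has an IAM OIDC provider for GitHub Actions.
# ASSUMPTION: the bootstrap uses the GitHub.com issuer host,
# token.actions.githubusercontent.com, which appears in the provider ARN.
has_github_oidc_provider() {
  aws iam list-open-id-connect-providers \
    --query 'OpenIDConnectProviderList[].Arn' --output text \
    | grep -q 'oidc-provider/token.actions.githubusercontent.com'
}

# Usage (with the same AWS_PROFILE as the apply):
#   has_github_oidc_provider && echo "OIDC provider present"
```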

3. Platform minimum tier (single account, ~$140/mo)

Infra tier (VPC, ECR ×3, RDS Postgres, Redis, Bedrock policy, the AqpGithubDeployerMinimum role, SSM handles), then app tier (Cognito, ALB, ECS Fargate control plane):

cd alphaswarm_platform/infrastructure/envs/minimum
cp backend.hcl.example backend.hcl && cp terraform.tfvars.example terraform.tfvars
$EDITOR backend.hcl terraform.tfvars # bucket/table from step 2 outputs
terraform init -backend-config=backend.hcl
terraform apply

cd ../../../terraform/environments/minimum
terraform init -backend-config=backend.hcl # same pattern
# First apply: the images don't exist yet, so services only
# stabilise after step 4 pushes them.
terraform apply

Then wire CI so subsequent rollouts never need a laptop — set the repo variables from the role ARNs the applies just published (SSM /alphaswarm/minimum/* / stack outputs):

for repo in alphaswarm_platform alphaswarm_admin; do
  gh variable set AWS_PLAN_ROLE_ARN --repo Alpha-Swarm-ai/$repo --body "arn:aws:iam::<ACCOUNT>:role/<plan-role>"
  gh variable set AWS_APPLY_ROLE_ARN --repo Alpha-Swarm-ai/$repo --body "arn:aws:iam::<ACCOUNT>:role/AqpGithubDeployerMinimum"
  gh variable set AWS_BUILD_ROLE_ARN --repo Alpha-Swarm-ai/$repo --body "arn:aws:iam::<ACCOUNT>:role/AqpGithubDeployerMinimum"
  gh variable set AWS_REGION --repo Alpha-Swarm-ai/$repo --body us-east-1
done
gh variable set SHARED_ACCOUNT_ID --repo Alpha-Swarm-ai/alphaswarm_platform --body "<ACCOUNT>"
# Plus the prod GitHub Environment (Settings → Environments → prod,
# required reviewers) — alphaswarm_admin's main-branch build binds to it.
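The `<ACCOUNT>` placeholder in the role ARNs can be filled from the active credentials rather than typed by hand. A minimal sketch, assuming the roles have no IAM path prefix; `role_arn` is a hypothetical helper.

```shell
# Build an IAM role ARN from an account ID and a role name.
# ASSUMPTION: the role has no path prefix (plain role/<name>).
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Usage (needs working AWS credentials and an authenticated gh):
#   account=$(aws sts get-caller-identity --query Account --output text)
#   gh variable set AWS_APPLY_ROLE_ARN --repo Alpha-Swarm-ai/alphaswarm_admin \
#     --body "$(role_arn "$account" AqpGithubDeployerMinimum)"
#   gh variable list --repo Alpha-Swarm-ai/alphaswarm_admin   # confirm what was stored
```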

4. Admin UI

build-publish.yml builds the admin backend and frontend images multi-arch, pushes them to the ECR repos that step 3 created, and fires the admin-image-published dispatch so the platform app tier redeploys with the new tags. The dispatch needs PLATFORM_DISPATCH_TOKEN, a fine-grained PAT with actions:write on alphaswarm_platform:

gh secret set PLATFORM_DISPATCH_TOKEN --repo Alpha-Swarm-ai/alphaswarm_admin
gh workflow run build-publish.yml --repo Alpha-Swarm-ai/alphaswarm_admin \
  --ref main -f env=minimum
gh run watch --repo Alpha-Swarm-ai/alphaswarm_admin

# Verify rollout (alarm/log/scale surface is the admin's own ECS panel
# once it's up — control-platform-ecs-deployment.md):
aws ecs describe-services --cluster "$(aws ssm get-parameter \
    --name /alphaswarm/minimum/ecs_cluster_name --query Parameter.Value --output text)" \
  --services alphaswarm-admin --query 'services[0].deployments'
# Admin UI answers on the ALB DNS name (output of the app-tier apply)
# at "/" with the backend health at /admin/health.
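Instead of polling describe-services by hand, the AWS CLI waiter can block until the rollout settles. The sketch below reuses the SSM parameter and service name from the commands above; `wait_admin_stable` is a hypothetical helper, and the health-check line in the usage comment assumes plain HTTP on the ALB and a hypothetical `ALB_DNS` variable holding the app-tier output.

```shell
# Block until the admin service reaches a steady state.
# `aws ecs wait services-stable` polls the service and exits
# non-zero if it has not stabilised when the waiter times out.
# NOTE: wait_admin_stable is a hypothetical helper.
wait_admin_stable() {
  cluster=$(aws ssm get-parameter \
    --name /alphaswarm/minimum/ecs_cluster_name \
    --query Parameter.Value --output text) || return 1
  aws ecs wait services-stable --cluster "$cluster" --services alphaswarm-admin
}

# Usage:
#   wait_admin_stable && echo "admin service is stable"
#   # ASSUMPTION: ALB_DNS is set by hand from the app-tier apply output,
#   # and the listener serves plain HTTP:
#   curl -fsS "http://$ALB_DNS/admin/health"
```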

5. Custom domains / apex public surface (optional, after 1–3)

The docs-edge + apex-redirect + demo-edge Terraform stacks attach alpha-swarm.ai (+ www, docs.* alias, Access-gated /demo) to the Pages project from step 1. They run through alphaswarm_platform/.github/workflows/deploy-edge.yml → CodeBuild → alphaswarm deploy (AGENTS rule 42), so they need step 3's roles plus the Cloudflare token in Vault. Because step 1 already created the Pages project, import it before the first apply:

# inside the docs-edge stack workspace:
terraform import 'module.cloudflare_pages_docs.cloudflare_pages_project.docs' \
  '<CLOUDFLARE_ACCOUNT_ID>/alphaswarm-docs'

gh workflow run deploy-edge.yml --repo Alpha-Swarm-ai/alphaswarm_platform \
  --ref main -f stack=docs-edge -f env=prod -f action=plan # then action=apply
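The import only needs to happen once, and running it against a resource that is already tracked fails. One way to make that step safe to repeat is to check the state first. A minimal sketch, run inside the docs-edge stack workspace; `pages_project_in_state` is a hypothetical helper, and the resource address is the one from the import command above.

```shell
# Succeeds when the Pages project resource is already tracked
# in this workspace's Terraform state.
# -F fixed string, -x whole-line match, -q quiet.
# NOTE: pages_project_in_state is a hypothetical helper.
pages_project_in_state() {
  terraform state list \
    | grep -Fxq 'module.cloudflare_pages_docs.cloudflare_pages_project.docs'
}

# Usage:
#   pages_project_in_state || terraform import \
#     'module.cloudflare_pages_docs.cloudflare_pages_project.docs' \
#     '<CLOUDFLARE_ACCOUNT_ID>/alphaswarm-docs'
```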

Known footguns

  • build-publish.yml (platform repo) tags :latest unconditionally — only ever dispatch it from main or a version tag.
  • The k8s edge ships fail-closed: stamp ALPHASWARM_TENANT_ROUTER_OIDC_ISSUER/_AUDIENCE before applying deployments/kubernetes/edge/ or the tenant-router crash-loops by design (runbook). Not part of the minimum tier (no EKS), listed here because the manifests are on main.
  • terraform-pipeline.yml auto-plans admin-dev on every main push — it stays red until step 3's variables exist; that's the expected signal, not a regression.
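For the first footgun, a small guard in front of the dispatch can refuse refs that should never produce a :latest tag. This is a sketch under one stated assumption: version tags start with "v" followed by a digit (for example v1.2.3). `is_release_ref` is a hypothetical helper.

```shell
# Succeeds only for the main branch or a version tag.
# ASSUMPTION: version tags look like v1.2.3 (a "v" followed by a digit).
# NOTE: is_release_ref is a hypothetical helper.
is_release_ref() {
  case "$1" in
    main|v[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   ref=main
#   is_release_ref "$ref" && gh workflow run build-publish.yml \
#     --repo Alpha-Swarm-ai/alphaswarm_platform --ref "$ref"
```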