CI/CD pipelines

The AlphaSwarm AWS deployment is driven by CI/CD: GitHub Actions orchestrates the pipeline and AWS CodeBuild runs the heavy in-VPC work (multi-arch buildx builds to ECR, and the alphaswarm deploy app-tier apply). There are no static AWS keys anywhere in the pipeline — every cloud step authenticates through GitHub OIDC.

This page explains the topology, the trust model, and the workflows. For the task-oriented steps (creating environments, triggering a deploy, approving a prod release, rolling back) see the companion runbook Operations runbook — CI/CD deploy. For the deeper deploy walkthroughs see AWS Hybrid Deployment Guide and AWS Hybrid Operational Runbook.

Topology — GitHub Actions, CodeBuild, OIDC

GitHub Actions is the control plane: it reacts to pushes, tags, pull requests, and repository_dispatch, then either runs lightweight Terraform directly or delegates the in-VPC heavy lifting to CodeBuild via aws codebuild start-build. The GitHub Actions job first assumes an AWS role over OIDC, so the start-build call (and everything CodeBuild does downstream) runs under short-lived credentials.
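As a rough sketch of the delegation step, a workflow job that has already assumed its role over OIDC could start a CodeBuild build and poll it until it finishes. The helper name run_codebuild, the project name in the usage line, and the poll interval are assumptions for illustration; this page only names aws codebuild start-build.

```shell
# Hypothetical helper: start a CodeBuild build and block until it finishes.
# Assumes short-lived credentials were already obtained over OIDC in this job.
run_codebuild() {
  project="$1"          # CodeBuild project name (assumed, not from this page)
  interval="${2:-15}"   # seconds between status polls

  # start-build returns a build object; keep only its id.
  build_id="$(aws codebuild start-build \
    --project-name "$project" \
    --query 'build.id' --output text)" || return 1

  while :; do
    status="$(aws codebuild batch-get-builds --ids "$build_id" \
      --query 'builds[0].buildStatus' --output text)" || return 1
    case "$status" in
      SUCCEEDED)   echo "$build_id succeeded"; return 0 ;;
      IN_PROGRESS) sleep "$interval" ;;
      *)           echo "$build_id ended with $status" >&2; return 1 ;;
    esac
  done
}
```

A job step might then call something like run_codebuild alphaswarm-app-deploy, failing the step whenever the build ends in any state other than SUCCEEDED.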

Why split the work this way:

  • GitHub Actions is cheap, parallel, and is where the promotion gates (GitHub Environments + required reviewers) live.
  • CodeBuild runs inside the workload VPC, so it can reach private subnets, the internal CodeArtifact PyPI, and the app-tier resources that alphaswarm deploy manages. It also gives multi-arch buildx a beefy, in-account builder close to ECR.

Authentication — GitHub OIDC, no static keys

Trust is configured per account via the infrastructure/modules/github-oidc module, which registers the GitHub OIDC provider and the IAM roles. The provider trusts both deploying repos:

  • Alpha-Swarm-ai/alphaswarm_platform
  • Alpha-Swarm-ai/alphaswarm_admin

Plan role vs apply role

The module emits two roles per account, with different trust conditions on the OIDC sub claim:

  • Plan role — read-only. Trusted on pull-request refs so that PR validation can run terraform plan / validate without any mutate permission. Example trusted subjects:

    repo:Alpha-Swarm-ai/alphaswarm_platform:pull_request
    repo:Alpha-Swarm-ai/alphaswarm_platform:ref:refs/heads/main
  • Apply role — read-write. Trusted only on refs/heads/main and scoped to a GitHub Environment, so an apply cannot run until the Environment's required reviewers approve. Example trusted subjects:

    repo:Alpha-Swarm-ai/alphaswarm_platform:ref:refs/heads/main
    repo:Alpha-Swarm-ai/alphaswarm_platform:environment:prod
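
To make the plan/apply split concrete, the example trusted subjects above can be written out as a small lookup. This is illustrative only: the real check is performed by IAM, through the condition on the token's sub claim in each role's trust policy, and the function name role_trusts_subject is hypothetical.

```shell
# Illustrative only: mirrors the example trusted subjects listed above.
# IAM enforces the real rule via the trust policy condition on the sub claim.
role_trusts_subject() {
  role="$1"   # "plan" or "apply"
  sub="$2"    # the OIDC token's sub claim
  repo="repo:Alpha-Swarm-ai/alphaswarm_platform"
  case "$role:$sub" in
    "plan:$repo:pull_request")         return 0 ;;
    "plan:$repo:ref:refs/heads/main")  return 0 ;;
    "apply:$repo:ref:refs/heads/main") return 0 ;;
    "apply:$repo:environment:prod")    return 0 ;;
    *)                                 return 1 ;;
  esac
}
```

Under these example subjects, a token minted for a pull request matches the plan role but never the apply role, which is what keeps PR validation read-only.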

The apply role ARN is published per environment as the AWS_DEPLOYER_ROLE_ARN repo variable (one value per GitHub Environment); the plan role ARN is published alongside it. A workflow job selects the role for its target env, then assumes it over OIDC.

permissions:
  id-token: write   # required to mint the GitHub OIDC token
  contents: read

jobs:
  apply:
    environment: prod   # gates on the Environment's required reviewers
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.AWS_DEPLOYER_ROLE_ARN }}
          aws-region: us-east-1

Hybrid Terraform boundary

There are two Terraform trees and they are applied two different ways. The boundary is deliberate.

| Tree | What it owns | Applied by | Auth | Audit |
| --- | --- | --- | --- | --- |
| infrastructure/ | Landing zone: VPC, ECR, RDS, EKS, OIDC provider, observability, the CodeBuild/CodeArtifact plumbing | Native terraform plan / terraform apply | OIDC into AqpTerraformExecutionRole | Terraform state only |
| terraform/ | App tier: the per-env application composition deployed onto the platform | alphaswarm deploy plan / alphaswarm deploy up (TerraformRuntime) | TerraformRuntime in CodeBuild | Writes a terraform_runs audit row |

The app tree is never applied with a bare terraform apply. It goes through alphaswarm deploy, which drives TerraformRuntime and writes a terraform_runs audit row for every plan and apply (platform AGENTS rule 42). That keeps the app-tier change history in the same ledger as every other runtime action. See Terraform IaC control plane for how TerraformRuntime works and IaC runbook for the provisioning recipes.

# Landing zone (infrastructure/): native terraform, OIDC -> AqpTerraformExecutionRole
terraform -chdir=infrastructure/envs/dev init
terraform -chdir=infrastructure/envs/dev plan

# App tier (terraform/): alphaswarm deploy, writes a terraform_runs row
alphaswarm deploy plan --env dev
alphaswarm deploy up --env dev

CodeArtifact for alphaswarm-core and the CLI

alphaswarm-core and the alphaswarm CLI are not installed from public PyPI in CI or in the Docker images. They are pulled from the platform's AWS CodeArtifact internal PyPI repository, alphaswarm-pypi. CI (and every Dockerfile build step that needs the CLI) authenticates to CodeArtifact over the same OIDC-derived credentials and configures it as the pip index:

aws codeartifact login --tool pip \
  --domain alphaswarm --repository alphaswarm-pypi
pip install alphaswarm-core "alphaswarm[deploy]"

This keeps the internal packages private and gives CI a stable, in-account index that does not depend on public PyPI availability.
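
Where rewriting the pip configuration is inconvenient (for example inside a Docker build step), a possible alternative is to build the index URL explicitly and pass it through PIP_INDEX_URL. The sketch below uses standard AWS CLI calls; the function name codeartifact_index_url is hypothetical, and whether the project's Dockerfiles work this way is not stated on this page.

```shell
# Sketch: derive a pip index URL for CodeArtifact without touching pip.conf.
# Domain and repository defaults come from this page; the helper is hypothetical.
codeartifact_index_url() {
  domain="${1:-alphaswarm}"
  repo="${2:-alphaswarm-pypi}"

  # Short-lived token, issued to the OIDC-derived credentials already in use.
  token="$(aws codeartifact get-authorization-token \
    --domain "$domain" \
    --query authorizationToken --output text)" || return 1

  # The returned endpoint ends in /pypi/<repo>/ (with a trailing slash).
  endpoint="$(aws codeartifact get-repository-endpoint \
    --domain "$domain" --repository "$repo" --format pypi \
    --query repositoryEndpoint --output text)" || return 1

  # pip expects the "simple" index path, with the token as the password.
  printf 'https://aws:%s@%ssimple/\n' "$token" "${endpoint#https://}"
}
```

A build step could then run PIP_INDEX_URL="$(codeartifact_index_url)" pip install alphaswarm-core "alphaswarm[deploy]". Because the token is embedded in the URL, it should be kept out of logs and image layers.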

The three canonical workflows

These names match compliance/soc2-evidence-map.md, how-to/operations/aws-deploy.md, how-to/operations/aws-runbook.md, and ADR 006 — alphaswarm_admin overhaul.

terraform-pipeline.yml

The deploy workflow for both Terraform trees.

  • Inputs: tree (e.g. alphaswarm_platform), env (e.g. prod), action (e.g. apply).
  • push to main: runs a plan against dev automatically.
  • Dispatch (apply): assumes the env's apply role and applies the selected tree. For tree=infrastructure it runs native terraform apply; for tree=alphaswarm_platform it delegates to CodeBuild, which runs alphaswarm deploy up (and lands the terraform_runs row).
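
The apply branch described above could be sketched as a small dispatcher. The function name apply_tree, the CodeBuild project name, and the DEPLOY_ENV variable passed to the build are assumptions; the page only states which tool applies which tree.

```shell
# Hypothetical dispatcher for the apply action, keyed on the tree input.
# APP_DEPLOY_PROJECT and DEPLOY_ENV are assumed names, not from this page.
apply_tree() {
  tree="$1"
  env="$2"
  case "$tree" in
    infrastructure)
      # Landing zone: native terraform, run directly in the workflow job.
      terraform -chdir="infrastructure/envs/$env" apply -auto-approve
      ;;
    alphaswarm_platform)
      # App tier: never a bare terraform apply; hand off to CodeBuild,
      # which is expected to run alphaswarm deploy up for the given env.
      aws codebuild start-build \
        --project-name "${APP_DEPLOY_PROJECT:-alphaswarm-app-deploy}" \
        --environment-variables-override \
          "name=DEPLOY_ENV,value=$env,type=PLAINTEXT"
      ;;
    *)
      echo "unknown tree: $tree" >&2
      return 2
      ;;
  esac
}
```

Rejecting unknown tree values up front keeps a mistyped dispatch input from silently doing nothing.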

build-publish.yml

The image release workflow. Triggers on a v* tag and, for each service, performs a supply-chain-hardened build:

  • multi-arch buildx build, pushed to ECR;
  • Cosign keyless signature (OIDC, no long-lived keys);
  • syft SBOM generation;
  • SLSA provenance attestation;
  • Trivy and Grype vulnerability scans.

The per-service build/sign/push logic is factored into the composite action .github/actions/build-sign-push/, so every service builds identically.
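
For orientation, the steps for one service might look roughly like the sketch below. It is not the contents of .github/actions/build-sign-push/; the platform list, file names, and severity thresholds are assumptions, and the SLSA provenance step is omitted because this page does not say which generator produces it.

```shell
# Hedged sketch of one service's release steps.
# Set DRY_RUN=1 to print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

release_image() {
  image="$1"         # full ECR image reference, including the tag
  context="${2:-.}"  # Docker build context

  # Multi-arch build pushed straight to the registry.
  run docker buildx build --platform linux/amd64,linux/arm64 \
    --tag "$image" --push "$context" || return 1
  # Keyless signature: cosign uses the ambient OIDC identity, no stored key.
  run cosign sign --yes "$image" || return 1
  # SBOM, then attach it to the image as a signed attestation.
  run syft "$image" -o spdx-json=sbom.spdx.json || return 1
  run cosign attest --yes --type spdxjson \
    --predicate sbom.spdx.json "$image" || return 1
  # Vulnerability scans; thresholds here are illustrative.
  run trivy image --exit-code 1 --severity HIGH,CRITICAL "$image" || return 1
  run grype "$image" --fail-on high || return 1
}
```

In practice, signing and attesting by digest rather than by tag is the safer choice, since a tag can be moved after the fact.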

pr-validate.yml

The pull-request gate. On every PR it runs terraform fmt -check, terraform validate, tfsec, and conftest (OPA) policy checks, then a terraform plan using the plan role (read-only). It never holds mutate permission, so a PR can be validated safely from a fork or feature branch.
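
The checks listed above could be chained for a single Terraform directory along these lines. The function name validate_dir, the policy/ directory, and running conftest directly against the directory are assumptions; the workflow itself may wire these steps differently.

```shell
# Sketch of the PR gate for one Terraform directory, in the order listed above.
# Stops at the first failing check; the policy/ path is an assumption.
validate_dir() {
  dir="$1"
  terraform -chdir="$dir" fmt -check -recursive || return 1
  terraform -chdir="$dir" init -input=false     || return 1
  terraform -chdir="$dir" validate              || return 1
  tfsec "$dir"                                  || return 1
  conftest test --policy policy/ "$dir"         || return 1
  # Read-only plan, run with the plan role's credentials.
  terraform -chdir="$dir" plan -input=false     || return 1
}
```

Calling validate_dir infrastructure/envs/dev returns non-zero as soon as any check fails, so the plan only runs on code that already passed the static checks.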

Promotion — dev to staging to prod

Promotion is enforced by GitHub Environments with required reviewers, layered on top of the OIDC apply-role trust (the apply role is only assumable inside the matching Environment):

| Environment | Approval | Trigger |
| --- | --- | --- |
| dev | Auto (no reviewers) | push to main plans dev; apply on dispatch |
| staging | 1 reviewer | Dispatch terraform-pipeline.yml with env=staging |
| prod | 2 reviewers (4-eyes) | Dispatch terraform-pipeline.yml with env=prod |

Because the gate lives in the GitHub Environment, a prod apply job does not start, and so cannot mint the apply-role credential, until two distinct reviewers approve the run.

alphaswarm_admin — two images, then a dispatch handoff

alphaswarm_admin is built and deployed slightly differently from the platform itself.

  1. A push to the admin repo's main (or a v* tag) builds two images and pushes them to ECR:
    • alphaswarm-admin (the FastAPI backend)
    • alphaswarm-admin-frontend (the Next.js frontend)
  2. After both images land, the admin workflow fires a cross-repo repository_dispatch event named admin-image-published at alphaswarm_platform.
  3. That dispatch triggers the platform's app-tier redeploy, which rolls the admin service onto ECS Fargate (Cognito + ALB) via the platform's terraform/environments/{dev,staging,prod} app tier (generalized from the existing minimum env).
  4. The app tier reads its infra handles from SSM under /alphaswarm/<env>/*, published by infrastructure/envs/admin-{dev,staging,prod}.
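
To inspect the handles from step 4 by hand, something like the following would list everything published for one environment. The helper name list_infra_handles is hypothetical, and the individual parameter names under the prefix are not documented on this page.

```shell
# Sketch: list the SSM parameters published under /alphaswarm/<env>/.
# Prints one "name<TAB>value" pair per line.
list_infra_handles() {
  env="$1"   # dev, staging, or prod
  aws ssm get-parameters-by-path \
    --path "/alphaswarm/$env/" --recursive \
    --query 'Parameters[].[Name,Value]' --output text
}
```

For example, list_infra_handles dev shows what the dev app tier would resolve, assuming the caller's role can read that path.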

The cross-repo dispatch requires a token (PLATFORM_DISPATCH_TOKEN) configured as a secret in the admin repo — see the runbook for setup. For what the admin service itself is, see alphaswarm-admin.
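
The handoff itself is a single call to GitHub's repository dispatch endpoint. In the sketch below the event name and target repository come from this page, while the client_payload field image_tag and both function names are assumptions.

```shell
# Sketch of the cross-repo handoff from the admin repo to the platform repo.
# The image_tag payload field is an assumption for illustration.
dispatch_payload() {
  # $1 = tag of the images that were just pushed
  printf '{"event_type":"admin-image-published","client_payload":{"image_tag":"%s"}}' "$1"
}

notify_platform() {
  # PLATFORM_DISPATCH_TOKEN must be exposed to the step from the repo secret.
  curl --fail --silent --show-error -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $PLATFORM_DISPATCH_TOKEN" \
    https://api.github.com/repos/Alpha-Swarm-ai/alphaswarm_platform/dispatches \
    -d "$(dispatch_payload "$1")"
}
```

On the platform side, a workflow listening for repository_dispatch with type admin-image-published would pick this up and start the app-tier redeploy.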

See also