
ADR 005 — Separated alphaswarm_controller/ micro-project

  • Status: Accepted (2026-05-18)
  • Authors: Platform team
  • Supersedes: Embeds in alphaswarm/api/routes/control_plane.py
  • Related: ADR 002, ADR 003, ADR 004

Context

The in-flight alphaswarm/api/routes/control_plane.py exposes deploy / destroy / restart / logs endpoints to the Vite Control Plane UI. It already covers the "local k3d" and "rpi_kubernetes" targets and delegates mutating operations to TerraformRuntime via Celery tasks.

The refactor wants the control plane to:

  1. Speak five backends (docker_compose, kubernetes, AWS, Azure, GCP) — not just two Terraform stacks.
  2. Be deployable on its own (/deployments/compose/docker-compose.admin.yml, isolated alphaswarm-admin Docker network) so an operator can run "just the control plane" against a remote cluster.
  3. Be releasable independently from the AlphaSwarm monolith (different cadence, different SLOs).
  4. Have a security boundary that holds in both directions: a compromise of alphaswarm must not reach the control plane, and vice versa.

The strict-isolation reading of the prompt's hard constraint ("Never import alphaswarm.* modules inside alphaswarm_controller/") plus the existing alphaswarm/ codebase yields three integration patterns:

  1. Strict separation — duplicate every model, validator, and adapter into alphaswarm_controller/. 2x code, fully independent release.
  2. Shared lower-level library — extract reusable bits (Pydantic topology models, JWT validator, K8s adapter ABCs, credential protocol) into a NEW alphaswarm_core/ package both alphaswarm/ and alphaswarm_controller/ depend on. No alphaswarm.* imports in CP, but shared lower-level code stays DRY.
  3. Evolve in place — keep control plane in alphaswarm/; just add the alphaswarm_client container + Auth0 RBAC.

Decision

Adopt pattern 2 — the shared-library approach.

  1. New top-level package alphaswarm_core/ is created with its own pyproject.toml (installable as alphaswarm-core).
  2. Move (with back-compat re-exports from alphaswarm/) the following into alphaswarm_core/:
    • topology/: Pydantic models from alphaswarm/deployment/topology.py (data classes only; loaders stay in alphaswarm/).
    • auth/: Auth0 JWT validator from alphaswarm/auth/providers/auth0.py, the claim validation from alphaswarm/api/security.py, and the new resource_filter.py (ADR 003).
    • kubernetes/: KubernetesAdapter ABC from alphaswarm/kubernetes/protocol.py. Concrete adapters (InClusterAdapter, LocalComposeAdapter, RpiClusterAdapter) stay in alphaswarm/.
    • credentials/: SecretStore protocol and CredentialResolver interface. Concrete stores stay in alphaswarm/.
    • connectivity/: NEW ConnectivityConfig Pydantic settings model with the ALPHASWARM_*_URL matrix.
    • models/: DeploymentSpec, DeploymentStatus, MetricPoint, NodeHealth (referenced by both alphaswarm.api.routes.control_plane and the new alphaswarm_controller.api.routers).
  3. The alphaswarm_controller/ micro-project (own pyproject.toml) depends ONLY on alphaswarm-core. It never imports alphaswarm.*.
  4. alphaswarm/ keeps the runtimes, ledger writers, registry implementations, and concrete adapters. It also depends on alphaswarm-core (just like alphaswarm_controller/).
  5. Back-compat shims in alphaswarm/deployment/, alphaswarm/auth/, alphaswarm/kubernetes/, alphaswarm/credentials/ re-export from alphaswarm_core so no existing import paths break and no other AlphaSwarm module needs to change in this PR.
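The interface/implementation split in item 2 can be illustrated with a small sketch. It assumes a metaclass registry along the lines this ADR describes for alphaswarm_core/kubernetes/; AdapterRegistryMeta, adapter_name, list_workloads, and get_adapter are hypothetical names chosen for illustration, and the real KubernetesAdapter methods may differ.

```python
from abc import ABCMeta, abstractmethod


class AdapterRegistryMeta(ABCMeta):
    """Hypothetical registry metaclass (would live in alphaswarm_core/kubernetes/)."""

    registry = {}  # adapter_name -> concrete adapter class

    def __init__(cls, clsname, bases, namespace):
        super().__init__(clsname, bases, namespace)
        # Register only concrete classes, i.e. those with no abstract methods left.
        if not cls.__abstractmethods__:
            AdapterRegistryMeta.registry[cls.adapter_name] = cls


class KubernetesAdapter(metaclass=AdapterRegistryMeta):
    """Interface only: the part that moves to alphaswarm_core/."""

    @abstractmethod
    def list_workloads(self, namespace):
        """Return workload identifiers for a namespace (assumed method)."""


class InClusterAdapter(KubernetesAdapter):
    """Concrete adapter: the part that stays in alphaswarm/."""

    adapter_name = "in_cluster"

    def list_workloads(self, namespace):
        return [f"{namespace}/demo"]  # placeholder data, not a real cluster call


def get_adapter(name):
    """Look up a registered adapter class by name and instantiate it."""
    return AdapterRegistryMeta.registry[name]()
```

Under these assumptions, a caller that only knows alphaswarm_core can resolve an adapter by name with get_adapter("in_cluster") without importing the module that defines it, provided that module was imported somewhere at startup.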

Strict isolation is enforced by a CI lint:

# .github/workflows/ci.yml step
# Fail on any import of alphaswarm or alphaswarm.*; alphaswarm_core does not match.
if rg --type py '^\s*(from|import)\s+alphaswarm(\.|\s|$)' alphaswarm_controller/; then
  echo "FAIL: alphaswarm_controller imports forbidden alphaswarm.* module"
  exit 1
fi
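A line-based regex cannot see every import form (for example, `import os, alphaswarm`). As a possible complement, the same rule could be checked on the parsed syntax tree. The sketch below is an assumption, not part of this PR; forbidden_imports and FORBIDDEN_ROOT are hypothetical names.

```python
import ast

FORBIDDEN_ROOT = "alphaswarm"  # alphaswarm_core has a different root, so it is allowed


def forbidden_imports(source):
    """Return the forbidden module names imported anywhere in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b.c as d` yields one alias per module.
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute `from x.y import z`; relative imports (level > 0) are skipped.
            names = [node.module]
        else:
            continue
        hits += [name for name in names if name.split(".")[0] == FORBIDDEN_ROOT]
    return hits
```

A CI step could run this over every .py file under alphaswarm_controller/ and fail when any file returns a non-empty list.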

Consequences

Positive

  • alphaswarm_controller ships as a standalone OCI image with no AlphaSwarm runtime dependency. Operators running multiple AlphaSwarm tenants can share one control plane.
  • The shared lib is small (~2 kloc) and changes infrequently. When it does change, both alphaswarm/ and alphaswarm_controller/ re-pin and re-test — explicit coupling.
  • The existing alphaswarm/api/routes/control_plane.py becomes a thin proxy: it calls the external alphaswarm_controller when the env var ALPHASWARM_CP_REMOTE=1 is set, and talks in-process to the same modules when the variable is unset. This preserves backward compatibility for local development.
  • AGENTS hard rules 27 (IdentityProvider) and 28 (KubernetesAdapter) still apply: the metaclass registries live in alphaswarm_core/auth/ and alphaswarm_core/kubernetes/, with concrete implementations registered from alphaswarm/ and alphaswarm_controller/ alike.
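The ALPHASWARM_CP_REMOTE switch described above might be wired roughly as follows. This is a minimal sketch under stated assumptions: remote_restart, local_restart, and pick_handler are hypothetical names, and both handlers return placeholder data instead of making a real HTTP or in-process call.

```python
import os


def remote_restart(deployment_id):
    # Placeholder for an HTTP call to the external alphaswarm_controller.
    return {"via": "remote", "deployment": deployment_id}


def local_restart(deployment_id):
    # Placeholder for the in-process path used in local development.
    return {"via": "in_process", "deployment": deployment_id}


def pick_handler(env=None):
    """Choose the restart handler from the environment (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Only the exact value "1" enables the remote path; unset or anything else stays local.
    if env.get("ALPHASWARM_CP_REMOTE") == "1":
        return remote_restart
    return local_restart
```

Passing the environment as a parameter keeps the switch easy to test without mutating os.environ.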

Negative

  • Adds one more package to publish and version. Mitigated by treating alphaswarm-core as an internal dependency pinned to a git SHA from a monorepo — no PyPI release needed.
  • Cross-package refactors now need to touch two pyproject.toml files. Acceptable cost; the boundary is intentional.
  • The "embed vs separate" decision is now load-bearing for security: because both planes share alphaswarm_core/auth/, a vulnerability there lands in both. Reviewed in ce-security-sentinel agent runs (see .cursor/agents/).

Alternatives considered

  • Strict separation (pattern 1) — rejected. Duplicate code rots out of sync; security fixes have to land twice; impossible to keep JWT validator semantics identical between the two planes.
  • Evolve in place (pattern 3) — rejected. The biggest gap the prompt closes is deployment independence and the 5-backend abstraction. Both demand a separate process; in-place is just a renamed router.
  • gRPC contract between the two — rejected for now. The two planes share Pydantic models and HTTP/JSON is already understood. gRPC adds proto-gen tooling burden without buying anything until we hit hundreds of req/s of internal calls.

Decision tree: which side does new code go on?

When adding a new feature, ask:

  1. Is this a workload runtime operation (start, stop, scale, exec, logs, telemetry)? → alphaswarm_controller/
  2. Is this an IaC provisioning operation (create cluster, register Auth0 tenant, apply RBAC)? → alphaswarm/terraform/
  3. Is this AlphaSwarm business logic (agents, RL, bots, analysis, backtests)? → alphaswarm/
  4. Is this a shared model, validator, or ABC that BOTH need? → alphaswarm_core/

If unsure, prefer alphaswarm/ and revisit the boundary once the requirement is clearer.
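The decision tree above can also be written down as a lookup with a default, which may be handy in review tooling or contributor docs. FeatureKind and home_for are hypothetical names used only for this sketch; the target paths are the ones listed in the questions above.

```python
from enum import Enum


class FeatureKind(Enum):
    WORKLOAD_RUNTIME = "workload_runtime"      # start, stop, scale, exec, logs, telemetry
    IAC_PROVISIONING = "iac_provisioning"      # create cluster, register Auth0 tenant, apply RBAC
    BUSINESS_LOGIC = "business_logic"          # agents, RL, bots, analysis, backtests
    SHARED_ABSTRACTION = "shared_abstraction"  # model, validator, or ABC both sides need


_HOME = {
    FeatureKind.WORKLOAD_RUNTIME: "alphaswarm_controller/",
    FeatureKind.IAC_PROVISIONING: "alphaswarm/terraform/",
    FeatureKind.BUSINESS_LOGIC: "alphaswarm/",
    FeatureKind.SHARED_ABSTRACTION: "alphaswarm_core/",
}


def home_for(kind=None):
    """Return the package for a feature kind; unknown or unsure falls back to alphaswarm/."""
    return _HOME.get(kind, "alphaswarm/")
```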

Implementation references

  • Shared lib: alphaswarm_core/ (this PR)
  • Micro-project: alphaswarm_controller/ (this PR)
  • Strict-isolation lint: .github/workflows/ci.yml (Phase 8)
  • Existing in-AlphaSwarm control plane: alphaswarm/api/routes/control_plane.py
  • Existing topology: alphaswarm/deployment/topology.py
  • AGENTS rules 27, 28, 42, 45 — boundary owners