Adding a new InfrastructureProvider to alphaswarm_controller
Step-by-step guide for shipping a new InfrastructureProvider implementation (AGENTS rule 45 / ADR 004).
When to add a new provider
Add a provider when you need to manage workloads on a backend the existing five (docker_compose, kubernetes, aws, azure, gcp) don't cover. Examples: Nomad, Fly.io, Render, on-prem VMs via Salt/Ansible.
Checklist
1. Sketch the credential chain
What env vars does the backend's SDK read? How does it discover credentials in CI vs on a developer laptop vs in production?
Document this in your provider's _check_credentials() so the health probe can fail loudly when credentials are missing.
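A credential probe along these lines could back the health check. This is a minimal sketch, not the project's implementation: `check_credentials` and `CredentialReport` are hypothetical names, and the Nomad variables `NOMAD_ADDR` and `NOMAD_TOKEN` stand in for whatever your backend's SDK actually reads.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialReport:
    """Hypothetical result type; the real probe may return something else."""

    ok: bool
    missing: tuple[str, ...] = ()


# Assumption: the env vars the backend SDK reads (Nomad used as the example).
REQUIRED_ENV_VARS = ("NOMAD_ADDR", "NOMAD_TOKEN")


def check_credentials(env: Mapping[str, str] | None = None) -> CredentialReport:
    """Report which required variables are unset or empty."""
    env = os.environ if env is None else env
    missing = tuple(name for name in REQUIRED_ENV_VARS if not env.get(name))
    return CredentialReport(ok=not missing, missing=missing)
```

Passing the environment in as a mapping keeps the probe testable without patching `os.environ`, and the `missing` tuple gives `health()` something concrete to put in its error message.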
2. Create the provider module
# alphaswarm_controller/src/alphaswarm_controller/providers/<name>.py
from alphaswarm_core.providers.protocol import (
    # Assumed: the types used in the signatures below are exported by the
    # same module as the protocol; adjust the import if they live elsewhere.
    ConfigMapPatch,
    DeploymentSpec,
    DeploymentStatus,
    InfrastructureProvider,
    InfrastructureProviderError,
    InfrastructureProviderUnavailable,
    ProviderHealth,
    ProviderKind,
    ServiceConfig,
)
from alphaswarm_core.providers.registry import register_provider_class


@register_provider_class("<alias>", replace=True)
class MyProvider(InfrastructureProvider):
    provider_kind = ProviderKind.<NEW_KIND>
    provider_alias = "<alias>"

    # Required operations.
    async def health(self) -> ProviderHealth: ...
    async def start(self, spec: DeploymentSpec) -> DeploymentStatus: ...
    async def stop(self, service_id: str, *, namespace=None) -> DeploymentStatus: ...
    async def scale(self, service_id, replicas, *, namespace=None) -> DeploymentStatus: ...
    async def status(self, service_id: str, *, namespace=None) -> DeploymentStatus: ...
    async def list_deployments(self, *, namespace=None) -> list[DeploymentStatus]: ...

    # Optional: override if your backend supports it.
    async def get_config(self, service_id: str, *, namespace=None) -> ServiceConfig: ...
    async def apply_config(self, patch: ConfigMapPatch) -> bool: ...
    async def stream_metrics(self, service_id, *, namespace=None, interval_seconds=10.0): ...
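For an operation the backend cannot support, raising `InfrastructureProviderUnavailable` with a structured message might look like the sketch below. The exception class here is a local stand-in so the example runs on its own; its constructor arguments (`provider`, `operation`, `reason`) are assumptions, not the signature exported by `alphaswarm_core`, and `NomadProvider` is hypothetical.

```python
import asyncio


class InfrastructureProviderUnavailable(RuntimeError):
    """Stand-in for the alphaswarm_core exception (assumed shape)."""

    def __init__(self, provider: str, operation: str, reason: str) -> None:
        super().__init__(f"{provider}.{operation} unavailable: {reason}")
        self.provider = provider
        self.operation = operation
        self.reason = reason


class NomadProvider:
    """Hypothetical provider; only the unsupported operation is shown."""

    provider_alias = "nomad"

    async def get_config(self, service_id: str, *, namespace=None):
        # The backend has no per-service config store, so fail explicitly.
        raise InfrastructureProviderUnavailable(
            provider=self.provider_alias,
            operation="get_config",
            reason="backend has no per-service config store",
        )


def describe_get_config(provider: NomadProvider) -> str:
    """Call get_config and return either 'ok' or the error text."""
    try:
        asyncio.run(provider.get_config("web"))
    except InfrastructureProviderUnavailable as exc:
        return str(exc)
    return "ok"
```

Keeping the provider, operation, and reason as separate attributes lets callers log or branch on them without parsing the message string.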
3. Add a new ProviderKind
# alphaswarm_core/src/alphaswarm_core/providers/protocol.py
class ProviderKind(str, Enum):
    DOCKER_COMPOSE = "docker_compose"
    KUBERNETES = "kubernetes"
    AWS = "aws"
    AZURE = "azure"
    GCP = "gcp"
    NOMAD = "nomad"  # <-- your new kind
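Because `ProviderKind` mixes in `str`, a raw string from configuration can be turned into a member by value lookup, and members compare equal to their string value. The snippet below uses a trimmed copy of the enum so it runs standalone; `parse_kind` is a hypothetical helper, not part of `alphaswarm_core`.

```python
from enum import Enum


class ProviderKind(str, Enum):
    # Trimmed copy for illustration; the real enum lists every kind.
    DOCKER_COMPOSE = "docker_compose"
    NOMAD = "nomad"  # the newly added kind


def parse_kind(raw: str) -> ProviderKind:
    """Map a config string to a ProviderKind, with a readable error."""
    try:
        return ProviderKind(raw)
    except ValueError:
        valid = ", ".join(kind.value for kind in ProviderKind)
        raise ValueError(f"unknown provider kind {raw!r}; expected one of: {valid}") from None
```

If the new member is missing from the enum, the lookup fails with `ValueError`, which is a quick way to notice that this step was skipped.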
4. Register in the bootstrap helper
# alphaswarm_controller/src/alphaswarm_controller/providers/__init__.py
for module_name in (
    "alphaswarm_controller.providers.docker_compose",
    "alphaswarm_controller.providers.kubernetes",
    "alphaswarm_controller.providers.aws",
    "alphaswarm_controller.providers.azure",
    "alphaswarm_controller.providers.gcp",
    "alphaswarm_controller.providers.<name>",  # <-- add yours
):
    ...
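The loop body is elided above, so the following is only one plausible shape for it, not the controller's actual code. It assumes that importing a provider module is what triggers its `@register_provider_class` decorator, and that a module whose optional SDK is not installed should be skipped instead of aborting startup. `import_provider_modules` is a hypothetical name.

```python
import importlib


def import_provider_modules(module_names):
    """Import each module so its registration decorator runs.

    Returns (loaded, skipped), where skipped maps a module name to the
    text of the ImportError that prevented it from loading.
    """
    loaded, skipped = [], {}
    for module_name in module_names:
        try:
            importlib.import_module(module_name)
        except ImportError as exc:  # e.g. the optional SDK extra is missing
            skipped[module_name] = str(exc)
        else:
            loaded.append(module_name)
    return loaded, skipped
```

Whether the real helper skips or fails hard on a missing SDK is a design decision this runbook does not state; check the existing loop before relying on either behaviour.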
5. Optional deps go in pyproject.toml extras
[project.optional-dependencies]
<name> = ["sdk-package>=X,<Y"]
all-providers = [
    "alphaswarm-controller[docker_compose,kubernetes,aws,azure,gcp,<name>]",
]
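Since the SDK lives behind an extra, the provider may be loaded in environments where it is not installed. A small guard like the one below can turn that into an actionable message. It is a sketch under assumptions: `missing_extra_hint` is a hypothetical helper, and the package and extra names in the usage are placeholders.

```python
from __future__ import annotations

import importlib.util


def missing_extra_hint(package: str, extra: str) -> str | None:
    """Return an install hint if a top-level `package` is not importable.

    find_spec locates the package without importing it, so the check is
    cheap and has no side effects.
    """
    if importlib.util.find_spec(package) is not None:
        return None
    return (
        f"SDK package {package!r} is not installed; "
        f"install it with: pip install 'alphaswarm-controller[{extra}]'"
    )
```

A `health()` implementation could call this first and report the hint, so a missing extra is distinguishable from missing credentials.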
6. Write contract tests
Two test files:
# alphaswarm_controller/tests/providers/test_<name>.py — unit tests over the
# translation helpers (e.g. spec_to_<backend>, response_to_status)
# alphaswarm_controller/tests/providers/test_<name>_integration.py — full
# contract test against a mocked SDK (moto for AWS, MagicMock for others)
Reuse the assertion patterns in test_docker_compose.py and test_kubernetes.py.
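As an illustration of both test styles, the sketch below checks a translation helper directly and then exercises a provider method against a `MagicMock` SDK client. Everything backend-specific is an assumption made for the example: the `DeploymentStatus` fields, the `Count` and `Status` response keys, the `get_job` client method, and the `FakeBackendProvider` class.

```python
import asyncio
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class DeploymentStatus:
    """Stand-in for the real type; the fields are assumptions."""

    service_id: str
    replicas: int
    state: str


def response_to_status(service_id: str, response: dict) -> DeploymentStatus:
    """Translation helper: backend response -> DeploymentStatus."""
    return DeploymentStatus(
        service_id=service_id,
        replicas=int(response.get("Count", 0)),
        state=str(response.get("Status", "unknown")).lower(),
    )


class FakeBackendProvider:
    """Hypothetical provider wrapping an injected SDK client."""

    def __init__(self, client) -> None:
        self._client = client

    async def status(self, service_id: str, *, namespace=None) -> DeploymentStatus:
        response = self._client.get_job(service_id, namespace=namespace)
        return response_to_status(service_id, response)


def test_response_to_status_defaults() -> None:
    # Unit test: missing keys fall back to safe defaults.
    assert response_to_status("web", {}) == DeploymentStatus("web", 0, "unknown")


def test_status_translates_sdk_response() -> None:
    # Contract-style test: the SDK client is mocked, the provider is real.
    client = MagicMock()
    client.get_job.return_value = {"Count": 3, "Status": "Running"}
    provider = FakeBackendProvider(client)

    result = asyncio.run(provider.status("web", namespace="prod"))

    client.get_job.assert_called_once_with("web", namespace="prod")
    assert result == DeploymentStatus("web", 3, "running")
```

Injecting the client through the constructor is what makes the mock easy to swap in; if your provider builds its client internally, patch the constructor or factory instead.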
7. Update the bootstrap registry test
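# tests/providers/test_registry.py
# Assumed import path: the bootstrap helper edited in step 4.
from alphaswarm_controller.providers import bootstrap


def test_bootstrap_registers_all() -> None:
    registry = bootstrap()
    for expected in ("docker_compose", "kubernetes", "aws", "azure", "gcp", "<name>"):
        assert expected in registry.aliases()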
8. Update the README + this runbook
Add your provider to the table in alphaswarm_controller/README.md and the per-cloud sections below.
Per-cloud setup notes
AWS
- Active provider: ALPHASWARM_CP_PROVIDER=aws
- Credentials: standard boto3 chain (env vars / ~/.aws/credentials / EC2 / EKS pod identity / WebIdentity)
- IAM minimum: ecs:DescribeServices, ecs:UpdateService, ssm:GetParameter*, ssm:PutParameter*, plus EKS read perms when using the K8s sub-path
Azure
- Active provider: ALPHASWARM_CP_PROVIDER=azure
- Credentials: azure-identity chain (env vars / Managed Identity / federated identity / Azure CLI)
- IAM minimum: Contributor on the AKS / Container Instances resource group
GCP
- Active provider: ALPHASWARM_CP_PROVIDER=gcp
- Credentials: GOOGLE_APPLICATION_CREDENTIALS env var pointing to a service account JSON, OR Workload Identity (preferred in production)
- IAM minimum: run.developer, container.developer, secretmanager.admin (per project)
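Each section above selects the backend through `ALPHASWARM_CP_PROVIDER`. A validation step for that variable could be sketched as follows. `active_provider_alias` is a hypothetical helper, and its behaviour (no default, case-insensitive match, whitespace trimmed) is an assumption; the controller may resolve the variable differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# The five aliases this runbook lists; add yours when you ship a provider.
KNOWN_ALIASES = ("docker_compose", "kubernetes", "aws", "azure", "gcp")


def active_provider_alias(env: Mapping[str, str] | None = None) -> str:
    """Read and validate ALPHASWARM_CP_PROVIDER from the environment."""
    env = os.environ if env is None else env
    raw = env.get("ALPHASWARM_CP_PROVIDER", "").strip().lower()
    if not raw:
        raise ValueError("ALPHASWARM_CP_PROVIDER is not set")
    if raw not in KNOWN_ALIASES:
        expected = ", ".join(KNOWN_ALIASES)
        raise ValueError(f"unknown provider {raw!r}; expected one of: {expected}")
    return raw
```

Failing at startup on an unknown alias surfaces a typo in deployment config before any provider call is attempted.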
Definition of done
- Provider class registered + provider_kind matches alias
- All seven abstract methods implemented (or raise InfrastructureProviderUnavailable with a structured message)
- Credential probe in health() returns a useful error when creds are missing
- Unit tests + contract tests passing
- tests/providers/test_registry.py updated
- README + this runbook updated
- CI matrix builds + tests with the new optional dep group