Worker vs executor images
The AlphaSwarm Celery surface is split into two purpose-built, migration-ready container images (Phase 4c):
- alphaswarm-worker: slim orchestration worker. Task dispatch, lineage, paper-trading loop, terraform/ingestion/workflow coordination.
- alphaswarm-executor: heavy-compute executor. Backtests, RL / ML training, factor builds, agent-emitted strategy code, RAG ingest.
Why split
Historically worker and beat had no image of their own — the
alphaswarm_images catalogue pinned worker = { target = "api" }, so the
orchestration worker dragged the entire API stage (Dash, visualization,
dev tooling) plus the full ML/RL surface into one fat image. Two
problems followed:
- Bloat & blast radius. A lineage callback worker carried PyTorch, Ray, vectorbt-pro, forecasting libs — slow to pull, large attack surface, slow cold-start.
- Scaling mismatch. Light coordination tasks (sub-second, IO-bound) and heavy compute tasks (minutes–hours, CPU/GPU/RAM-bound) have opposite scaling and resource profiles, but shared one Deployment.
Splitting lets each image carry only what its queues need, and lets each scale and be resourced independently.
Queue ↔ image matrix
The queue assignment is identical across the root Dockerfile, the
standalone per-service Dockerfiles, the K8s manifests, both compose
files, and the faas KEDA module (local.heavy_queues). A queue is
never drained by both images.
| Queue | Image | Why |
|---|---|---|
| default | worker | bookkeeping, lineage, callbacks |
| paper | worker | sub-second paper-trading loop (latency-sensitive) |
| terraform | worker | TerraformRuntime apply/destroy wrappers |
| ingestion | worker | connector pulls (IO-bound, long-lived) |
| workflows | worker | WorkflowRuntime orchestration |
| backtest | executor | vbt-pro / event-driven / Lean engine runs |
| training | executor | RL rollouts + finetune jobs (GPU) |
| ml | executor | ML pipelines, predictor refresh |
| agents | executor | CrewAI / LangGraph agent runs |
| factors | executor | factor-zoo builds, alpha tests |
| rag | executor | RAG ingest, embedding refresh |
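The matrix above can be expressed as data and checked mechanically. The following is a minimal sketch, not the repository's actual tooling: `QUEUE_IMAGE`, `queues_for`, `celery_queue_arg`, and `check_surface` are hypothetical names, and the assumption is that each surface (Dockerfile, manifest, compose file) can be reduced to a mapping of image name to declared queues.

```python
# Queue ownership exactly as listed in the matrix (insertion order = table order).
QUEUE_IMAGE = {
    "default": "worker",
    "paper": "worker",
    "terraform": "worker",
    "ingestion": "worker",
    "workflows": "worker",
    "backtest": "executor",
    "training": "executor",
    "ml": "executor",
    "agents": "executor",
    "factors": "executor",
    "rag": "executor",
}


def queues_for(image: str) -> list[str]:
    """Return the queues drained by `image`, in table order."""
    return [q for q, owner in QUEUE_IMAGE.items() if owner == image]


def celery_queue_arg(image: str) -> str:
    """Comma-separated queue list, the format Celery's -Q/--queues option takes."""
    return ",".join(queues_for(image))


def check_surface(name: str, declared: dict[str, set[str]]) -> list[str]:
    """Compare one surface's declared queues against the matrix.

    `declared` maps image name -> queues that surface assigns to it.
    Returns human-readable problems; an empty list means the surface agrees.
    """
    problems: list[str] = []
    seen: dict[str, str] = {}  # queue -> first image that claimed it
    for image, queues in declared.items():
        for queue in sorted(queues):
            if queue in seen:
                problems.append(
                    f"{name}: {queue!r} is drained by both {seen[queue]} and {image}"
                )
                continue
            seen[queue] = image
            expected = QUEUE_IMAGE.get(queue)
            if expected is None:
                problems.append(f"{name}: {queue!r} is not in the matrix")
            elif expected != image:
                problems.append(f"{name}: {queue!r} belongs to {expected}, not {image}")
    for queue in QUEUE_IMAGE:
        if queue not in seen:
            problems.append(f"{name}: {queue!r} is not drained by any image")
    return problems
```

A check like this could run in CI once per surface. A surface that lists `backtest` under both images would be reported twice: once for the wrong owner and once for the double drain.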
Dependency surface
Both images share the multi-arch (linux/amd64+arm64) Chainguard Wolfi
base, nonroot UID 65532, and the CredentialResolver-only secret rule
(nothing baked into the image). They differ in installed extras, native build dependencies, directories, and default sizing:
| | worker | executor |
|---|---|---|
| Base extras | otel, cli, iceberg, entity-graph, dagster-alphaswarm | same |
| Distributed compute | compute-dask, compute-ray | compute-dask, compute-ray |
| ML / RL / forecasting | — | ml, ml-torch, ml-forecast, ml-anomaly |
| Portfolio | — | portfolio |
| Native build deps | — | gfortran, linux-headers (numpy/scipy/forecast wheels) |
| Extra dirs | /app/data | /app/data, /app/models |
| Default concurrency | 4 | 2 |
| Resource requests | 500m CPU / 1Gi | 1 CPU / 4Gi |
| Resource limits | 4 CPU / 8Gi | 8 CPU / 16Gi |
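To make the sizing rows concrete, the sketch below encodes both profiles from the table and derives two things: the extras only the executor installs, and the memory available per concurrency slot. The `ImageProfile` class and helper names are hypothetical, and the per-slot figure assumes one worker process per concurrency slot sharing the container's memory evenly, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageProfile:
    name: str
    extras: frozenset[str]
    concurrency: int      # default Celery concurrency from the table
    request_cpu_m: int    # CPU request in millicores
    request_mem_mi: int   # memory request in MiB
    limit_cpu_m: int      # CPU limit in millicores
    limit_mem_mi: int     # memory limit in MiB


# Extras both images install (base extras plus distributed compute).
SHARED = frozenset({
    "otel", "cli", "iceberg", "entity-graph", "dagster-alphaswarm",
    "compute-dask", "compute-ray",
})

WORKER = ImageProfile("alphaswarm-worker", SHARED, 4, 500, 1024, 4000, 8192)
EXECUTOR = ImageProfile(
    "alphaswarm-executor",
    SHARED | {"ml", "ml-torch", "ml-forecast", "ml-anomaly", "portfolio"},
    2, 1000, 4096, 8000, 16384,
)


def executor_only_extras() -> list[str]:
    """Extras present in the executor image but not in the worker image."""
    return sorted(EXECUTOR.extras - WORKER.extras)


def mem_per_slot_mi(profile: ImageProfile, *, at_limit: bool = False) -> float:
    """Memory (MiB) per concurrency slot, assuming an even split across slots."""
    total = profile.limit_mem_mi if at_limit else profile.request_mem_mi
    return total / profile.concurrency


if __name__ == "__main__":
    print(executor_only_extras())
    for p in (WORKER, EXECUTOR):
        print(p.name, mem_per_slot_mi(p), mem_per_slot_mi(p, at_limit=True))
```

Under these assumptions the worker gets 256 MiB per slot at its request (2 GiB at its limit), while the executor gets 2 GiB per slot at its request (8 GiB at its limit). That matches the lower default concurrency chosen for memory-heavy jobs.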
Where the images are defined
| Surface | Worker | Executor |
|---|---|---|
| Root multi-stage target | worker in Dockerfile | executor in Dockerfile |
| Standalone Dockerfile | build/docker/alphaswarm_worker/ | build/docker/alphaswarm_executor/ |
| Image catalogue | worker / beat → target worker | executor → target executor |
| ECR repo | alphaswarm-worker | alphaswarm-executor |
| Kustomize base | base/alphaswarm-worker/ | base/alphaswarm-executor/ |
| Compose | worker (legacy) / alphaswarm-worker | worker-gpu (legacy) / alphaswarm-executor |
Migration readiness
Each image is intentionally self-contained: it has a standalone Dockerfile, its own ECR repo, its own image-catalogue entry, its own Kustomize base, and its own topology entry. The build assets can therefore be lifted into a dedicated repository in a future migration without untangling them from the API image.
See also
- alphaswarm-worker: orchestration worker service doc.
- alphaswarm-executor: heavy-compute executor service doc.
- services.md: full service catalogue.
- faas Terraform module: KEDA per-queue scaling.