alphaswarm-executor
Celery heavy-compute executor pod — the compute-heavy counterpart of
the orchestration alphaswarm-worker.
Introduced by the Phase 4c worker/executor split. It carries the full ML / RL / forecasting / portfolio + distributed-compute (Dask + Ray) dependency surface so backtests, training rollouts, factor builds, and agent-emitted strategy code run here instead of bloating the slim orchestration worker. See worker vs executor images for the full rationale and dependency matrix.
Identity
| Field | Value |
|---|---|
| Service id | alphaswarm-executor |
| Role | executor |
| Package | alphaswarm/ (tasks under alphaswarm/tasks/*.py) |
| Image (key) | executor |
| Built from | alphaswarm_platform/Dockerfile (target executor, multi-arch) or the standalone build/docker/alphaswarm_executor/Dockerfile |
Wire
| Field | Value |
|---|---|
| Protocol | none (no HTTP listener) |
| Health | celery inspect ping + Prometheus metrics on :9100; Ray dashboard on :8265 when a local Ray head runs |
| Public URL | — |
| Broker | redis://redis:6379/0 |
| Result backend | redis://redis:6379/1 |
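
As a rough illustration of the wire settings above, the sketch below puts the broker and result-backend URLs into a plain dictionary keyed by Celery's lowercase setting names (`broker_url`, `result_backend`) and checks that both point at the same Redis host but at different logical databases. The `EXECUTOR_CELERY_CONFIG` name and the `redis_db` helper are hypothetical and are not taken from the alphaswarm codebase.

```python
from urllib.parse import urlparse

# URLs copied from the Wire table; the dictionary name is hypothetical.
EXECUTOR_CELERY_CONFIG = {
    "broker_url": "redis://redis:6379/0",
    "result_backend": "redis://redis:6379/1",
}


def redis_db(url: str) -> tuple[str, int, int]:
    """Return (host, port, logical database index) for a redis:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError(f"expected a redis:// URL, got {url!r}")
    # The path component carries the database index, e.g. "/1" -> 1.
    db = int(parsed.path.lstrip("/") or 0)
    return parsed.hostname, parsed.port, db


broker = redis_db(EXECUTOR_CELERY_CONFIG["broker_url"])
backend = redis_db(EXECUTOR_CELERY_CONFIG["result_backend"])
```

Keeping task messages (database 0) apart from stored results (database 1) means either keyspace can be inspected or flushed without touching the other.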
Deployment surfaces
| Surface | Where |
|---|---|
| Compose | alphaswarm-executor in deployments/compose/docker-compose.local.yml; worker-gpu in legacy compose/docker-compose.yml |
| Kustomize | deployments/kubernetes/base/alphaswarm-executor/ — Deployment + HPA + PDB |
| Image catalogue | executor entry in terraform/modules/alphaswarm_images/ |
| ECR repo | alphaswarm-executor in infrastructure/modules/ecr-repositories/ |
| Terraform module | terraform/modules/faas/ — heavy-queue Deployments pull this image |
| Topology | alphaswarm-executor in configs/deployment/topology.yaml |
Queue families
The executor drains the heavy compute queues. KEDA scales each queue family independently.
| Queue | Drives | Scale-to-zero | Notes |
|---|---|---|---|
| backtest | backtest dispatch (vbt-pro / event-driven / Lean) | yes | max=20 |
| training | RL rollouts, finetune jobs | yes | dedicated GPU node group |
| ml | ML pipelines, predictor refresh | yes | |
| agents | CrewAI runs, LangGraph orchestration | yes | max=12 |
| factors | factor zoo builds, alpha tests | yes | |
| rag | RAG ingest, embedding refresh | yes | |
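
To make the scale-to-zero and `max` columns concrete, here is a minimal sketch of queue-depth scaling: zero replicas for an empty queue, otherwise the backlog divided by a per-replica target, capped at the queue's ceiling. It is a simplification, not the actual KEDA or `faas` module logic. Only the `backtest` and `agents` ceilings come from the table; `DEFAULT_MAX_REPLICAS`, `TARGET_TASKS_PER_REPLICA`, and the function name are assumptions made for illustration.

```python
import math

# Ceilings stated in the queue table.
QUEUE_MAX_REPLICAS = {"backtest": 20, "agents": 12}

# Assumptions for illustration only; the source does not state these values.
DEFAULT_MAX_REPLICAS = 10
TARGET_TASKS_PER_REPLICA = 5


def desired_replicas(queue: str, depth: int) -> int:
    """Return a replica count for one queue family given its backlog."""
    if depth < 0:
        raise ValueError("queue depth cannot be negative")
    if depth == 0:
        return 0  # scale-to-zero: no pending work, no pods
    ceiling = QUEUE_MAX_REPLICAS.get(queue, DEFAULT_MAX_REPLICAS)
    return min(ceiling, math.ceil(depth / TARGET_TASKS_PER_REPLICA))
```

Under these assumed targets, a backlog of 7 backtest tasks maps to 2 replicas, and a very large backlog saturates at 20 for `backtest` and 12 for `agents`.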
Dependencies
Upstream:
- redis: broker + result backend.
- postgres: task lookups, ledger writes.
- alphaswarm-core: progress emit callbacks, lookup APIs.
- mlflow: experiment tracking + model registry for training / ML runs.
- All data-plane services the alphaswarm-core pod depends on.
Downstream:
- Beat schedules heavy periodic jobs (factor refresh, predictor retraining); the executor is the consumer.
- May start a local Ray head / Dask cluster for distributed backtests.
Operations
- Resources: requests 1 CPU / 4Gi, limits 8 CPU / 16Gi. Prefers memory-optimized nodes via node affinity; anti-affinity keeps it off the alphaswarm-core nodes.
- Scaling: HPA on CPU + custom Celery queue depth (KEDA ScaledObjects supersede it where KEDA is installed). Scales down slowly (900s stabilization) so a long-running backtest / train job is not evicted mid-flight.
- Concurrency: 2 per pod (compute-bound; each task is heavy).
- Drain on shutdown: terminationGracePeriodSeconds: 600, so in-flight jobs complete; preStop sends SIGTERM to Celery.
- Audit: WorkloadRuntime actions land workload_runs rows; the executor pod respects the kill-switch Redis key like every other pod.
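
The 900s scale-down stabilization can be pictured with the small model below. It follows the general HPA rule for scale-down windows: the autoscaler acts on the highest replica recommendation seen within the window, so a brief dip in load does not remove pods immediately. This is a simplified sketch, not the controller's implementation, and the `ScaleDownStabilizer` class is hypothetical.

```python
from collections import deque


class ScaleDownStabilizer:
    """Toy model of an HPA scale-down stabilization window."""

    def __init__(self, window_seconds: float = 900.0) -> None:
        self.window_seconds = window_seconds
        self._history: deque[tuple[float, int]] = deque()  # (timestamp, replicas)

    def recommend(self, now: float, desired: int) -> int:
        """Record a new recommendation and return the stabilized replica count."""
        self._history.append((now, desired))
        cutoff = now - self.window_seconds
        # Drop recommendations that have aged out of the window.
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        # The highest recent recommendation wins, which delays scale-down
        # while still letting scale-up take effect immediately.
        return max(replicas for _, replicas in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=900)
first = stabilizer.recommend(now=0, desired=8)
during = stabilizer.recommend(now=300, desired=2)
after = stabilizer.recommend(now=901, desired=2)
```

In this run the count stays at 8 while the earlier recommendation is still inside the window, and drops to 2 only once that recommendation is more than 900 seconds old.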
See also
- alphaswarm-worker.md: orchestration sibling (light queues).
- worker-executor-images.md: image split rationale + dependency matrix.
- faas Terraform module: KEDA scaling source of truth.
- build/docker/alphaswarm_executor/: standalone image (migration-ready).