alphaswarm-executor

Catalog date: 2026-06-24.

Celery executor pod for heavy compute: the counterpart of the slim orchestration alphaswarm-worker.

Introduced by the Phase 4c worker/executor split. It carries the full ML, RL, forecasting, portfolio, and distributed-compute (Dask and Ray) dependency surface, so backtests, training rollouts, factor builds, and agent-emitted strategy code run here instead of bloating the slim orchestration worker. See "worker vs executor images" for the full rationale and dependency matrix.

Identity

Field	Value
Service id	`alphaswarm-executor`
Role	`executor`
Package	`alphaswarm/` (tasks under `alphaswarm/tasks/*.py`)
Image (key)	`executor`
Built from	`alphaswarm_platform/Dockerfile` (target `executor`, multi-arch) or the standalone `build/docker/alphaswarm_executor/Dockerfile`

Wire

Field	Value
Protocol	none (no HTTP listener)
Health	`celery inspect ping` + Prometheus metrics on `:9100`; Ray dashboard on `:8265` when a local Ray head runs
Public URL	—
Broker	`redis://redis:6379/0`
Result backend	`redis://redis:6379/1`
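As a minimal sketch of how the Wire values could be expressed as Celery settings: the keys `broker_url` and `result_backend` are Celery's standard lowercase setting names, while the dict name `EXECUTOR_CELERY_SETTINGS` and the helper `redis_db` are hypothetical and only illustrate that the broker and the result backend use separate logical Redis databases on the same host.

```python
from urllib.parse import urlparse

# Values copied from the Wire table. The dict name is hypothetical;
# the keys are Celery's lowercase setting names.
EXECUTOR_CELERY_SETTINGS = {
    "broker_url": "redis://redis:6379/0",
    "result_backend": "redis://redis:6379/1",
}


def redis_db(url: str) -> int:
    """Return the logical Redis database index encoded in the URL path."""
    path = urlparse(url).path.lstrip("/")
    return int(path) if path else 0


# Broker and result backend share a Redis host but not a database.
broker_db = redis_db(EXECUTOR_CELERY_SETTINGS["broker_url"])
backend_db = redis_db(EXECUTOR_CELERY_SETTINGS["result_backend"])
```

With Celery installed, a dict like this could be applied with `app.conf.update(EXECUTOR_CELERY_SETTINGS)`; whether the real package configures itself this way is not stated here.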

Deployment surfaces

Surface	Where
Compose	`alphaswarm-executor` in `deployments/compose/docker-compose.local.yml`; `worker-gpu` in legacy `compose/docker-compose.yml`
Kustomize	`deployments/kubernetes/base/alphaswarm-executor/` — Deployment + HPA + PDB
Image catalogue	`executor` entry in `terraform/modules/alphaswarm_images/`
ECR repo	`alphaswarm-executor` in `infrastructure/modules/ecr-repositories/`
Terraform module	`terraform/modules/faas/` — heavy-queue Deployments pull this image
Topology	`alphaswarm-executor` in `configs/deployment/topology.yaml`

Queue families

The executor drains the heavy compute queues. KEDA scales each queue family independently.

Queue	Drives	Scale-to-zero	Notes
`backtest`	backtest dispatch (vbt-pro / event-driven / Lean)	yes	`max=20`
`training`	RL rollouts, finetune jobs	yes	dedicated GPU node group
`ml`	ML pipelines, predictor refresh	yes
`agents`	CrewAI runs, LangGraph orchestration	yes	`max=12`
`factors`	factor zoo builds, alpha tests	yes
`rag`	RAG ingest, embedding refresh	yes
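The queue families above could be wired to tasks through Celery's `task_routes` setting, which accepts glob patterns as keys. The sketch below is an assumption, not the project's actual routing: it supposes one task module per queue family under `alphaswarm/tasks/` (hypothetical module names), and the resolver `queue_for` is a stand-in written with the standard library so the mapping can be checked without a running worker.

```python
from fnmatch import fnmatch

# Queue names from the table above.
EXECUTOR_QUEUES = ("backtest", "training", "ml", "agents", "factors", "rag")

# Hypothetical layout: alphaswarm/tasks/<queue>.py holds that family's tasks.
# The dict shape matches what Celery's `task_routes` setting accepts.
TASK_ROUTES = {f"alphaswarm.tasks.{q}.*": {"queue": q} for q in EXECUTOR_QUEUES}


def queue_for(task_name: str, default: str = "celery") -> str:
    """Resolve a task name to its queue, falling back to Celery's default queue."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return default
```

For example, `queue_for("alphaswarm.tasks.backtest.run")` resolves to `backtest`, while a task outside these modules falls through to the default queue and would not land on the executor.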

Dependencies

Upstream:

redis — broker + result backend.
postgres — task lookups, ledger writes.
alphaswarm-core — progress emit callbacks, lookup APIs.
mlflow — experiment tracking + model registry for training / ML runs.
All data-plane services the alphaswarm-core pod depends on.

Downstream:

Beat schedules heavy periodic jobs (factor refresh, predictor retraining); the executor is the consumer.
The executor may start a local Ray head or Dask cluster for distributed backtests.
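To illustrate the Beat relationship, here is a hedged sketch of a Celery `beat_schedule` that pins each periodic job to an executor queue through `options`. The task names, the intervals, and the choice of the `factors` and `ml` queues are assumptions made for illustration; the real schedule is not given in this page.

```python
from datetime import timedelta

# Shape follows Celery's `beat_schedule` setting. Task names and intervals
# are hypothetical; only the job kinds and queue names come from this page.
BEAT_SCHEDULE = {
    "factor-refresh": {
        "task": "alphaswarm.tasks.factors.refresh",        # hypothetical name
        "schedule": timedelta(hours=24),                   # hypothetical interval
        "options": {"queue": "factors"},
    },
    "predictor-retraining": {
        "task": "alphaswarm.tasks.ml.retrain_predictors",  # hypothetical name
        "schedule": timedelta(days=7),                     # hypothetical interval
        "options": {"queue": "ml"},
    },
}

# Queues that Beat feeds and the executor consumes in this sketch.
queues_fed_by_beat = sorted({entry["options"]["queue"] for entry in BEAT_SCHEDULE.values()})
```

Routing through `options` keeps the schedule definition and the queue choice in one place, so a periodic job cannot silently fall onto the slim worker's default queue.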

Operations

Resources: requests 1 CPU / 4Gi, limits 8 CPU / 16Gi. Prefers memory-optimized nodes via node affinity; anti-affinity keeps it off the alphaswarm-core nodes.
Scaling: HPA on CPU plus a custom Celery queue-depth metric (KEDA ScaledObjects supersede it where KEDA is installed). Scale-down is slow (900s stabilization window) so a long-running backtest or training job is not evicted mid-flight.
Concurrency: 2 per pod (compute-bound; each task is heavy).
Drain on shutdown: terminationGracePeriodSeconds: 600 so in-flight jobs complete; preStop sends SIGTERM to Celery.
Audit: WorkloadRuntime actions are recorded as workload_runs rows; the executor pod respects the kill-switch Redis key like every other pod.
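The interaction between per-pod concurrency, queue depth, and a per-queue replica cap can be sketched as simple arithmetic. This is an illustrative model only, assuming one queued task per worker slot as the scaling target; the actual HPA and KEDA target values are not stated here, and `desired_replicas` is a hypothetical helper.

```python
import math

CONCURRENCY_PER_POD = 2  # from "Concurrency: 2 per pod"


def desired_replicas(queue_depth: int, max_replicas: int, tasks_per_slot: int = 1) -> int:
    """Estimate pods needed to drain a queue (assumed target: tasks_per_slot per slot)."""
    if queue_depth <= 0:
        return 0  # scale-to-zero when the queue is empty
    per_pod = CONCURRENCY_PER_POD * tasks_per_slot
    # Round up so no task waits for a slot, then respect the queue's cap.
    return min(max_replicas, math.ceil(queue_depth / per_pod))
```

Under these assumptions, 7 queued backtests need 4 pods, and a burst of 100 is capped at the `backtest` queue's `max=20`, so the remaining tasks wait in the queue rather than forcing more pods.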
