Architecture
Human entry point. Pair with the AI-agent entry point at AGENTS.md and the doc map at /intro.
Cold-start path: /intro/quickstart. Deployment path: how-to/operations/local-setup or how-to/operations/kubernetes-deploy.
AlphaSwarm is a local-first, agentic quantitative research and trading platform. Every LLM call, every backtest, every reinforcement-learning rollout, and every piece of metadata stays on local hardware — no proprietary alpha leaves the box. The codebase distills patterns from Microsoft Qlib, AI4Finance FinRL, QuantConnect Lean, OpenBB, vnpy, and TradingAgents into one coherent platform.
The platform is organised around four invariants that hold across every subsystem:
- Hash-locked spec runtimes. AgentSpec, BotSpec, RLExperimentSpec, and AnalysisSpec each have a single sanctioned executor (AgentRuntime / BotRuntime / RLRuntime / AnalysisRuntime). Any spec change creates a new immutable *_spec_versions row; old versions stay forever for replay.
- Medallion lakehouse. Every Iceberg write goes through iceberg_catalog.append_arrow with a declared bronze / silver / gold layer; agents read through data.* MCP tools, never raw ORM.
- One LLM gateway, one progress bus. Every model call routes through router_complete; every Celery task emits canonical progress frames through alphaswarm.tasks._progress.
- Topology is data, not code. Service URLs, MCP audiences, and credential references resolve through alphaswarm_platform/configs/deployment/topology.yaml.
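Because topology is data, code asks for a service by name instead of embedding a URL. The sketch below illustrates that lookup under loud assumptions: the real topology.yaml schema is not documented on this page, so the env → service → {url, mcp_audience, credential_ref} nesting, the ServiceEndpoint type, resolve_service, and every value in TOPOLOGY are hypothetical, and an inline dict stands in for the parsed YAML.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceEndpoint:
    url: str
    mcp_audience: Optional[str] = None
    credential_ref: Optional[str] = None


def resolve_service(
    topology: Mapping[str, Mapping[str, Mapping[str, str]]], env: str, service: str
) -> ServiceEndpoint:
    """Look a service up by name in parsed topology data instead of hard-coding its URL."""
    try:
        entry = topology[env][service]
    except KeyError as exc:
        raise LookupError(f"service {service!r} is not declared for env {env!r}") from exc
    return ServiceEndpoint(
        url=entry["url"],
        mcp_audience=entry.get("mcp_audience"),
        credential_ref=entry.get("credential_ref"),
    )


# HYPOTHETICAL stand-in for the parsed topology.yaml (schema and values invented).
TOPOLOGY = {
    "local": {
        "api": {
            "url": "http://localhost:8000",
            "mcp_audience": "http://localhost:8000/mcp",
            "credential_ref": "env:ALPHASWARM_API_TOKEN",
        },
    },
}
```

Moving from compose to Kubernetes then means editing data (a new env block), not code that calls resolve_service.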
System component diagram
Solid lines are default-profile data paths; dotted lines are opt-in / asynchronous.
The four edge surfaces
AlphaSwarm exposes four hostnames, each behind its own Cloudflare property:
- alpha-swarm.ai: operator UI (alphaswarm_client). Vite + React 19 + Tailwind 4 + shadcn/ui. Hosts the topbar KillSwitch, paper trading dashboards, RL Lab, Analysis Lab, Workflow Studio, and Data Hub.
- api.alpha-swarm.ai: public API (alphaswarm/api). FastAPI gateway, 30+ route modules, Stripe-style date epochs (first epoch 2026-06-01).
- manage.alpha-swarm.ai: control plane (alphaswarm_controller). Workload lifecycle, TerraformRuntime, IdP wiring. Never imports alphaswarm.*.
- docs.alpha-swarm.ai: documentation (alphaswarm_docs). Docusaurus 3 on Cloudflare Pages, with Pages Functions for content negotiation, sanitised page fragments, and the "Was this helpful?" feedback loop. A standalone MCP Worker serves /mcp (RFC 9728 + 8707 compliant per AGENTS rule 49).
Plus two adjacent zones:
- status.alpha-swarm.ai: Instatus status page, in a separate Cloudflare zone so it stays up when the cluster is degraded.
- archive.alpha-swarm.ai: frozen Stripe-style API epochs after the 12-month sunset window.
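The docs MCP Worker's RFC 9728 + 8707 conformance comes down to two checks that can be illustrated in a few lines. RFC 9728 has a protected resource publish its metadata at a well-known URI formed by inserting /.well-known/oauth-protected-resource between the host and the path; RFC 8707 resource indicators let clients request tokens audience-restricted to that resource, which a resource server typically enforces by checking the token's aud claim. The Worker's actual code is not shown on this page: this Python sketch assumes decoded JWT claims are already available, and the authorization-server URL and scope name are placeholders.

```python
from typing import Any, Iterable
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN = "/.well-known/oauth-protected-resource"


def protected_resource_metadata_url(resource: str) -> str:
    """RFC 9728 section 3.1: insert the well-known suffix between host and path/query."""
    parts = urlsplit(resource)
    path = "" if parts.path == "/" else parts.path  # drop a bare terminating slash after the host
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN + path, parts.query, ""))


def protected_resource_metadata(
    resource: str, authorization_servers: Iterable[str], scopes: Iterable[str]
) -> dict[str, Any]:
    """Minimal RFC 9728 metadata document (only "resource" is required by the spec)."""
    return {
        "resource": resource,
        "authorization_servers": list(authorization_servers),
        "scopes_supported": list(scopes),
        "bearer_methods_supported": ["header"],
    }


def audience_matches(claims: dict[str, Any], resource: str) -> bool:
    """Accept a token only if its aud claim (a string or a list) names this resource."""
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return resource in audiences


RESOURCE = "https://docs.alpha-swarm.ai/mcp"
# PLACEHOLDER authorization server and scope, not the real deployment's values.
METADATA = protected_resource_metadata(RESOURCE, ["https://auth.example.com"], ["docs:read"])
```

For the /mcp resource, the metadata document lives at https://docs.alpha-swarm.ai/.well-known/oauth-protected-resource/mcp, and a token minted for api.alpha-swarm.ai is rejected there.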
Request lifecycle
Every spec-driven dispatch — backtest, agent run, RL training, analysis flow, workflow — follows the same canonical shape. The two new contracts since the prior version of this doc:
- Hash-lock first. Before any work happens, the runtime computes the spec's SHA-256, looks for a matching *_spec_versions row, and inserts a new immutable row if the content changed.
- Kill switch reachable. Every long-running runtime is in the topbar KillSwitch fan-out list. The runtime checks should_halt on every step.
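The hash-lock step can be sketched as an append-only store keyed by content hash. This is a minimal illustration, not the runtime's code: it assumes a spec is a JSON-serialisable dict hashed over a canonical encoding (sorted keys, compact separators), and SpecVersion / SpecVersionStore are hypothetical in-memory stand-ins for a *_spec_versions table.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


def canonical_json(spec: dict[str, Any]) -> str:
    """Canonicalise so key order and whitespace never change the hash."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


@dataclass(frozen=True)
class SpecVersion:
    spec_id: str
    version: int
    sha256: str
    canonical_json: str  # stored as text so the row cannot be mutated in place


class SpecVersionStore:
    """In-memory stand-in for a *_spec_versions table; rows are only ever appended."""

    def __init__(self) -> None:
        self._rows: list[SpecVersion] = []

    def lock(self, spec_id: str, spec: dict[str, Any]) -> SpecVersion:
        """Return the row matching this content, appending a new version if none exists."""
        text = canonical_json(spec)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        for row in self._rows:
            if row.spec_id == spec_id and row.sha256 == digest:
                return row  # content unchanged: reuse the existing version
        version = 1 + sum(1 for row in self._rows if row.spec_id == spec_id)
        row = SpecVersion(spec_id, version, digest, text)
        self._rows.append(row)
        return row
```

Reordering keys resolves to the same row; changing a value appends version 2 and leaves version 1 untouched for replay.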
The frame envelope is {task_id, stage, message, timestamp, **extras} per AGENTS rule 4. Because the should_halt check runs on every step, every spec runtime is a stop target for the topbar kill switch: it halts at its next step boundary.
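The envelope and the kill-switch check fit together as in this sketch. make_frame follows the {task_id, stage, message, timestamp, **extras} shape; run_steps, the stage names "step" and "halted", and the use of a Unix-epoch float for timestamp are illustrative assumptions, not the contents of alphaswarm.tasks._progress.

```python
import time
from typing import Any, Callable, Iterable


def make_frame(task_id: str, stage: str, message: str, **extras: Any) -> dict[str, Any]:
    """Build a frame in the {task_id, stage, message, timestamp, **extras} shape."""
    return {"task_id": task_id, "stage": stage, "message": message, "timestamp": time.time(), **extras}


def run_steps(
    task_id: str,
    steps: Iterable[Callable[[], None]],
    emit: Callable[[dict[str, Any]], None],
    should_halt: Callable[[], bool],
) -> bool:
    """Run steps in order, checking the kill switch before each one; return False if halted."""
    emit(make_frame(task_id, "start", "run started"))
    for index, step in enumerate(steps):
        if should_halt():
            emit(make_frame(task_id, "halted", f"halted before step {index}"))
            return False
        step()
        emit(make_frame(task_id, "step", f"step {index} finished", step=index))
    emit(make_frame(task_id, "done", "run finished"))
    return True
```

The halt latency is bounded by the duration of one step, which is why long-running work should be broken into small steps.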
Repository map
The monorepo is organised by responsibility. Each top-level package
has its own AGENTS.md enforcing strict boundaries; cross-package
imports are blocked in CI.
| Package | Role | Owner | Public-surface contract |
|---|---|---|---|
| alphaswarm/ | Quant runtime — strategies, backtests, agents, RAG, Iceberg | platform-team | alphaswarm/api/main.py::create_app |
| alphaswarm_controller/ | Workload lifecycle + Terraform driver + provider adapters | platform-team | alphaswarm_controller/main.py::create_app; NEVER imports alphaswarm.* |
| alphaswarm_core/ | Shared value types, ABCs, auth/resource filters, topology | platform-team | Dependency-light; consumed by both alphaswarm/ and alphaswarm_controller/ |
| alphaswarm_client/ | Active Vite + React 19 operator UI at alpha-swarm.ai | platform-team | pnpm --filter alphaswarm_client dev |
| alphaswarm_ui/ | Cloud-hosted Next.js PaaS frontend (dual Auth0 + Entra) | platform-team | Never imports alphaswarm.* / alphaswarm_controller.* |
| alphaswarm_admin/ | Internal admin at manage.alpha-swarm.ai (audit-first) | platform-team | Mirrors alphaswarm_controller boundary |
| alphaswarm_rl/ | RL stack — RLExperimentSpec + RLRuntime + Iceberg trajectories | rl-team | Legacy alphaswarm.rl.* is a deprecation shim |
| alphaswarm_models/ | ML framework, custom model serving (vLLM + Ollama), AlphaBacktestExperiment | ml-team | Legacy alphaswarm.ml.* + alphaswarm/llm/{vllm_runner,ollama_client}.py are deprecation shims |
| alphaswarm_bots/ | Bot templates + BotRuntime (smallest deployable unit) | agentic-team | YAML at alphaswarm_bots/templates/{trading,research}/ |
| alphaswarm_ide/ | Theia 1.72 IDE + six AlphaSwarm extensions | platform-team | Canonical entrypoint: alphaswarm-cli ide |
| alphaswarm_cli/ | Standalone operator CLI (HTTP-only, device-flow auth) | platform-team | Never imports alphaswarm.* / alphaswarm_controller.* |
| alphaswarm_platform/ | Hosted-platform deployment + IaC + build assets | infra-team | No import alphaswarm.*; TerraformRuntime-only |
| alphaswarm_index/ | Curator-owned single source of truth | docs-team | Sole-writer is the alphaswarm-index-curator subagent |
| alphaswarm_docs/ | This site (Docusaurus 3 on Cloudflare Pages) | docs-team | Quality gates in .github/workflows/docs-ci.yml |
| alphaswarm_snippets/ | Curated knowledge + extractions + inspiration trees | docs-team | Runtime code MUST NOT import this tree |
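The import boundaries in the table can be enforced with a static scan of each package's source. The sketch below is one way to do that with the standard ast module; the page does not say which tool CI actually uses, and forbidden_imports is a hypothetical helper. Matching the exact module name or a dotted prefix keeps "alphaswarm" from also catching alphaswarm_core.

```python
import ast


def _matches(module: str, prefix: str) -> bool:
    """True for the prefix itself or a submodule, so 'alphaswarm' never matches 'alphaswarm_core'."""
    return module == prefix or module.startswith(prefix + ".")


def forbidden_imports(source: str, forbidden: tuple[str, ...]) -> list[tuple[int, str]]:
    """Return (line number, module) for every absolute import of a forbidden package."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) stay inside the package
        else:
            continue
        for name in names:
            if any(_matches(name, prefix) for prefix in forbidden):
                violations.append((node.lineno, name))
    return sorted(violations)
```

For alphaswarm_controller/ the forbidden tuple would be ("alphaswarm",); for alphaswarm_cli/ it would be ("alphaswarm", "alphaswarm_controller").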
Inside alphaswarm/ the subsystems map one-to-one to concept docs:
For the full canonical repository-split contract (boundaries, import guards, future extraction map) read repository-split. For the file-by-file path contract for cross-repo references read alphaswarm-monorepo-paths.
Hard rules (cardinal subset)
Every contributor reads the full 55 hard rules in AGENTS.md. The cardinal subset that surfaces in this doc:
- Rule 1. Symbol.parse(vt_symbol) only. Never split a vt_symbol on ".".
- Rule 2. All LLM calls go through router_complete.
- Rule 3. All Iceberg writes go through iceberg_catalog.append_arrow.
- Rule 4. All progress emits use the canonical frame envelope.
- Rule 5. All cross-task state goes through Postgres; never pickle ORM objects.
- Rules 12-19, 23-25, 40-41. The five spec runtimes (AgentRuntime, BotRuntime, RLRuntime, AnalysisRuntime, WorkflowRuntime) are the only sanctioned executors for their respective specs. Specs are immutable once committed; behaviour changes always create a new version row.
- Rule 22. Agents NEVER read Postgres / Iceberg directly. Every catalog / dataset / entity read goes through a registered DataMCPTool.
- Rules 42-45. TerraformRuntime owns every terraform apply; WorkloadRuntime owns all runtime workload ops; both write to the workload_runs + terraform_runs audit ledgers before executing.
- Rule 47. Service URLs resolve through the topology service; AlphaSwarm is cluster-agnostic.
- Rule 49. Every MCP server is RFC 9728 + 8707 conformant.
- Rule 52. Step-up MFA (RFC 9470) on every halt + every destructive surface.
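Rule 1 exists because tickers can themselves contain dots. The sketch assumes vnpy's <symbol>.<exchange> convention and splits on the last dot; the Symbol class here is a hypothetical stand-in, not AlphaSwarm's implementation, and it assumes exchange codes never contain a dot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """HYPOTHETICAL stand-in for AlphaSwarm's Symbol, assuming '<symbol>.<exchange>' strings."""

    symbol: str
    exchange: str

    @classmethod
    def parse(cls, vt_symbol: str) -> "Symbol":
        # Split on the LAST dot: the ticker may contain dots (e.g. "BRK.B"),
        # while the exchange suffix is assumed not to.
        symbol, sep, exchange = vt_symbol.rpartition(".")
        if not sep or not symbol or not exchange:
            raise ValueError(f"not a vt_symbol: {vt_symbol!r}")
        return cls(symbol, exchange)

    @property
    def vt_symbol(self) -> str:
        return f"{self.symbol}.{self.exchange}"
```

A naive "BRK.B.NYSE".split(".") yields three fragments and loses the share-class suffix, whereas Symbol.parse("BRK.B.NYSE") gives symbol "BRK.B" and exchange "NYSE", and round-trips back to the same string.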
Worked example: trace your first request
Goal: dispatch a backtest, watch the WebSocket frames, inspect the ledger row and the Iceberg gold output — without leaving this page.
Step 1 — dispatch
Dispatch a sample momentum backtest against your local compose stack at http://localhost:8000.
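A scripted equivalent of the dispatch might look like the following. The /backtests route, the payload fields, and the task_id response key are hypothetical placeholders; check reference/api for the real backtest route and schema before relying on them. Only build_dispatch_request runs offline; dispatch needs the compose stack to be up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_dispatch_request(base_url: str, payload: dict) -> urllib.request.Request:
    # HYPOTHETICAL route: substitute the real backtest endpoint from reference/api.
    return urllib.request.Request(
        url=f"{base_url}/backtests",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(payload: dict) -> str:
    """Send the request and return the task_id from the (assumed) JSON response body."""
    with urllib.request.urlopen(build_dispatch_request(BASE_URL, payload), timeout=10) as resp:
        return json.load(resp)["task_id"]


# HYPOTHETICAL payload for a sample momentum backtest.
PAYLOAD = {"strategy": "momentum", "vt_symbol": "SPY.ARCA", "start": "2024-01-01", "end": "2024-12-31"}
```

The returned task_id is what the streaming endpoint in Step 2 expects.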
Step 2 — tail the WebSocket
Switch to your terminal and tail the canonical progress frames:
curl -N http://localhost:8000/chat/stream/<task_id>
You will see frames in the {task_id, stage, message, timestamp, **extras} shape. Stages: start → bar.processed (×N) →
done (carries the final BacktestResult).
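To consume the same stream from Python instead of curl, a small parser is enough. This sketch assumes the endpoint emits one JSON frame per line, optionally behind an SSE-style "data:" prefix, and that the done frame carries its result fields (such as run_id) as extras; parse_frames and final_result are illustrative names.

```python
import json
from typing import Any, Iterable, Iterator, Optional

REQUIRED_KEYS = {"task_id", "stage", "message", "timestamp"}


def parse_frames(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield validated frames from a line-oriented stream (the wire format is an assumption)."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank keep-alive line or SSE comment
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        frame = json.loads(line)
        missing = REQUIRED_KEYS - frame.keys()
        if missing:
            raise ValueError(f"frame missing {sorted(missing)}")
        yield frame


def final_result(lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the extras of the first stage=done frame, or None if the stream ends without one."""
    for frame in parse_frames(lines):
        if frame["stage"] == "done":
            return {key: value for key, value in frame.items() if key not in REQUIRED_KEYS}
    return None
```

Against a live stack, feed it the decoded lines of the HTTP response for /chat/stream/<task_id>.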
Step 3 — inspect the ledger
In the browser, Pyodide can run synchronous SQL through DuckDB against a small parquet snapshot of backtest_runs.
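The in-browser snippet is not reproduced on this page. As a stand-in, the sketch below runs the same style of ledger query with Python's built-in sqlite3 module instead of DuckDB, so it needs no extra packages. Only run_id and sharpe are columns named on this page; the rows are invented sample values held in an inline list, and treating a NULL sharpe as an unfinished run is an assumption.

```python
import sqlite3
from contextlib import closing
from typing import Iterable, Optional

# Inline snapshot rows (run_id, sharpe). Values are invented for illustration;
# a NULL sharpe is assumed to mean the run has not finished.
ROWS = [
    ("run-001", 1.42),
    ("run-002", None),
    ("run-003", 0.87),
]

QUERY = """
SELECT run_id, sharpe
FROM backtest_runs
WHERE sharpe IS NOT NULL
ORDER BY sharpe DESC
"""


def completed_runs(rows: Iterable[tuple[str, Optional[float]]]) -> list[tuple[str, float]]:
    """Load rows into an in-memory table and return runs with a Sharpe ratio, best first."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE backtest_runs (run_id TEXT PRIMARY KEY, sharpe REAL)")
        conn.executemany("INSERT INTO backtest_runs VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
```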
When pointed at the real platform, replace the inline list with a /data/exports MCP call and the same SQL works against the actual ledger snapshot.
Step 4 — read the Iceberg gold output
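from pyiceberg.catalog import load_catalog
# Catalog settings are read from your PyIceberg configuration (.pyiceberg.yaml or PYICEBERG_* environment variables).
cat = load_catalog("alphaswarm")
run_id = "<run_id>"  # the run_id carried by the stage=done frame from Step 2
table = cat.load_table(f"alphaswarm_gold_backtests.run_{run_id}")
df = table.scan().to_pandas()  # requires pandas
print(df[["timestamp", "equity", "drawdown"]].tail(10))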
Step 5 — verify
- A backtest_runs row with non-NULL sharpe exists.
- The WebSocket emitted a stage=done frame with the same run_id.
- An alphaswarm_gold_backtests.run_<run_id> Iceberg table is queryable.
- The KillSwitch topbar element shows a green status.
What next
- Run the full walkthrough in tutorials/first-backtest.
- Author a custom strategy: how-to/recipes/add-a-strategy.
- Promote the backtest to paper: how-to/recipes/promote-a-bot-to-paper.
- Replace the single-strategy dispatch with a multi-node workflow: tutorials/first-agent-workflow + concepts/agentic/workflow-studio.
Deployment modes
docker-compose (default)
docker compose up -d
Brings up redis, postgres, alphaswarm-core, alphaswarm-worker, alphaswarm-beat,
alphaswarm-client, chromadb, mlflow, otel-collector, jaeger. The
Iceberg catalog runs in PyIceberg SQL mode against the host bind
mount under data/iceberg/. Optional profiles:
- --profile streaming: adds Redpanda + Flink for live market data.
- --profile vllm: adds a containerised vLLM inference server.
- --profile legacy: restores the older MinIO + iceberg-rest topology, for rollback only.
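In SQL mode, PyIceberg's load_catalog accepts catalog properties directly. The sketch below assumes a SQLite catalog database and a file:// warehouse, both under data/iceberg/; the compose stack may instead point uri at its Postgres service, and the catalog.db filename is hypothetical. Opening the catalog requires pyiceberg with its SQL catalog dependencies installed.

```python
from pathlib import Path


def sql_catalog_properties(repo_root: Path) -> dict[str, str]:
    """Properties for a PyIceberg SQL catalog rooted at <repo_root>/data/iceberg (layout assumed)."""
    iceberg_dir = (repo_root / "data" / "iceberg").resolve()
    return {
        "type": "sql",
        # ASSUMPTION: SQLite catalog DB; the compose stack may use its Postgres service instead.
        "uri": f"sqlite:///{iceberg_dir / 'catalog.db'}",
        "warehouse": iceberg_dir.as_uri(),
    }


def open_catalog(repo_root: Path):
    """Open the catalog; needs pyiceberg plus its SQL catalog dependencies."""
    from pyiceberg.catalog import load_catalog

    return load_catalog("alphaswarm", **sql_catalog_properties(repo_root))
```

Passing properties explicitly is equivalent to declaring them under the alphaswarm catalog in .pyiceberg.yaml, which is what the bare load_catalog("alphaswarm") call in the worked example relies on.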
Native dev (no Docker)
pip install -e ".[full,dev]"
alembic upgrade head
uvicorn alphaswarm.api.main:app --reload
celery -A alphaswarm.tasks.celery_app worker --loglevel=info
Kubernetes
make deploy-k8s ENV=prod
Manifests live under
alphaswarm_platform/deployments/kubernetes/.
The TerraformRuntime owns every terraform apply; see
how-to/operations/kubernetes-deploy
and how-to/operations/alphaswarm-fund-blue-green-cutover.
Cloudflare Pages (docs only)
docs.alpha-swarm.ai deploys via the
cloudflare_pages_docs
Terraform module — out of cluster, on the edge, behind Cloudflare
Access for /internal/* and /enterprise/*.
Where to start
| If you want to... | Read |
|---|---|
| Get the platform running locally | intro/quickstart |
| Understand the doc conventions | intro/conventions |
| See the canonical repository layout | repository-split |
| Run a backtest end-to-end | tutorials/first-backtest |
| Promote a bot from backtest to paper | tutorials/first-bot |
| Train an RL agent | tutorials/first-rl-experiment |
| Compose an agent workflow | tutorials/first-agent-workflow |
| Browse the API surface | reference/api |
| Browse the Python surface | reference/python |
| Inspect tables and columns | reference/data-dictionary |
| Author a new strategy | how-to/recipes/add-a-strategy |
| Query data without touching ORM | how-to/recipes/query-data-via-mcp |
| Snapshot an agent spec | how-to/recipes/snapshot-an-agent-spec |
| Trigger a kill switch | how-to/operations/kill-switch-incident-response |
| Deploy to Kubernetes | how-to/operations/kubernetes-deploy |
| Read the agentic-coding contract | concepts/agentic/agentic-development |
| Query the docs from an AI agent | /llms.txt, /llms-full.txt, /mcp |
Deeper reads
- concepts/platform/repository-split: boundary contract for every alphaswarm_* package.
- concepts/agentic/workflow-studio: the WorkflowRuntime orchestration layer composing every spec runtime.
- concepts/agentic/agentic-development: the spec pattern mapped to the broader agentic-coding vocabulary.
- concepts/identity/management-engine: WorkloadRuntime + control-plane audit ledger.
- concepts/infrastructure/terraform-control-plane: TerraformRuntime + hash-locked stack specs.
- reference/api: Scalar-rendered API playground.
- reference/python: Griffe-generated Python reference.