Architecture

Human entry point. Pair with the AI-agent entry point at AGENTS.md and the doc map at /intro.

Cold-start path: /intro/quickstart. Deployment path: how-to/operations/local-setup or how-to/operations/kubernetes-deploy.

AlphaSwarm is a local-first, agentic quantitative research and trading platform. Every LLM call, every backtest, every reinforcement-learning rollout, and every piece of metadata stays on local hardware, so no proprietary alpha leaves the box. The codebase distills patterns from Microsoft Qlib, AI4Finance FinRL, QuantConnect Lean, OpenBB, vnpy, and TradingAgents into one coherent platform.

The platform is organised around four invariants that hold across every subsystem:

Hash-locked spec runtimes. AgentSpec, BotSpec, RLExperimentSpec, and AnalysisSpec each have a single sanctioned executor (AgentRuntime / BotRuntime / RLRuntime / AnalysisRuntime). Any spec change creates a new immutable *_spec_versions row; old versions stay forever for replay.
Medallion lakehouse. Every Iceberg write goes through iceberg_catalog.append_arrow with a declared bronze / silver / gold layer; agents read through data.* MCP tools, never raw ORM.
One LLM gateway, one progress bus. Every model call routes through router_complete; every Celery task emits canonical progress frames through alphaswarm.tasks._progress.
Topology is data, not code. Service URLs, MCP audiences, and credential references resolve through alphaswarm_platform/configs/deployment/topology.yaml.
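To illustrate the "topology is data" invariant, here is a minimal sketch that assumes topology.yaml has already been parsed into a plain dict. The environments/services/url/mcp_audience schema, resolve_service, and ServiceEndpoint are all hypothetical; the real topology.yaml may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ServiceEndpoint:
    name: str
    url: str
    mcp_audience: Optional[str]


def resolve_service(topology: Mapping[str, Any], env: str, name: str) -> ServiceEndpoint:
    """Look up a service by name in an (assumed) environments -> services schema."""
    try:
        entry = topology["environments"][env]["services"][name]
    except KeyError as exc:
        raise KeyError(f"service {name!r} is not declared for env {env!r}") from exc
    return ServiceEndpoint(name=name, url=entry["url"], mcp_audience=entry.get("mcp_audience"))


# Placeholder topology, as if loaded from YAML; the schema is hypothetical.
TOPOLOGY = {
    "environments": {
        "local": {
            "services": {
                "api": {"url": "http://localhost:8000", "mcp_audience": None},
            }
        }
    }
}

api = resolve_service(TOPOLOGY, "local", "api")
print(api.url)
```

Callers ask for a service by name and environment rather than embedding a URL, so moving a service only means editing data.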

System component diagram

Solid lines are default-profile data paths; dotted lines are opt-in / asynchronous.

Current alpha topology

AlphaSwarm is in alpha testing. The active local test shape is:

MacBook: client-side apps, local dev servers, and operator tooling.
Local Ubuntu box: hosted platform services and cluster workloads.
Demo components: not part of the current stack and should not be running.

The hosted alpha surfaces are:

alpha-swarm.ai — operator UI (alphaswarm_client). Vite + React 19 + Tailwind 4 + shadcn/ui. Routes the topbar KillSwitch, paper trading dashboards, RL Lab, Analysis Lab, Workflow Studio, Data Hub.
alpha.alpha-swarm.ai — alpha-testing UI (alphaswarm_ui). Next.js 14+ hosted PaaS frontend behind Cloudflare Access for the current alpha cohort. It runs against the active platform, not demo mode.
api.alpha-swarm.ai — public API (alphaswarm/api). FastAPI gateway, 30+ route modules, Stripe-style date epochs (first epoch 2026-06-01).
manage.alpha-swarm.ai — control plane (alphaswarm_controller). Workload lifecycle, TerraformRuntime, IdP wiring. Never imports alphaswarm.*.
docs.alpha-swarm.ai — documentation (alphaswarm_docs). Docusaurus 3 on Cloudflare Pages. Pages Functions for content-negotiation, sanitised page fragments, and the "Was this helpful?" feedback loop. Standalone MCP Worker at /mcp (RFC 9728 + 8707 compliant per AGENTS rule 49).

Plus two adjacent zones:

status.alpha-swarm.ai: Instatus status page. Separate Cloudflare zone so it stays up when the cluster is degraded.
archive.alpha-swarm.ai: frozen Stripe-style API epochs after the 12-month sunset window.

Request lifecycle

Every spec-driven dispatch (backtest, agent run, RL training, analysis flow, workflow) follows the same canonical shape. Two contracts are new since the previous version of this doc:

Hash-lock first. Before any work happens, the runtime computes the spec's SHA-256, looks for a matching *_spec_versions row, and inserts a new immutable row if the content has changed.
Kill switch reachable. Every long-running runtime is in the topbar KillSwitch fan-out list. The runtime checks should_halt on every step.

The frame envelope is {task_id, stage, message, timestamp, **extras} per AGENTS rule 4. The per-step should_halt check is what makes every spec runtime a stop target for the topbar kill switch: a halt takes effect at the next step.
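The dispatch shape above can be sketched in memory as follows. This is a hypothetical illustration, not the real runtime API: spec_sha256, SpecVersionStore, frame, and run_steps are made-up names, a dict stands in for the Postgres *_spec_versions table, a threading.Event stands in for should_halt, and the "halted" stage name is an assumption.

```python
import hashlib
import json
import threading
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple


def spec_sha256(spec: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON, so dict key order never changes the hash."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SpecVersionStore:
    """In-memory, append-only stand-in for a *_spec_versions table."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}  # sha256 -> canonical JSON; never updated

    def lock(self, spec: Dict[str, Any]) -> Tuple[str, bool]:
        """Return (sha256, created); insert a row only when the content is new."""
        digest = spec_sha256(spec)
        created = digest not in self._rows
        if created:
            self._rows[digest] = json.dumps(spec, sort_keys=True)
        return digest, created


def frame(task_id: str, stage: str, message: str, **extras: Any) -> Dict[str, Any]:
    """Build a {task_id, stage, message, timestamp, **extras} progress frame."""
    return {
        "task_id": task_id,
        "stage": stage,
        "message": message,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **extras,
    }


def run_steps(task_id: str, steps: int, should_halt: Callable[[], bool],
              emit: Callable[[Dict[str, Any]], None]) -> str:
    emit(frame(task_id, "start", "run started"))
    for i in range(steps):
        # Checked before every step, so a kill-switch halt lands at the next step boundary.
        if should_halt():
            emit(frame(task_id, "halted", f"halted before step {i}"))  # hypothetical stage
            return "halted"
        emit(frame(task_id, "bar.processed", f"step {i}", step=i))
    emit(frame(task_id, "done", "run finished"))
    return "done"


store = SpecVersionStore()
v1, created1 = store.lock({"strategy": "momentum", "lookback": 20})
v1_again, created2 = store.lock({"lookback": 20, "strategy": "momentum"})  # same content

frames = []
status = run_steps("t-1", 3, threading.Event().is_set, frames.append)  # never halted

stop = threading.Event()
stop.set()  # simulate the topbar kill switch firing before the first step
halted_frames = []
halted_status = run_steps("t-2", 3, stop.is_set, halted_frames.append)
```

Re-locking a spec whose keys are merely reordered returns the same digest and creates no new row, which is the property replay depends on.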

Repository map

The monorepo is organised by responsibility. Each top-level package has its own AGENTS.md enforcing strict boundaries; cross-package imports are blocked in CI.

Package	Role	Owner	Public-surface contract
alphaswarm/	Quant runtime: strategies, backtests, agents, RAG, Iceberg	`platform-team`	alphaswarm/api/main.py::create_app
alphaswarm_controller/	Workload lifecycle + Terraform driver + provider adapters	`platform-team`	alphaswarm_controller/main.py::create_app; NEVER imports `alphaswarm.*`
alphaswarm_core/	Shared value types, ABCs, auth/resource filters, topology	`platform-team`	Dependency-light; consumed by both `alphaswarm/` and `alphaswarm_controller/`
alphaswarm_client/	Active Vite + React 19 operator UI at `alpha-swarm.ai`	`platform-team`	`pnpm --filter alphaswarm_client dev`
alphaswarm_ui/	Cloud-hosted Next.js PaaS frontend (dual Auth0 + Entra)	`platform-team`	Never imports `alphaswarm.` / `alphaswarm_controller.`
alphaswarm_admin/	Internal admin at `manage.alpha-swarm.ai` (audit-first)	`platform-team`	Mirrors `alphaswarm_controller` boundary
alphaswarm_rl/	RL stack: `RLExperimentSpec` + `RLRuntime` + Iceberg trajectories	`rl-team`	Legacy `alphaswarm.rl.*` is a deprecation shim
alphaswarm_models/	ML framework, custom model serving (vLLM + Ollama), AlphaBacktestExperiment	`ml-team`	Legacy `alphaswarm.ml.*` + `alphaswarm/llm/{vllm_runner,ollama_client}.py` are deprecation shims
alphaswarm_bots/	Bot templates + `BotRuntime` (smallest deployable unit)	`agentic-team`	YAML at `alphaswarm_bots/templates/{trading,research}/`
alphaswarm_ide/	Theia 1.72 IDE + six AlphaSwarm extensions	`platform-team`	Canonical entrypoint: `alphaswarm-cli ide`
alphaswarm_cli/	Standalone operator CLI (HTTP-only, device-flow auth)	`platform-team`	Never imports `alphaswarm.` / `alphaswarm_controller.`
alphaswarm_platform/	Hosted-platform deployment + IaC + build assets	`infra-team`	No `import alphaswarm.*`; `TerraformRuntime`-only
alphaswarm_index/	Curator-owned single source of truth	`docs-team`	Sole-writer is the `alphaswarm-index-curator` subagent
alphaswarm_docs/	This site (Docusaurus 3 on Cloudflare Pages)	`docs-team`	Quality gates in .github/workflows/docs-ci.yml
alphaswarm_snippets/	Curated knowledge + extractions + inspiration trees	`docs-team`	Runtime code MUST NOT import this tree
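A CI boundary guard of this kind can be sketched with Python's ast module. This is a hypothetical illustration: the FORBIDDEN map is inferred from the contracts in the table above (the Next.js and TypeScript packages are out of scope), and the real CI check and its rule set may differ.

```python
import ast
from typing import Dict, List, Tuple

# Illustrative subset of the table's boundaries:
# package -> top-level modules it must never import.
FORBIDDEN: Dict[str, Tuple[str, ...]] = {
    "alphaswarm_controller": ("alphaswarm",),
    "alphaswarm_cli": ("alphaswarm", "alphaswarm_controller"),
    "alphaswarm_platform": ("alphaswarm",),
}


def violations(package: str, source: str) -> List[str]:
    """Return 'line N: module' for every absolute import that crosses a boundary."""
    banned = FORBIDDEN.get(package, ())
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # not an absolute import (other nodes, or relative imports)
        for mod in modules:
            # Match the module or a submodule; "alphaswarm_core" must NOT match "alphaswarm".
            if any(mod == b or mod.startswith(b + ".") for b in banned):
                found.append(f"line {node.lineno}: {mod}")
    return found


src = "import alphaswarm_core.types\nfrom alphaswarm.api import main\n"
print(violations("alphaswarm_controller", src))  # → ['line 2: alphaswarm.api']
```

Matching on the exact module name or a dotted prefix, rather than a bare string prefix, keeps the shared alphaswarm_core package importable from both sides of the boundary.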

Inside alphaswarm/ the subsystems map one-to-one to concept docs:

`alphaswarm/<package>/`	Doc
agents/	agentic-pipeline, agents, workflow-studio, multi-agent-patterns
analysis/	analysis-framework, analysis-lab, analysis-flows
api/	reference/api (auto-generated)
backtest/	backtest-engines, vbtpro-integration, hft-backtest
cli/	connector control plane
codebase/	codebase-mcp
core/	core-types
data/	knowledge base, connector control plane, pgvector
llm/	providers, sera
persistence/	domain-model, erd, reference/data-dictionary
providers/	knowledge base
risk/	paper-trading
streaming/	streaming, live-market
tasks/	agent-watchdog
trading/	paper-trading, paper-metadata-gate
ws/	observability
ui/	Deprecated (legacy Solara); rollback only

For the full canonical repository-split contract (boundaries, import guards, future extraction map) read repository-split. For the file-by-file path contract for cross-repo references read alphaswarm-monorepo-paths.

Hard rules (cardinal subset)

Every contributor must read all 55 hard rules in AGENTS.md. The cardinal subset relevant to this doc:

Rule 1. Parse symbols with Symbol.parse(vt_symbol) only; never split a vt_symbol on ".".
Rule 2. All LLM calls go through router_complete.
Rule 3. All Iceberg writes go through iceberg_catalog.append_arrow.
Rule 4. All progress emits use the canonical frame envelope.
Rule 5. All cross-task state goes through Postgres; never pickle ORM objects.
Rules 12-19, 23-25, and 40-41. The five spec runtimes (AgentRuntime, BotRuntime, RLRuntime, AnalysisRuntime, WorkflowRuntime) are the only sanctioned executors for their respective specs. Specs are immutable once committed; behaviour changes always create a new version row.
Rule 22. Agents NEVER read Postgres / Iceberg directly. Every catalog / dataset / entity read goes through a registered DataMCPTool.
Rules 42-45. TerraformRuntime owns every terraform apply; WorkloadRuntime owns every runtime workload operation; both write to the workload_runs and terraform_runs audit ledgers before executing.
Rule 47. Service URLs resolve through the topology service; AlphaSwarm is cluster-agnostic.
Rule 49. Every MCP server is RFC 9728 + 8707 conformant.
Rule 52. Step-up MFA (RFC 9470) is required for every halt and every destructive surface.
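Rule 1 exists because a naive split breaks whenever the symbol itself contains a dot. The sketch below is a hypothetical stand-in for Symbol.parse, assuming the vnpy-style "<symbol>.<exchange>" format in which only the last dot separates the exchange; the real Symbol class may validate more, such as known exchange codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    symbol: str
    exchange: str

    @classmethod
    def parse(cls, vt_symbol: str) -> "Symbol":
        # Split on the LAST dot only, so dotted tickers such as "BRK.B" survive.
        symbol, sep, exchange = vt_symbol.rpartition(".")
        if not sep or not symbol or not exchange:
            raise ValueError(f"not a vt_symbol: {vt_symbol!r}")
        return cls(symbol=symbol, exchange=exchange)

    @property
    def vt_symbol(self) -> str:
        return f"{self.symbol}.{self.exchange}"


print("BRK.B.NYSE".split("."))     # ['BRK', 'B', 'NYSE']: the ticker is torn apart
print(Symbol.parse("BRK.B.NYSE"))  # Symbol(symbol='BRK.B', exchange='NYSE')
```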

Worked example: trace your first request

Goal: dispatch a backtest, watch its progress frames, inspect the ledger row, and read the Iceberg gold output.

Step 1: dispatch

Dispatch a sample momentum backtest against your local compose stack at http://localhost:8000.
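From Python, the dispatch could look roughly like the sketch below. The route (/backtests), the payload fields, and the task_id response field are assumptions for illustration only; check reference/api for the real route and schema.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # local compose stack


def build_dispatch_request(strategy: str, params: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for a (hypothetical) backtest dispatch route."""
    body = json.dumps({"strategy": strategy, "params": params}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/backtests",  # hypothetical route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(strategy: str, params: Dict[str, Any]) -> str:
    """Send the request and return the task_id (assumed response field)."""
    req = build_dispatch_request(strategy, params)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["task_id"]
```

With the stack running, dispatch("momentum", {"lookback": 20}) would return the task_id to tail in the next step.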

Step 2: tail the WebSocket

Switch to your terminal and tail the canonical progress frames:

curl -N http://localhost:8000/chat/stream/<task_id>

You will see frames in the {task_id, stage, message, timestamp, **extras} shape. Stages: start → bar.processed (×N) → done (carries the final BacktestResult).
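To consume the stream programmatically instead of with curl, a parser along these lines would work. It assumes, hypothetically, one JSON frame per line, optionally prefixed with "data:" as in Server-Sent Events. parse_frames and the sample lines are illustrative, not captured output.

```python
import json
from typing import Any, Dict, Iterable, Iterator

ENVELOPE_KEYS = {"task_id", "stage", "message", "timestamp"}


def parse_frames(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield envelope-checked frames until (and including) the done frame."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separators and SSE comment/keep-alive lines
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        frame = json.loads(line)
        missing = ENVELOPE_KEYS - set(frame)
        if missing:
            raise ValueError(f"frame missing envelope keys: {sorted(missing)}")
        yield frame
        if frame["stage"] == "done":
            return  # the done frame carries the final BacktestResult


sample = [
    'data: {"task_id": "t-1", "stage": "start", "message": "", "timestamp": "2026-01-01T00:00:00Z"}',
    "",
    'data: {"task_id": "t-1", "stage": "bar.processed", "message": "bar 1", "timestamp": "2026-01-01T00:00:01Z"}',
    'data: {"task_id": "t-1", "stage": "done", "message": "", "timestamp": "2026-01-01T00:00:02Z", "result": {}}',
]
parsed = list(parse_frames(sample))
stages = [f["stage"] for f in parsed]
print(stages)
```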

Step 3: inspect the ledger

In the browser, Pyodide can run the ledger query as synchronous SQL via DuckDB against a small parquet snapshot of backtest_runs.
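A runnable stand-in for that query follows. It swaps in Python's stdlib sqlite3 for DuckDB so it runs anywhere, and inline rows replace the parquet snapshot. The column names (run_id, strategy, sharpe, status) and the row values are placeholders, not the real data-dictionary schema.

```python
import sqlite3

# Placeholder rows standing in for the backtest_runs snapshot.
rows = [
    ("run-001", "momentum", 1.1, "done"),
    ("run-002", "momentum", None, "running"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE backtest_runs (run_id TEXT, strategy TEXT, sharpe REAL, status TEXT)")
con.executemany("INSERT INTO backtest_runs VALUES (?, ?, ?, ?)", rows)

# The Step 5 check: runs whose sharpe has been written (non-NULL).
finished = con.execute(
    "SELECT run_id, sharpe FROM backtest_runs WHERE sharpe IS NOT NULL ORDER BY run_id"
).fetchall()
print(finished)
```

The SELECT is plain SQL, so it should carry over to DuckDB unchanged.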

When pointed at the real platform, replace the inline list with a /data/exports MCP call and the same SQL works against the actual ledger snapshot.

Step 4: read the Iceberg gold output

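from pyiceberg.catalog import load_catalog

# Replace <run_id> with the run_id carried by the stage=done frame.
run_id = "<run_id>"
# "alphaswarm" is resolved from your PyIceberg configuration (.pyiceberg.yaml or env vars).
cat = load_catalog("alphaswarm")
table = cat.load_table(f"alphaswarm_gold_backtests.run_{run_id}")
df = table.scan().to_pandas()
print(df[["timestamp", "equity", "drawdown"]].tail(10))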

Step 5: verify

A backtest_runs row with non-NULL sharpe exists.
The WebSocket emitted a stage=done frame with the same run_id.
An alphaswarm_gold_backtests.run_<run_id> Iceberg table is queryable.
The KillSwitch topbar element shows a green status.

What next

Run the full walkthrough in tutorials/first-backtest.
Author a custom strategy: how-to/recipes/add-a-strategy.
Promote the backtest to paper: how-to/recipes/promote-a-bot-to-paper.
Replace the single-strategy dispatch with a multi-node workflow: tutorials/first-agent-workflow + concepts/agentic/workflow-studio.

Deployment modes

docker-compose (default)

docker compose up -d

Brings up redis, postgres, alphaswarm-core, alphaswarm-worker, alphaswarm-beat, alphaswarm-client, chromadb, mlflow, otel-collector, jaeger. The Iceberg catalog runs in PyIceberg SQL mode against the host bind mount under data/iceberg/. Optional profiles:

--profile streaming â€” adds Redpanda + Flink for live market data.
--profile vllm â€” adds a containerised vLLM inference server.
--profile legacy â€” restores the older MinIO + iceberg-rest topology for rollback only.

Native dev (no Docker)

pip install -e ".[full,dev]"
alembic upgrade head
uvicorn alphaswarm.api.main:app --reload
celery -A alphaswarm.tasks.celery_app worker --loglevel=info

Kubernetes

make deploy-k8s ENV=prod

Manifests live under alphaswarm_platform/deployments/kubernetes/. The TerraformRuntime owns every terraform apply; see how-to/operations/kubernetes-deploy and how-to/operations/alphaswarm-fund-blue-green-cutover.

Cloudflare Pages (docs only)

docs.alpha-swarm.ai deploys via the cloudflare_pages_docs Terraform module â€” out of cluster, on the edge, behind Cloudflare Access for /internal/* and /enterprise/*.

Where to start

If you want to...	Read
Get the platform running locally	intro/quickstart
Understand the doc conventions	intro/conventions
See the canonical repository layout	repository-split
Run a backtest end-to-end	tutorials/first-backtest
Promote a bot from backtest to paper	tutorials/first-bot
Train an RL agent	tutorials/first-rl-experiment
Compose an agent workflow	tutorials/first-agent-workflow
Browse the API surface	reference/api
Browse the Python surface	reference/python
Inspect tables and columns	reference/data-dictionary
Author a new strategy	how-to/recipes/add-a-strategy
Query data without touching ORM	how-to/recipes/query-data-via-mcp
Snapshot an agent spec	how-to/recipes/snapshot-an-agent-spec
Trigger a kill switch	how-to/operations/kill-switch-incident-response
Deploy to Kubernetes	how-to/operations/kubernetes-deploy
Read the agentic-coding contract	concepts/agentic/agentic-development
Run docs from an AI agent	`/llms.txt`, `/llms-full.txt`, `/mcp`

Deeper reads

concepts/platform/repository-split â€” boundary contract for every alphaswarm_* package.
concepts/agentic/workflow-studio â€” the WorkflowRuntime orchestration layer composing every spec runtime.
concepts/agentic/agentic-development â€” the spec-pattern mapped to the broader agentic-coding vocabulary.
concepts/identity/management-engine â€” WorkloadRuntime + control-plane audit ledger.
concepts/infrastructure/terraform-control-plane â€” TerraformRuntime + hash-locked stack specs.
reference/api â€” Scalar-rendered API playground.
reference/python â€” Griffe-generated Python reference.
