Glossary

Project-specific jargon used across AlphaSwarm, with a definition and a pointer to the canonical file. New contributors and AI agents should treat this as the single source of truth for terminology — if you find a mismatch between this glossary and the code, file an issue.

See also: alphaswarm_docs/index.md for the full doc map.

Core domain

vt_symbol — Composite symbol id with the shape {TICKER}.{EXCHANGE} (vnpy convention), e.g. AAPL.NASDAQ, BTCUSDT.BINANCE, ESM4.CME. Always created via Symbol.parse(...) / Symbol.format(...) in alphaswarm/core/types.py; never hand-split.
Symbol — Immutable dataclass that bundles ticker, exchange, asset_class, security_type, optional contract spec. The atom flowing through every data feed, strategy, and broker. Defined in alphaswarm/core/types.py.
AssetClass vs SecurityType — AssetClass is the broad category (equity, crypto, fx, future, option, index, commodity, bond). SecurityType is the Lean-style finer-grained enum (equity, option, future_option, crypto_future, index_option, …). The _polymorphic_identity_for helper in alphaswarm/data/catalog.py maps SecurityType to a joined-table subclass of Instrument.
Resolution — Lean-style bar cadence (Tick, Second, Minute, Hour, Daily); see alphaswarm/core/types.py.
Interval — Short-code bar cadence (vnpy style, 1m, 5m, 1h, 1d). Same idea as Resolution, kept for vnpy back-compat.
SubscriptionDataConfig — The data-plane routing key. Combines Symbol + Resolution + TickType + DataNormalizationMode. See alphaswarm_docs/core-types.md.
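
The {TICKER}.{EXCHANGE} shape of vt_symbol can be illustrated with a small standalone sketch. The helper names split_vt_symbol and join_vt_symbol are hypothetical and are not the project's Symbol.parse / Symbol.format, which remain the sanctioned entry points. The sketch assumes the exchange is whatever follows the last dot, so tickers that themselves contain a dot stay intact.

```python
from typing import Tuple


def split_vt_symbol(vt_symbol: str) -> Tuple[str, str]:
    """Split '{TICKER}.{EXCHANGE}' on the last dot (hypothetical helper)."""
    ticker, sep, exchange = vt_symbol.rpartition(".")
    # rpartition returns an empty separator when no dot is present.
    if not sep or not ticker or not exchange:
        raise ValueError(f"not a vt_symbol: {vt_symbol!r}")
    return ticker, exchange


def join_vt_symbol(ticker: str, exchange: str) -> str:
    """Inverse of split_vt_symbol (hypothetical helper)."""
    return f"{ticker}.{exchange}"
```

Splitting on the last dot rather than the first is the design choice worth noting: a naive vt_symbol.split(".") would break a ticker such as BRK.B, which is one reason hand-splitting is discouraged.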

Persistence + data plane

Execution Ledger — The Postgres tables under alphaswarm/persistence/models.py + alphaswarm/persistence/ledger.py that record every signal, order, fill, agent decision, and backtest run. Authoritative for "what did the system actually do?".
LedgerWriter — Façade over the ledger tables. Always go through it instead of writing to ORM models directly so audit messages get emitted. alphaswarm/persistence/ledger.py.
Instrument joined-table inheritance — instruments is the parent table; each subclass (InstrumentEquity, InstrumentOption, …) lives in its own joined-table row keyed on instruments.id. The instrument_class discriminator selects the subclass at load time. See alphaswarm_docs/erd.md and alphaswarm/persistence/models_instruments.py.
polymorphic_identity — SQLAlchemy mapper arg that ties a subclass to a discriminator value (e.g. InstrumentEquity.__mapper_args__ = {"polymorphic_identity": "spot"}). When you add a new instrument subclass you must also extend the mapping dict in _polymorphic_identity_for.
DatasetCatalog — Parent row describing a logical dataset (HMDA LAR, FDA device events, etc.) with provider/domain/tags.
DatasetVersion — Per-materialisation row beneath DatasetCatalog. Captures row count, dataset hash, schema snapshot, Iceberg identifier.
DataLink — Edge between a DatasetVersion and an entity (Instrument, Issuer, EconomicSeries). Use this for "which symbols does this dataset cover?" queries.
DataSource — Logical provider record (Yahoo, Alpha Vantage, IBKR, openFDA). Datasets and data-links reference a DataSource.
IcebergCatalog (the wrapper) — PyIceberg handle from alphaswarm/data/iceberg_catalog.py. Always go through append_arrow, read_arrow, iceberg_to_duckdb_view; never call PyIceberg's Catalog.create_table directly.
aqp_<source> namespace — Iceberg namespace convention for the regulatory ingest: alphaswarm_cfpb, alphaswarm_uspto, alphaswarm_fda, alphaswarm_sec. New corpora pick a new aqp_<source> slug.
Persistent host warehouse — C:/alphaswarm-warehouse on Windows, bind-mounted into alphaswarm-api and alphaswarm-worker at /warehouse. Holds the PyIceberg SQL catalog (catalog.db), Parquet data files, staging dir, and ingest audit logs. See alphaswarm_docs/data-catalog.md.
legacy profile — Docker Compose profile that bundles the older REST + MinIO catalog topology (off by default). Bring it up with docker compose --profile legacy up -d.
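
The rule from the polymorphic_identity entry, that every new instrument subclass must also be registered in the discriminator mapping, can be sketched in plain Python. Everything here is illustrative: the enum lists only a subset of the SecurityType names mentioned in this glossary, and apart from "spot" (taken from the polymorphic_identity entry) the identity strings are assumptions, not the values in alphaswarm/data/catalog.py.

```python
from enum import Enum


class SecurityType(str, Enum):
    # Subset of the members named in the glossary; the values are assumptions.
    EQUITY = "equity"
    OPTION = "option"
    CRYPTO_FUTURE = "crypto_future"


# Hypothetical mapping. Only "spot" for equities comes from the glossary.
_IDENTITY_BY_TYPE = {
    SecurityType.EQUITY: "spot",
    SecurityType.OPTION: "option",
}


def polymorphic_identity_for(security_type: SecurityType) -> str:
    """Return the discriminator value for a security type, or fail loudly."""
    identity = _IDENTITY_BY_TYPE.get(security_type)
    if identity is None:
        # An unmapped type means a subclass was added without extending the dict.
        raise ValueError(f"no instrument subclass registered for {security_type.value}")
    return identity
```

Failing loudly on an unmapped type, instead of falling back to the parent class, is one way to surface a forgotten mapping entry at write time rather than at load time.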

Strategies + backtest

BaseStrategy — Abstract strategy contract under alphaswarm/strategies/. Subclasses implement on_bar, on_signal, etc. See alphaswarm_docs/backtest-engines.md.
MLAlphaStrategy / MLSelectorAlpha — Strategies that wrap an ML model (deployed via ModelDeployment) and emit signals.
EnsembleAlpha — Weighted combination of multiple alphas. alphaswarm/strategies/ml_alphas.py.
IBrokerage / IDataQueueHandler — Lean-style interfaces consumed by backtest, paper, and live engines without modification (the same strategy code runs against all three). See alphaswarm_docs/paper-trading.md.
BacktestRun — Postgres row describing one backtest invocation (Sharpe, Sortino, drawdown, MLflow run id, dataset hash). The backtest UI's history view is just a query against this table.
MLflow run id — Foreign id stored on BacktestRun.mlflow_run_id pointing at the MLflow tracking server. Click-through from the UI opens the MLflow UI in a new tab.
dataset_hash — Deterministic SHA-256 of the input bars used in a backtest. Lets the UI flag "two backtests with the same hash = identical inputs".
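
A deterministic hash such as dataset_hash depends on serialising the bars the same way every time. The following is a minimal sketch of that idea, not the project's implementation: it assumes bars are plain mappings, that bar order is significant, and that a canonical JSON encoding is an acceptable serialisation. The function name mirrors the glossary term but is otherwise hypothetical.

```python
import hashlib
import json
from typing import Iterable, Mapping


def dataset_hash(bars: Iterable[Mapping[str, object]]) -> str:
    """SHA-256 over a canonical JSON encoding of each bar (illustrative)."""
    digest = hashlib.sha256()
    for bar in bars:
        # sort_keys and fixed separators make the encoding independent of
        # dict insertion order and of whitespace choices.
        line = json.dumps(bar, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # delimiter so adjacent bars cannot blur together
    return digest.hexdigest()
```

Two runs fed the same bars in the same order produce the same 64-character hex digest, while any changed value produces a different one.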

ML + agents

Tier (deep / quick) — Two LLM tiers in the agentic crews. deep = high-capability (Nemotron 70B / GPT-4-class) for analysis; quick = small/fast (Llama 3.2 / Mini) for control-flow decisions. Provider per tier is in settings.llm_provider_deep / _quick; model per tier in llm_deep_model / llm_quick_model.
router_complete — One-shot LLM completion through LiteLLM exposed by alphaswarm/llm/providers/router.py. All AlphaSwarm code goes through this — never call litellm.completion or the Ollama client directly.
Director — Nemotron-driven planner + verifier in alphaswarm/data/pipelines/director.py. Sits between discovery and materialisation in generic file ingestion.
IngestionPlan / PlannedDataset — Director output dataclass. One PlannedDataset per discovered family with target namespace, table name, expected_min_rows, domain hint, and skip list.
VerifierVerdict — Director's post-materialise judgement (accept or retry with adjusted knobs).
__assets__ family — Synthetic DiscoveredDataset carrying the non-tabular inventory (PDFs, XML, images) found during discovery. Never materialised; surfaced under IngestionReport.extras for visibility.
AgentDecision / DebateTurn — Agent crew audit trail rows.
CrewRun — One full agentic crew invocation (planner → research → execution sub-agents).
Alpha158 — Microsoft Qlib's 158-feature factor zoo, ported to AlphaSwarm under alphaswarm/data/indicators_zoo.py.
FeatureSet / FeatureSetVersion — Composable feature spec (list of IndicatorZoo expressions + transformations) versioned in Postgres, materialised on demand.
ModelDeployment / MLDeployment — A trained ML model that has been registered for inference (rows in alphaswarm/persistence/models.py).

Bots

Bot — Smallest self-contained, deployable unit on AlphaSwarm. Aggregates a universe + data pipeline + strategy + backtest engine + optional ML deployments + optional agent specs + RAG plan + metrics + risk caps + deployment target. Lives under a Project and is uniquely identified by (project_id, slug). See alphaswarm_docs/bots.md.
BotSpec — Pydantic blueprint for a bot. Hashed via snapshot_hash() to drive immutable bot_versions snapshots. Defined in alphaswarm/bots/spec.py.
TradingBot / ResearchBot — Bot subclasses selected by BotSpec.kind. TradingBot does backtest / paper / deploy; ResearchBot does chat (and optional backtest if a strategy block is set).
BotRuntime — Single sanctioned execution entry point for any bot lifecycle action. Snapshots specs into bot_versions, opens bot_deployments rows, and emits progress through alphaswarm/tasks/_progress.py.
bot_versions — Immutable, hash-locked spec snapshots (mirrors agent_spec_versions). Never mutated in place.
bot_deployments — Ledger of every backtest / paper / chat / k8s invocation for a bot. References the BotVersion that produced it so a run can be replayed.
Deployment target (paper_session / kubernetes / backtest_only) — Selected via BotSpec.deployment.target. Backed by alphaswarm/bots/deploy.py::DeploymentDispatcher.

Provider catalog

LLMProvider — Lightweight handle around a LiteLLM provider spec. Registered in alphaswarm/llm/providers/catalog.py::PROVIDERS.
ProviderSpec — Static config for a provider slug (LiteLLM prefix, env-var name, default models).
vllm provider — OpenAI-compatible vLLM endpoint behind LiteLLM's openai/ adapter. Leaving ALPHASWARM_VLLM_BASE_URL empty disables the provider.
nemotron-3-nano:30b — Default Director model on Ollama (NVIDIA Nemotron Nano v3, 31.6B params). Pull with ollama pull nemotron-3-nano:30b. Configurable via ALPHASWARM_LLM_DIRECTOR_MODEL.

Streaming + live

KafkaDataFeed — In-process Kafka consumer that hands bars/quotes to the IDataQueueHandler interface.
features.indicators.v1, market.bar.v1, … — Versioned Kafka topics. Naming pattern is <domain>.<entity>.v<n>.
StreamingIngester — alphaswarm-stream-ingest CLI that publishes to Kafka topics from Alpaca / IBKR.
Heartbeat / kill-switch — Periodic Redis publish from the paper-trading session; if the heartbeat stops arriving, the runner halts. The kill-switch key is set via ALPHASWARM_RISK_KILL_SWITCH_KEY (default alphaswarm:kill_switch).
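
The <domain>.<entity>.v<n> topic pattern lends itself to a small validator. The sketch below is an illustration only: parse_topic and TopicName are hypothetical names, and the allowed character set (lowercase letters, digits, underscores) is an assumption that the glossary does not state.

```python
import re
from typing import NamedTuple

# Assumed character set; the glossary only fixes the three-part shape.
_TOPIC_RE = re.compile(
    r"(?P<domain>[a-z0-9_]+)\.(?P<entity>[a-z0-9_]+)\.v(?P<version>\d+)"
)


class TopicName(NamedTuple):
    domain: str
    entity: str
    version: int


def parse_topic(name: str) -> TopicName:
    """Parse '<domain>.<entity>.v<n>' into its parts (hypothetical helper)."""
    match = _TOPIC_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"topic does not match <domain>.<entity>.v<n>: {name!r}")
    return TopicName(match["domain"], match["entity"], int(match["version"]))
```

Parsing the version as an integer makes it straightforward to compare schema versions, for example when a consumer must refuse topics newer than the version it understands.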

Observability

OTEL endpoint — ALPHASWARM_OTEL_ENDPOINT (default: empty, which disables tracing). When set, every Celery task and HTTP request emits OpenTelemetry spans via alphaswarm/observability/.
Progress bus — Redis pub/sub channel alphaswarm:task:<task_id> carrying {stage, message, timestamp, **extra} payloads. UIs subscribe via the WebSocket relay at /chat/stream/{task_id}. See alphaswarm/ws/broker.py and alphaswarm/tasks/_progress.py.
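
The progress-bus payload shape {stage, message, timestamp, **extra} and the alphaswarm:task:<task_id> channel name can be sketched as follows. The function names are hypothetical, and the timestamp format (epoch seconds) and JSON encoding are assumptions; the helper in alphaswarm/tasks/_progress.py may differ on both counts.

```python
import json
import time
from typing import Any, Dict, Tuple


def progress_channel(task_id: str) -> str:
    """Channel name for one task, following the documented pattern."""
    return f"alphaswarm:task:{task_id}"


def build_progress_message(
    task_id: str, stage: str, message: str, **extra: Any
) -> Tuple[str, str]:
    """Return (channel, JSON body) for one progress event (illustrative)."""
    payload: Dict[str, Any] = {
        **extra,  # extras go first so they cannot override the reserved keys
        "stage": stage,
        "message": message,
        "timestamp": time.time(),  # assumed format: epoch seconds
    }
    return progress_channel(task_id), json.dumps(payload)
```

The returned pair could then be handed to a Redis client's publish(channel, body) call; building the message separately from publishing keeps the payload shape testable without a running Redis.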

Configuration

settings — Cached Settings instance from alphaswarm/config.py. Always import it as from alphaswarm.config import settings and never construct Settings() directly; the shared instance is cached with lru_cache(maxsize=1), so a hand-built Settings() bypasses that cache.
ALPHASWARM_* env namespace — Every settable knob takes the ALPHASWARM_ prefix. Bools accept true/false/1/0. Paths are resolved by _coerce_path.
host-downloads — /host-downloads:ro bind mount in alphaswarm_platform/compose/docker-compose.yml exposing the user's local Downloads/ directory for CLI ingest jobs.
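
The boolean rule for ALPHASWARM_* knobs (true/false/1/0) can be shown with a standalone parser. This is a sketch under assumptions, not the logic in alphaswarm/config.py: env_bool and the EXAMPLE_FLAG knob used in the usage note are hypothetical, and treating values case-insensitively and falling back to the default on an empty value are choices made here for illustration.

```python
import os
from typing import Mapping, Optional

_TRUE = {"true", "1"}
_FALSE = {"false", "0"}


def env_bool(
    name: str, default: bool = False, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Read ALPHASWARM_<name> as a bool (hypothetical helper)."""
    source = os.environ if env is None else env
    raw = source.get(f"ALPHASWARM_{name}")
    if raw is None or raw.strip() == "":
        return default  # assumption: unset or empty means "use the default"
    value = raw.strip().lower()  # assumption: case-insensitive matching
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"ALPHASWARM_{name} must be true/false/1/0, got {raw!r}")
```

For example, env_bool("EXAMPLE_FLAG", env={"ALPHASWARM_EXAMPLE_FLAG": "1"}) evaluates to True, while a value such as "yes" raises instead of being silently treated as false.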

Inspiration rehydration (Phase 2026-04-29)

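Microprice — (P_ask * Q_bid + P_bid * Q_ask) / (Q_bid + Q_ask). Size-weighted refinement of the mid-price: each price is weighted by the opposite side's quantity, so the microprice moves toward the ask when the bid queue is deeper and toward the bid when the ask queue is deeper. Implemented in alphaswarm/data/microstructure.py.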
OBI (Order Book Imbalance) — (Q_bid - Q_ask) / (Q_bid + Q_ask), range [-1, +1]. Positive = bid-side pressure. Used as a quote skew signal in the LOB market-making strategies under alphaswarm/strategies/hft/.
VPIN — Volume-synchronized probability of informed trading (Easley / López de Prado / O'Hara). Groups trade flow into equal-volume buckets and takes the rolling mean of |buy - sell| / (buy + sell) across buckets. See alphaswarm/data/microstructure.py.
Sample-aware Sharpe — Annualised Sharpe ratio that uses the actual sample frequency of a returns series instead of the assumed 252 trading days. Required for HFT strategies with sub-daily bars. See alphaswarm/backtest/hft_metrics.py.
Walk-forward — Training scheme where the model is re-fit on a rolling (or anchored) window and tested on the immediately following slice. Implemented in alphaswarm/ml/walk_forward.py.
Bachelier (Normal) model — Options pricing model assuming the underlying follows arithmetic Brownian motion (dF = sigma dW). Appropriate for low-priced or near-zero underlyings (rates, basis spreads). See alphaswarm/options/normal_model.py.
Inverse option — Option settled in the underlying asset (e.g. BTC) rather than quote currency (USD). Common on crypto venues like Deribit. See alphaswarm/options/inverse_options.py.
Regime classifier — Lightweight classifier that labels each bar as trending vs ranging using ADX threshold (default 25) or as bull/bear/neutral via multi-MA slope vote. See alphaswarm/data/regime.py.
Factor expression — Tiny Polars-based DSL covering Alpha101 primitives (Ts_Mean, Ts_Std, Rank, Decay_Linear, Delta, Ts_Corr). See alphaswarm/data/factor_expression.py.
Engle-Granger cointegration — Two-step test for cointegrated pairs: OLS hedge ratio + ADF test on the residual. See alphaswarm/data/cointegration.py.
Triple-barrier label — López de Prado labeling: look forward up to horizon bars and label +1 if the upper barrier is hit first, -1 if the lower barrier is hit first, and 0 if the horizon is reached with neither barrier touched. See alphaswarm/data/labels.py.
Yang-Zhang volatility — OHLC vol estimator combining overnight, open-to-close, and Rogers-Satchell components. It is designed to be the minimum-variance estimator of the classic OHLC family and handles both drift and opening jumps. See alphaswarm/data/realised_volatility.py.
LobStrategy — ABC for limit-order-book strategies; subclasses emit OrderIntent lists in response to LobState updates. Engine integration is deferred — see alphaswarm_snippets/extractions/_FUTURE_PROMPTS/lob_adapter_prompt.md.
Dataset preset — Curated declarative spec for a one-click ingestion (e.g. intraday_momentum_etf, crypto_majors_intraday). See alphaswarm/data/dataset_presets.py.
Inspiration source — One of seven external repos under alphaswarm_snippets/inspiration/ from which strategies / models / agents were rehydrated. Tracked via the source kwarg on alphaswarm.core.registry.register and surfaced as the source:* tag.
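
The Microprice and OBI formulas above translate directly into code. The sketch below is a standalone illustration of those two formulas; the function names and the choice to raise on an empty book are assumptions, and it is not the code in alphaswarm/data/microstructure.py.

```python
def microprice(bid_price: float, bid_qty: float,
               ask_price: float, ask_qty: float) -> float:
    """(P_ask * Q_bid + P_bid * Q_ask) / (Q_bid + Q_ask)."""
    total = bid_qty + ask_qty
    if total <= 0:
        raise ValueError("top-of-book quantities must sum to a positive value")
    return (ask_price * bid_qty + bid_price * ask_qty) / total


def order_book_imbalance(bid_qty: float, ask_qty: float) -> float:
    """(Q_bid - Q_ask) / (Q_bid + Q_ask), bounded to [-1, +1]."""
    total = bid_qty + ask_qty
    if total <= 0:
        raise ValueError("top-of-book quantities must sum to a positive value")
    return (bid_qty - ask_qty) / total
```

As a worked example, a book quoted 100.0 x 300 on the bid and 101.0 x 100 on the ask gives a microprice of (101 * 300 + 100 * 100) / 400 = 100.75 and an OBI of (300 - 100) / 400 = 0.5; with equal quantities on both sides the microprice equals the mid-price and the OBI is 0.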

Testing

tests/data/test_pipelines_smoke.py — Reference test for the Iceberg ingestion path. New ingest features should add a test in this directory.
director_enabled=False — Pass when constructing IngestionPipeline in tests so the real LLM is bypassed in favour of the deterministic identity plan.

Cross-repo

agentic_assistants — Sibling repo providing the cross-system lineage API (ALPHASWARM_AGENTIC_ASSISTANTS_API).
rpi_kubernetes — Sibling repo with the k8s deployment manifests under alphaswarm_platform/deploy/k8s/.
