Entity Relationship Diagram

Pair with alphaswarm_docs/data-dictionary.md (column-level detail) and alphaswarm_docs/domain-model.md (narrative). Doc map: alphaswarm_docs/index.md.

The Postgres schema has ~110 ORM classes spread across 11 model files under alphaswarm/persistence/. One mega-ERD would be unreadable, so this doc breaks the schema into focused diagrams by domain. The first section is a global FK-only map that shows just the cross-domain joins.

Each per-domain ERD lists table names with the primary key (PK) and a short subset of columns. For full column lists, see data-dictionary.md.

Global FK map

Cross-domain edges only — pick a starting table and trace where it fans out.

Core / Instruments

Joined-table inheritance. Every concrete instrument subclass shares the parent instruments row and adds shape-specific columns in its own table keyed on instruments.id. The discriminator is instruments.instrument_class.
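The joined-table pattern described above can be sketched roughly as follows, assuming the ORM is SQLAlchemy (this page only says "ORM classes" and Alembic). The `Equity` subclass, the `equities` table, and the `symbol` and `exchange` columns are hypothetical names used for illustration. Only `instruments.id` and `instruments.instrument_class` come from the description.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Instrument(Base):
    __tablename__ = "instruments"

    id = Column(Integer, primary_key=True)
    symbol = Column(String, nullable=False)  # hypothetical shared column
    instrument_class = Column(String, nullable=False)  # discriminator

    __mapper_args__ = {
        "polymorphic_on": instrument_class,
        "polymorphic_identity": "instrument",
    }


class Equity(Instrument):  # hypothetical concrete subclass
    __tablename__ = "equities"

    # The subclass table is keyed on instruments.id (PK and FK at once).
    id = Column(Integer, ForeignKey("instruments.id"), primary_key=True)
    exchange = Column(String)  # hypothetical shape-specific column

    __mapper_args__ = {"polymorphic_identity": "equity"}


engine = create_engine("sqlite://")  # in-memory stand-in for Postgres
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Equity(symbol="ACME", exchange="XNYS"))
    session.commit()

    # Querying the parent class returns the concrete subclass instance.
    loaded = session.query(Instrument).one()
    loaded_type = type(loaded).__name__
    loaded_class = loaded.instrument_class
```

In this sketch, one insert writes a row to both `instruments` and `equities`, and the discriminator value tells the ORM which subclass to build when reading back through the parent.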

Market data lineage + Iceberg catalog

How AlphaSwarm tracks every dataset that flows into Iceberg. The iceberg_identifier column on dataset_catalogs was added in alembic/versions/0011_iceberg_catalog_columns.py.

Agentic + ML

Strategies, backtests, agent crews, ML deployments, and feature sets.

Ledger (signals / orders / fills / entries)

Every signal, order, fill, and free-form audit entry written by LedgerWriter.

News / Events / Fundamentals

Macro / FRED / GDelt

Entities / Issuers / Ownership

Taxonomy

Free-form tagging for issuers, instruments, and themes.

Sessions / Chat / Optimization

The conversational + experimentation layer.

Bots

Tables introduced by the Bot Entity Refactor (Alembic 0020_bots).

  • (project_id, slug) is unique on bots.
  • (bot_id, spec_hash) is unique on bot_versions (immutable snapshots).
  • bot_deployments.target is one of paper_session / kubernetes / backtest_only / chat / backtest.
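As a rough illustration of how the two uniqueness rules above might be declared, here is a minimal sketch assuming SQLAlchemy. The constrained column names come from the bullets. Everything else (the column types, the `snapshot_is_new` helper, the sample values) is hypothetical and not taken from the actual models.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, String, UniqueConstraint, create_engine,
)
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Bot(Base):
    __tablename__ = "bots"
    __table_args__ = (UniqueConstraint("project_id", "slug"),)

    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, nullable=False)
    slug = Column(String, nullable=False)


class BotVersion(Base):
    __tablename__ = "bot_versions"
    __table_args__ = (UniqueConstraint("bot_id", "spec_hash"),)

    id = Column(Integer, primary_key=True)
    bot_id = Column(Integer, ForeignKey("bots.id"), nullable=False)
    spec_hash = Column(String, nullable=False)


engine = create_engine("sqlite://")  # in-memory stand-in for Postgres
Base.metadata.create_all(engine)


def snapshot_is_new(session, bot_id, spec_hash):
    """Hypothetical helper: insert a snapshot, return False if it already exists."""
    session.add(BotVersion(bot_id=bot_id, spec_hash=spec_hash))
    try:
        session.commit()
        return True
    except IntegrityError:
        # The (bot_id, spec_hash) constraint rejected the duplicate.
        session.rollback()
        return False


with Session(engine) as session:
    session.add(Bot(id=1, project_id=1, slug="momentum"))
    session.commit()
    first = snapshot_is_new(session, 1, "abc123")
    second = snapshot_is_new(session, 1, "abc123")
```

Here the second attempt with the same `spec_hash` is rejected by the database, which is one way the "immutable snapshots" property could be enforced.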

Tables introduced by the Data Pipelines Hub work (Alembic 0024_data_layer_expansion). All four tables use ProjectScopedMixin.

Notes:

  • (project_id, name) is unique on sinks and market_data_producers.
  • (sink_id, spec_hash) and (sink_id, version) are unique on sink_versions (mirrors the bot_versions pattern).
  • (dataset_catalog_id, kind, target_ref, direction) is unique on streaming_dataset_links so the refresh_links task can be re-run idempotently.
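To show why the four-column unique key makes a re-run safe, here is a minimal sketch using the standard-library sqlite3 module as a stand-in for Postgres. The `ON CONFLICT ... DO NOTHING` clause is valid in both Postgres and SQLite 3.24+. The `refresh_links` function below is a hypothetical stand-in for the real task, and the `kind`, `target_ref`, and `direction` values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE streaming_dataset_links (
        id INTEGER PRIMARY KEY,
        dataset_catalog_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        target_ref TEXT NOT NULL,
        direction TEXT NOT NULL,
        UNIQUE (dataset_catalog_id, kind, target_ref, direction)
    )
    """
)


def refresh_links(conn, links):
    """Hypothetical refresh: insert discovered links, skipping existing ones."""
    conn.executemany(
        """
        INSERT INTO streaming_dataset_links
            (dataset_catalog_id, kind, target_ref, direction)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (dataset_catalog_id, kind, target_ref, direction)
        DO NOTHING
        """,
        links,
    )
    conn.commit()


# Made-up sample links; real kind/direction values are not documented here.
discovered = [
    (1, "sink", "sinks/42", "out"),
    (1, "producer", "producers/7", "in"),
]

refresh_links(conn, discovered)
refresh_links(conn, discovered)  # second run changes nothing

row_count = conn.execute(
    "SELECT COUNT(*) FROM streaming_dataset_links"
).fetchone()[0]
```

Running the refresh twice leaves two rows in the table, because the unique key turns each repeated insert into a no-op.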

ML alpha-backtest linkage (Alembic 0025)

The four new FKs on backtest_runs close the loop from a backtest result back to the trained model that produced its alpha:

  • model_version_id — the registered ModelVersion row.
  • ml_experiment_run_id — the MLExperimentRun that trained it.
  • experiment_plan_id — the ExperimentPlan lineage row.
  • model_deployment_id — the ModelDeployment used to wire the model into the strategy via DeployedModelAlpha.

Adding a new model

When you add a new ORM class:

  1. Add the class to the appropriate alphaswarm/persistence/models_*.py (or models.py for cross-domain things).
  2. Add an Alembic migration (alembic revision --autogenerate -m "add foo"). Never edit a shipped migration.
  3. Update alphaswarm_docs/data-dictionary.md with the new table's columns.
  4. Add the table to the relevant per-domain ERD above (or open a new one if it's a new domain).
  5. If it has FKs into other domains, add those edges to the global FK map at the top of this file.
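For step 5, one way to find the edges that belong in the global FK map is to walk the table metadata and keep only the foreign keys whose two ends sit in different domains. The sketch below assumes SQLAlchemy. The `cross_domain_edges` helper, the `foo` table, and the table-to-domain mapping are hypothetical, and the project may not ship anything like this.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table


def cross_domain_edges(metadata, domain_of):
    """Return (source_table, fk_column, target_table) for FKs crossing domains."""
    edges = []
    for table in metadata.sorted_tables:
        for fk in table.foreign_keys:
            target = fk.column.table.name
            # Tables missing from domain_of map to None and compare equal
            # to each other, so keep the mapping complete.
            if domain_of.get(table.name) != domain_of.get(target):
                edges.append((table.name, fk.parent.name, target))
    return sorted(edges)


# Toy metadata standing in for the real models.
metadata = MetaData()
Table("instruments", metadata, Column("id", Integer, primary_key=True))
Table("bots", metadata, Column("id", Integer, primary_key=True))
Table(
    "bot_versions", metadata,
    Column("id", Integer, primary_key=True),
    Column("bot_id", Integer, ForeignKey("bots.id")),
)
Table(  # the hypothetical new table
    "foo", metadata,
    Column("id", Integer, primary_key=True),
    Column("instrument_id", Integer, ForeignKey("instruments.id")),
    Column("bot_id", Integer, ForeignKey("bots.id")),
)

# Hypothetical table-to-domain assignment, maintained by hand.
domain_of = {
    "instruments": "core",
    "bots": "bots",
    "bot_versions": "bots",
    "foo": "bots",
}

edges = cross_domain_edges(metadata, domain_of)
```

With this toy input, only `foo.instrument_id -> instruments` crosses a domain boundary, so that is the single edge that would need adding to the global map.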