Pair with alphaswarm_docs/erd.md (visual schema) and
alphaswarm_docs/domain-model.md (narrative).
Doc map: alphaswarm_docs/index.md.
Authoritative table-and-column reference for the AlphaSwarm persistence
layer plus the Iceberg catalog. Updated whenever a model file or
migration ships — see "Adding a new model"
for the workflow.
Conventions
- PK: primary key column.
- FK: foreign key (
→ table.column).
- Type: SQLAlchemy column type.
String(N) is VARCHAR(N) in
Postgres. JSON is JSONB. DateTime is timezone-naive UTC.
- Null:
Y/N. Defaults are listed where present.
- Notes: extra constraints, indexes, or invariants.
All id columns are String(36) UUIDs generated by _uuid() unless
noted. All created_at / updated_at columns default to
datetime.utcnow server-side.
1. Sessions + chat — models.py
sessions
The conversational shell that chat messages and agent runs live under.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| user | String(120) | N | – | default "local" |
| title | String(240) | Y | – | – |
| created_at | DateTime | N | – | default now |
| closed_at | DateTime | Y | – | – |
| meta | JSON | Y | – | default {} |
chat_messages
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| session_id | UUID | N | → sessions.id | cascade delete |
| role | String(32) | N | – | `user |
| content | Text | N | – | – |
| created_at | DateTime | N | – | default now |
| meta | JSON | Y | – | default {} |
2. Strategies + backtests — models.py
strategies
The top-level strategy header. Versions live in strategy_versions.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| name | String(120) | N | – | – |
| version | Integer | N | – | default 1 |
| config_yaml | Text | N | – | full YAML config |
| created_at | DateTime | N | – | default now |
| created_by | String(120) | N | – | default "system" |
| status | String(32) | N | – | `draft |
| meta | JSON | Y | – | default {} |
strategy_versions
Immutable YAML snapshot of a strategy.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| strategy_id | UUID | N | → strategies.id | cascade |
| version | Integer | N | – | – |
| config_yaml | Text | N | – | – |
| author | String(120) | N | – | default "system" |
| created_at | DateTime | N | – | – |
| dataset_hash | String(64) | Y | – | bind to data version |
| notes | Text | Y | – | – |
Index: ix_strategy_versions_strategy_version (strategy_id, version).
strategy_tests
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| strategy_id | UUID | N | → strategies.id | – |
| version_id | UUID | Y | → strategy_versions.id | – |
| backtest_id | UUID | Y | → backtest_runs.id | – |
| status | String(32) | N | – | default pending |
| start, end | DateTime | Y | – | window |
| sharpe, sortino, max_drawdown, total_return, final_equity | Float | Y | – | summary metrics |
| engine | String(64) | Y | – | – |
backtest_runs
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| strategy_id | UUID | Y | → strategies.id | – |
| task_id | String(120) | Y | – | Celery task id |
| status | String(32) | N | – | default pending |
| start, end | DateTime | Y | – | window |
| initial_cash, final_equity | Float | Y | – | – |
| sharpe, sortino, max_drawdown, total_return | Float | Y | – | metrics |
| mlflow_run_id | String(120) | Y | – | links to MLflow UI |
| dataset_hash | String(64) | Y | – | – |
| metrics | JSON | Y | – | full metrics blob |
| error | Text | Y | – | – |
| model_version_id | UUID | Y | → model_versions.id | Alembic 0025 — model that produced the alpha |
| ml_experiment_run_id | UUID | Y | → ml_experiment_runs.id | Alembic 0025 — training run lineage |
| experiment_plan_id | UUID | Y | → experiment_plans.id | Alembic 0025 — experiment plan lineage |
| model_deployment_id | UUID | Y | → model_deployments.id | Alembic 0025 — deployment that wired the model into the strategy |
ml_alpha_backtest_runs
Combined experiment row joining a training run to a downstream
backtest. Persisted by AlphaBacktestExperiment.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| task_id | String(120) | Y | – | Celery task id |
| run_name | String(240) | N | – | default alpha-backtest |
| status | String(32) | N | – | `queued |
| ml_experiment_run_id | UUID | Y | → ml_experiment_runs.id | – |
| backtest_run_id | UUID | Y | → backtest_runs.id | – |
| model_version_id | UUID | Y | → model_versions.id | – |
| model_deployment_id | UUID | Y | → model_deployments.id | – |
| experiment_plan_id | UUID | Y | → experiment_plans.id | – |
| mlflow_run_id | String(120) | Y | – | parent MLflow run id |
| dataset_hash | String(64) | Y | – | – |
| ml_metrics | JSON | Y | – | IC / RMSE / hit-rate / etc |
| trading_metrics | JSON | Y | – | Sharpe / Sortino / Calmar / etc |
| combined_metrics | JSON | Y | – | rolled-up scalar score + selected ML/trading keys |
| attribution | JSON | Y | – | conviction-vs-PnL attribution |
| params | JSON | Y | – | full input-config snapshot |
| error | Text | Y | – | – |
ml_prediction_audit
Per-bar prediction audit for an alpha-backtest run. Opt-in via
ALPHASWARM_ML_PREDICTION_AUDIT_ENABLED; capped at
ALPHASWARM_ML_PREDICTION_AUDIT_MAX_ROWS rows per run.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| alpha_backtest_run_id | UUID | N | → ml_alpha_backtest_runs.id | cascade |
| vt_symbol | String(40) | N | – | – |
| ts | DateTime | N | – | – |
| prediction | Float | N | – | – |
| label | Float | Y | – | – |
| position_after | Float | Y | – | – |
| pnl_after_bar | Float | Y | – | – |
optimization_runs
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| task_id | String(120) | Y | – | – |
| strategy_id | UUID | Y | → strategies.id | – |
| run_name | String(240) | N | – | default "sweep" |
| method | String(32) | N | – | `grid |
| metric | String(64) | N | – | default "sharpe" |
| status | String(32) | N | – | `queued |
| n_trials, n_completed | Integer | N | – | – |
| best_trial_id | String(36) | Y | – | – |
| best_metric_value | Float | Y | – | – |
| parameter_space, base_config, summary | JSON | Y | – | – |
optimization_trials
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| run_id | UUID | N | → optimization_runs.id | cascade |
| backtest_id | UUID | Y | → backtest_runs.id | – |
| trial_index | Integer | N | – | – |
| parameters | JSON | Y | – | – |
| status | String(32) | N | – | – |
| metric_value, sharpe, sortino, total_return, max_drawdown, final_equity | Float | Y | – | – |
| error | Text | Y | – | – |
signals
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| strategy_id | UUID | Y | → strategies.id | – |
| backtest_id | UUID | Y | → backtest_runs.id | – |
| vt_symbol | String(40) | N | – | indexed |
| direction | String(10) | N | – | `long |
| strength | Float | N | – | – |
| confidence | Float | Y | – | default 1.0 |
| rationale | Text | Y | – | – |
Composite index: ix_signals_symbol_ts (vt_symbol, created_at).
orders
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| backtest_id | UUID | Y | → backtest_runs.id | – |
| strategy_id | UUID | Y | → strategies.id | – |
| vt_symbol | String(40) | N | – | indexed |
| side | String(8) | N | – | `buy |
| order_type | String(16) | N | – | `market |
| quantity, price | Float | varies | – | – |
| status | String(16) | N | – | default submitting |
| reference | String(120) | Y | – | paper:<run_id> for paper trading |
fills
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| order_id | UUID | Y | → orders.id | – |
| vt_symbol | String(40) | N | – | indexed |
| side | String(8) | N | – | – |
| quantity, price | Float | N | – | – |
| commission, slippage | Float | Y | – | default 0 |
ledger_entries
The canonical audit trail. Every action goes through LedgerWriter.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| backtest_id | UUID | Y | → backtest_runs.id | – |
| strategy_id | UUID | Y | → strategies.id | – |
| entry_type | String(32) | N | – | `SIGNAL |
| level | String(16) | N | – | `debug |
| message | Text | N | – | – |
| payload | JSON | Y | – | default {} |
Index: ix_ledger_type_ts (entry_type, created_at).
instruments (parent)
The polymorphic root. instrument_class is the discriminator;
subclass rows live in instrument_<class> tables keyed on
instruments.id.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| vt_symbol | String(64) | N | unique | Symbol.format() |
| ticker | String(64) | N | indexed | – |
| exchange | String(32) | Y | – | – |
| asset_class | String(32) | Y | – | `equity |
| security_type | String(32) | Y | – | `equity |
| instrument_class | String(32) | Y | indexed | discriminator |
| issuer_id | UUID | Y | → issuers.id | – |
| identifiers | JSON | Y | – | {ticker, isin, cusip, …} |
| sector, industry, region, currency | String | Y | – | – |
| tick_size, multiplier, min_quantity, max_quantity, lot_size | Float | Y | – | exchange specs |
| price_precision, size_precision | Integer | Y | – | – |
| is_active | Boolean | N | – | default true |
| tags | JSON | Y | – | default [] |
| meta | JSON | Y | – | default {} |
Joined-table subclasses (instrument_<class>)
All share id PK that's also a FK to instruments.id. Each table
adds shape-specific columns. For full column lists see
models_instruments.py; the
ERD in alphaswarm_docs/erd.md lists key columns per
subclass.
| Subclass table | Polymorphic identity | Distinctive columns |
|---|
instrument_equity | spot | isin, cusip, figi, lei, gics_sector, shares_outstanding, is_adr |
instrument_etf | etf | inception_date, aum, expense_ratio, is_leveraged, replication |
instrument_index | index | administrator, methodology, constituent_count, base_value |
instrument_bond | bond | coupon, maturity, rating_sp, rating_moodys, callable, convertible |
instrument_future | future | underlying, expiry, contract_size, cycle, delivery_month |
instrument_option | option | strike, expiry, kind (call/put), style, occ_symbol |
instrument_fx_pair | fx_pair | base_currency, quote_currency, pip_size |
instrument_crypto | crypto_token | subtype, chain, contract_address, max_leverage, funding_interval |
instrument_cfd | cfd | underlying, margin_rate, financing_rate |
instrument_commodity | spot_commodity | grade, unit_of_measure, delivery |
instrument_synthetic | synthetic | legs, leg_weights, formula |
instrument_betting | betting | event_name, market_type, selection_id |
instrument_tokenized_asset | nft | chain, contract_address, token_standard |
5. Dataset lineage — models.py
dataset_catalogs
Logical dataset descriptor. Iceberg-related columns added in
migration 0011.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| name | String(160) | N | indexed | – |
| provider | String(80) | N | indexed | – |
| domain | String(120) | N | – | default "market.bars" |
| frequency | String(32) | Y | – | – |
| storage_uri | String(512) | Y | – | – |
| schema_json | JSON | Y | – | – |
| description | Text | Y | – | – |
| tags | JSON | Y | – | – |
| meta | JSON | Y | – | – |
| iceberg_identifier | String(240) | Y | indexed | <ns>.<table> |
| load_mode | String(32) | N | – | `managed |
| source_uri | String(1024) | Y | – | – |
| llm_annotations | JSON | Y | – | from annotate_table |
| column_docs | JSON | Y | – | – |
Composite index: ix_dataset_catalog_name_provider.
dataset_versions
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| catalog_id | UUID | N | → dataset_catalogs.id (cascade) | – |
| version | Integer | N | – | default 1 |
| status | String(32) | N | – | `active |
| as_of, start_time, end_time | DateTime | Y | – | – |
| row_count, symbol_count, file_count | Integer | N | – | default 0 |
| dataset_hash | String(64) | Y | indexed | SHA-256 of inputs |
| materialization_uri | String(512) | Y | – | – |
| columns | JSON | Y | – | – |
| schema_json | JSON | Y | – | – |
| meta | JSON | Y | – | – |
Composite index: ix_dataset_versions_catalog_version.
data_sources
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| name | String | N | unique | `yfinance |
| kind | String | Y | – | `rest |
| base_url | String | Y | – | – |
| meta | JSON | Y | – | – |
data_links
Edges between dataset versions and entities (instruments, series).
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| dataset_version_id | UUID | N | → dataset_versions.id (cascade) | – |
| source_id | UUID | Y | → data_sources.id | – |
| instrument_id | UUID | Y | → instruments.id | – |
| entity_kind | String | N | – | `instrument |
| entity_id | String | N | – | – |
| coverage_start, coverage_end | DateTime | Y | – | – |
| row_count | Integer | Y | – | – |
| meta | JSON | Y | – | – |
identifier_links
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| instrument_id | UUID | Y | → instruments.id | – |
| source_id | UUID | Y | → data_sources.id | – |
| identifier_kind | String | N | – | `cik |
| identifier_value | String | N | – | – |
split_plans, split_artifacts, pipeline_recipes, experiment_plans, model_versions, model_deployments
ML lineage tables. See full column lists in
models.py (search for the class
name). One-liner summary:
| Table | Purpose |
|---|
split_plans | Train/val/test split design (method, segments, FK to dataset_version) |
split_artifacts | Materialised fold boundaries + index sets per split plan |
pipeline_recipes | Preprocessing recipes (shared/learn/infer processors) |
experiment_plans | Ties dataset_version + split + recipe + model config + status |
model_versions | One row per trained MLflow registry version |
model_deployments | Active inference deployments (one model_version may have many) |
agent_runs
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| session_id | UUID | Y | → sessions.id | – |
| task_id | String(120) | Y | indexed | – |
| crew | String(120) | N | – | – |
| status | String(32) | N | – | – |
| prompt | Text | N | – | – |
| result | JSON | Y | – | – |
| error | Text | Y | – | – |
| llm_model | String(120) | Y | – | – |
| token_usage | JSON | Y | – | – |
crew_runs
Lightweight index for the Crew Trace UI.
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| task_id | String(120) | N | unique | – |
| crew_name | String(120) | N | – | default "research" |
| crew_type | String(32) | N | indexed | `research |
| status | String(32) | N | indexed | – |
| prompt | Text | N | – | – |
| session_id | String(36) | Y | indexed | – |
| agent_run_id | UUID | Y | → agent_runs.id | – |
| result, events | JSON | Y | – | – |
| error | Text | Y | – | – |
| cost_usd | Float | N | – | default 0 |
agent_decisions, debate_turns, agent_backtests, agent_judge_reports, agent_replay_runs, backtest_interrupts
The agentic-backtest audit trail.
| Table | Purpose |
|---|
agent_decisions | One row per long/short/flat decision (links backtest, strategy, crew_run) |
debate_turns | Multi-turn debate transcripts under a decision |
agent_backtests | Crew-level metrics rolled up per backtest |
agent_judge_reports | Judge LLM's evaluation of a backtest |
agent_replay_runs | Replays of a judged backtest with adjusted prompts |
backtest_interrupts | User pause/resume markers during a long backtest |
feature_sets
| Column | Type | Null | FK | Notes |
|---|
| id | UUID | N | – | PK |
| name | String | N | – | – |
| description | Text | Y | – | – |
| kind | String | N | – | `composite |
| specs | JSON | N | – | list of indicator/transformation strings |
| tags | JSON | Y | – | – |
| default_lookback_days | Integer | Y | – | – |
feature_set_versions
Immutable snapshot keyed on content_hash so the same spec rendered
twice deduplicates.
feature_set_usages
Records of which backtests / deployments consumed which feature-set
versions (for reverse lineage).
8. Reports + paper — models.py
| Table | Purpose | Key columns |
|---|
equity_reports | Markdown equity research reports generated by the report-writer crew | vt_symbol, cohort, markdown, cost_usd |
paper_trading_runs | One row per paper or live session | brokerage, feed, last_heartbeat_at, bars_seen, orders_submitted, fills, state |
rl_episodes | Snapshot of an RL training episode | run_id, episode, mean_reward, portfolio_value |
Tables introduced by the Bot Entity Refactor (Alembic
0020_bots). Mirror the proven
agent_specs / agent_spec_versions / agent_runs_v2 pattern.
| Table | Purpose | Key columns |
|---|
bots | Logical bot row (latest active version of a named spec inside a project) | id, name, slug, kind (`trading |
bot_versions | Immutable, hash-locked snapshot of every BotSpec change | id, bot_id FK, version, spec_hash, payload, notes, created_by, (bot_id, spec_hash) UNIQUE, (bot_id, version) UNIQUE |
bot_deployments | One row per backtest / paper / chat / k8s invocation; references the version that produced it | id, bot_id FK, version_id FK, target (`paper_session |
All three tables carry ProjectScopedMixin (owner_user_id,
workspace_id, project_id).
| Table | Key columns |
|---|
news_items | url, source, published_at, headline, body |
news_item_entities | news_item_id, vt_symbol, entity_kind (`instrument |
news_sentiments | news_item_id, scorer (`finbert |
corporate_events is the parent; the per-type tables FK back to it.
| Table | Key columns |
|---|
corporate_events | vt_symbol, event_type (`earnings |
earnings_event_rows | event_id, eps_actual, eps_estimate, revenue_actual |
dividend_event_rows | event_id, amount, ex_date, pay_date |
split_event_rows | event_id, ratio |
ipo_event_rows | event_id, offer_price, shares_offered |
merger_event_rows | event_id, acquirer, target, terms |
calendar_event_rows | event_id, event_kind, expected_time |
analyst_estimates | vt_symbol, analyst, target_price, forecast_date |
price_targets | vt_symbol, analyst, target_price, period |
forward_estimates | vt_symbol, analyst, metric, value |
regulatory_event_rows | event_id, regulator, summary |
esg_event_rows | event_id, category, score |
| Table | Key columns |
|---|
financial_statements | issuer_id, period (`Q |
financial_ratios | issuer_id, period_end, pe, pb, roe, roa, debt_to_equity |
key_metrics | issuer_id, period_end, revenue, net_income, free_cash_flow |
historical_market_caps | issuer_id, as_of, market_cap |
revenue_breakdowns | issuer_id, period_end, segment, region, revenue |
earnings_call_transcripts | issuer_id, call_date, content |
management_discussion_analysis | issuer_id, period_end, mda_text |
reported_financials | issuer_id, period_end, xbrl_payload |
| Table | Key columns |
|---|
economic_series | series_id (FRED:GDP), title, frequency, units, source |
economic_observations | series_id, observation_date, value |
cot_reports | report_date, instrument, positions |
bls_series | series_id, title, frequency |
treasury_rates | date, rate_3m, rate_2y, rate_10y, rate_30y |
yield_curves | date, tenors |
option_series | instrument_id, expiry, style |
option_chain_snapshots | series_id, as_of, chain_payload |
futures_curves | as_of, front_month, tenor_prices |
market_holidays | exchange, date, name |
market_status_history | exchange, as_of, status |
| Table | Key columns |
|---|
issuers | name, lei, country, entity_kind |
government_entities | id (PK_FK), country_code, level |
funds | id (PK_FK), fund_family, fund_type |
sectors | code, name |
industries | code, name, sector_id |
industry_classifications | issuer_id, industry_id, as_of |
entity_relationships | parent_id, child_id, kind |
locations | issuer_id, country, city |
key_executives | issuer_id, name, title |
executive_compensation | executive_id, year, total_comp |
insider_transactions | vt_symbol, insider_name, transaction_date, quantity |
institutional_holdings | vt_symbol, holder_name, as_of, quantity |
form_13f_holdings | filer_cik, vt_symbol, period_end |
short_interest | vt_symbol, settlement_date, short_interest |
shares_float_snapshots | vt_symbol, as_of, float_shares |
politician_trades | politician, vt_symbol, trade_date, amount |
fund_holdings | fund_id, vt_symbol, as_of, position |
| Table | Key columns |
|---|
taxonomy_schemes | name (`GICS |
taxonomy_nodes | scheme_id, parent_id, code, label |
entity_tags | node_id, entity_kind, entity_id |
entity_crosswalks | from_kind, from_id, to_kind, to_id |
15. External-source indexes — models.py
| Table | Purpose | Key columns |
|---|
fred_series | FRED metadata index | series_id, title, units, frequency |
sec_filings | SEC EDGAR filing index | instrument_id, accession, form, filing_date |
gdelt_mentions | GDelt GKG mention index | instrument_id, mention_time, gkg_payload |
Iceberg namespace conventions
Iceberg tables sit alongside the Postgres schema; their identifiers
are stored in dataset_catalogs.iceberg_identifier for cross-lookup.
| Namespace | Source |
|---|
alphaswarm | Generic / default (fallback when no --namespace provided) |
alphaswarm_smoke | Smoke-test namespace (scripts/iceberg_smoke.py) |
alphaswarm_cfpb | CFPB regulatory ingest |
alphaswarm_uspto | USPTO regulatory ingest |
alphaswarm_fda | openFDA regulatory ingest |
alphaswarm_sec | SEC quarterly data sets |
alphaswarm_bars | (reserved) generic OHLCV cache |
alphaswarm_features | (reserved) feature-set materialisations |
Naming rules:
- Namespace:
aqp_<source>, lower-snake-case, ≤32 chars.
- Table: lower-snake-case, ≤48 chars, descriptive nouns
(
hmda_lar, device_event, broker_dealers).
- The Director (Nemotron) decides the final table name within these
rules; identity-plan fallback uses the discovered family name as-is.
Layout:
C:/alphaswarm-warehouse/iceberg/
├── catalog.db # SQLite metadata
└── <namespace>/
└── <table>/
├── data/00000-0-<uuid>.parquet
├── data/00001-0-<uuid>.parquet
└── metadata/
├── 00000-<uuid>.metadata.json
├── 00001-<uuid>.metadata.json
├── <uuid>-m0.avro # manifest list
└── snap-<snap_id>-...avro # snapshot
Snapshots are append-only — every append_arrow produces a new
metadata.json revision. Old snapshots can be expired with
PyIceberg's Table.expire_snapshots(...) (not exposed via API yet).
Updating this dictionary:
When you add an ORM column or a new table:
- Update the corresponding section above.
- If you added a table to a per-domain ERD scope, update
alphaswarm_docs/erd.md too.
- Cross-link the migration that introduced the change.