Skip to main content

Data Dictionary

Pair with alphaswarm_docs/erd.md (visual schema) and alphaswarm_docs/domain-model.md (narrative). Doc map: alphaswarm_docs/index.md.

Authoritative table-and-column reference for the AlphaSwarm persistence layer plus the Iceberg catalog. Updated whenever a model file or migration ships — see "Adding a new model" for the workflow.

Conventions

  • PK: primary key column.
  • FK: foreign key (→ table.column).
  • Type: SQLAlchemy column type. String(N) is VARCHAR(N) in Postgres. JSON is JSONB. DateTime is timezone-naive UTC.
  • Null: Y/N. Defaults are listed where present.
  • Notes: extra constraints, indexes, or invariants.

All id columns are String(36) UUIDs generated by _uuid() unless noted. All created_at / updated_at columns default to datetime.utcnow server-side.


1. Sessions + chat — models.py

sessions

The conversational shell that chat messages and agent runs live under.

ColumnTypeNullFKNotes
idUUIDNPK
userString(120)Ndefault "local"
titleString(240)Y
created_atDateTimeNdefault now
closed_atDateTimeY
metaJSONYdefault {}

chat_messages

ColumnTypeNullFKNotes
idUUIDNPK
session_idUUIDN→ sessions.idcascade delete
roleString(32)N`user
contentTextN
created_atDateTimeNdefault now
metaJSONYdefault {}

2. Strategies + backtests — models.py

strategies

The top-level strategy header. Versions live in strategy_versions.

ColumnTypeNullFKNotes
idUUIDNPK
nameString(120)N
versionIntegerNdefault 1
config_yamlTextNfull YAML config
created_atDateTimeNdefault now
created_byString(120)Ndefault "system"
statusString(32)N`draft
metaJSONYdefault {}

strategy_versions

Immutable YAML snapshot of a strategy.

ColumnTypeNullFKNotes
idUUIDNPK
strategy_idUUIDN→ strategies.idcascade
versionIntegerN
config_yamlTextN
authorString(120)Ndefault "system"
created_atDateTimeN
dataset_hashString(64)Ybind to data version
notesTextY

Index: ix_strategy_versions_strategy_version (strategy_id, version).

strategy_tests

ColumnTypeNullFKNotes
idUUIDNPK
strategy_idUUIDN→ strategies.id
version_idUUIDY→ strategy_versions.id
backtest_idUUIDY→ backtest_runs.id
statusString(32)Ndefault pending
start, endDateTimeYwindow
sharpe, sortino, max_drawdown, total_return, final_equityFloatYsummary metrics
engineString(64)Y

backtest_runs

ColumnTypeNullFKNotes
idUUIDNPK
strategy_idUUIDY→ strategies.id
task_idString(120)YCelery task id
statusString(32)Ndefault pending
start, endDateTimeYwindow
initial_cash, final_equityFloatY
sharpe, sortino, max_drawdown, total_returnFloatYmetrics
mlflow_run_idString(120)Ylinks to MLflow UI
dataset_hashString(64)Y
metricsJSONYfull metrics blob
errorTextY
model_version_idUUIDY→ model_versions.idAlembic 0025 — model that produced the alpha
ml_experiment_run_idUUIDY→ ml_experiment_runs.idAlembic 0025 — training run lineage
experiment_plan_idUUIDY→ experiment_plans.idAlembic 0025 — experiment plan lineage
model_deployment_idUUIDY→ model_deployments.idAlembic 0025 — deployment that wired the model into the strategy

ml_alpha_backtest_runs

Combined experiment row joining a training run to a downstream backtest. Persisted by AlphaBacktestExperiment.

ColumnTypeNullFKNotes
idUUIDNPK
task_idString(120)YCelery task id
run_nameString(240)Ndefault alpha-backtest
statusString(32)N`queued
ml_experiment_run_idUUIDY→ ml_experiment_runs.id
backtest_run_idUUIDY→ backtest_runs.id
model_version_idUUIDY→ model_versions.id
model_deployment_idUUIDY→ model_deployments.id
experiment_plan_idUUIDY→ experiment_plans.id
mlflow_run_idString(120)Yparent MLflow run id
dataset_hashString(64)Y
ml_metricsJSONYIC / RMSE / hit-rate / etc
trading_metricsJSONYSharpe / Sortino / Calmar / etc
combined_metricsJSONYrolled-up scalar score + selected ML/trading keys
attributionJSONYconviction-vs-PnL attribution
paramsJSONYfull input-config snapshot
errorTextY

ml_prediction_audit

Per-bar prediction audit for an alpha-backtest run. Opt-in via ALPHASWARM_ML_PREDICTION_AUDIT_ENABLED; capped at ALPHASWARM_ML_PREDICTION_AUDIT_MAX_ROWS rows per run.

ColumnTypeNullFKNotes
idUUIDNPK
alpha_backtest_run_idUUIDN→ ml_alpha_backtest_runs.idcascade
vt_symbolString(40)N
tsDateTimeN
predictionFloatN
labelFloatY
position_afterFloatY
pnl_after_barFloatY

optimization_runs

ColumnTypeNullFKNotes
idUUIDNPK
task_idString(120)Y
strategy_idUUIDY→ strategies.id
run_nameString(240)Ndefault "sweep"
methodString(32)N`grid
metricString(64)Ndefault "sharpe"
statusString(32)N`queued
n_trials, n_completedIntegerN
best_trial_idString(36)Y
best_metric_valueFloatY
parameter_space, base_config, summaryJSONY

optimization_trials

ColumnTypeNullFKNotes
idUUIDNPK
run_idUUIDN→ optimization_runs.idcascade
backtest_idUUIDY→ backtest_runs.id
trial_indexIntegerN
parametersJSONY
statusString(32)N
metric_value, sharpe, sortino, total_return, max_drawdown, final_equityFloatY
errorTextY

3. Ledger — models.py

signals

ColumnTypeNullFKNotes
idUUIDNPK
strategy_idUUIDY→ strategies.id
backtest_idUUIDY→ backtest_runs.id
vt_symbolString(40)Nindexed
directionString(10)N`long
strengthFloatN
confidenceFloatYdefault 1.0
rationaleTextY

Composite index: ix_signals_symbol_ts (vt_symbol, created_at).

orders

ColumnTypeNullFKNotes
idUUIDNPK
backtest_idUUIDY→ backtest_runs.id
strategy_idUUIDY→ strategies.id
vt_symbolString(40)Nindexed
sideString(8)N`buy
order_typeString(16)N`market
quantity, priceFloatvaries
statusString(16)Ndefault submitting
referenceString(120)Ypaper:<run_id> for paper trading

fills

ColumnTypeNullFKNotes
idUUIDNPK
order_idUUIDY→ orders.id
vt_symbolString(40)Nindexed
sideString(8)N
quantity, priceFloatN
commission, slippageFloatYdefault 0

ledger_entries

The canonical audit trail. Every action goes through LedgerWriter.

ColumnTypeNullFKNotes
idUUIDNPK
backtest_idUUIDY→ backtest_runs.id
strategy_idUUIDY→ strategies.id
entry_typeString(32)N`SIGNAL
levelString(16)N`debug
messageTextN
payloadJSONYdefault {}

Index: ix_ledger_type_ts (entry_type, created_at).


4. Instruments — models.py + models_instruments.py

instruments (parent)

The polymorphic root. instrument_class is the discriminator; subclass rows live in instrument_<class> tables keyed on instruments.id.

ColumnTypeNullFKNotes
idUUIDNPK
vt_symbolString(64)NuniqueSymbol.format()
tickerString(64)Nindexed
exchangeString(32)Y
asset_classString(32)Y`equity
security_typeString(32)Y`equity
instrument_classString(32)Yindexeddiscriminator
issuer_idUUIDY→ issuers.id
identifiersJSONY{ticker, isin, cusip, …}
sector, industry, region, currencyStringY
tick_size, multiplier, min_quantity, max_quantity, lot_sizeFloatYexchange specs
price_precision, size_precisionIntegerY
is_activeBooleanNdefault true
tagsJSONYdefault []
metaJSONYdefault {}

Joined-table subclasses (instrument_<class>)

All share id PK that's also a FK to instruments.id. Each table adds shape-specific columns. For full column lists see models_instruments.py; the ERD in alphaswarm_docs/erd.md lists key columns per subclass.

Subclass tablePolymorphic identityDistinctive columns
instrument_equityspotisin, cusip, figi, lei, gics_sector, shares_outstanding, is_adr
instrument_etfetfinception_date, aum, expense_ratio, is_leveraged, replication
instrument_indexindexadministrator, methodology, constituent_count, base_value
instrument_bondbondcoupon, maturity, rating_sp, rating_moodys, callable, convertible
instrument_futurefutureunderlying, expiry, contract_size, cycle, delivery_month
instrument_optionoptionstrike, expiry, kind (call/put), style, occ_symbol
instrument_fx_pairfx_pairbase_currency, quote_currency, pip_size
instrument_cryptocrypto_tokensubtype, chain, contract_address, max_leverage, funding_interval
instrument_cfdcfdunderlying, margin_rate, financing_rate
instrument_commodityspot_commoditygrade, unit_of_measure, delivery
instrument_syntheticsyntheticlegs, leg_weights, formula
instrument_bettingbettingevent_name, market_type, selection_id
instrument_tokenized_assetnftchain, contract_address, token_standard

5. Dataset lineage — models.py

dataset_catalogs

Logical dataset descriptor. Iceberg-related columns added in migration 0011.

ColumnTypeNullFKNotes
idUUIDNPK
nameString(160)Nindexed
providerString(80)Nindexed
domainString(120)Ndefault "market.bars"
frequencyString(32)Y
storage_uriString(512)Y
schema_jsonJSONY
descriptionTextY
tagsJSONY
metaJSONY
iceberg_identifierString(240)Yindexed<ns>.<table>
load_modeString(32)N`managed
source_uriString(1024)Y
llm_annotationsJSONYfrom annotate_table
column_docsJSONY

Composite index: ix_dataset_catalog_name_provider.

dataset_versions

ColumnTypeNullFKNotes
idUUIDNPK
catalog_idUUIDN→ dataset_catalogs.id (cascade)
versionIntegerNdefault 1
statusString(32)N`active
as_of, start_time, end_timeDateTimeY
row_count, symbol_count, file_countIntegerNdefault 0
dataset_hashString(64)YindexedSHA-256 of inputs
materialization_uriString(512)Y
columnsJSONY
schema_jsonJSONY
metaJSONY

Composite index: ix_dataset_versions_catalog_version.

data_sources

ColumnTypeNullFKNotes
idUUIDNPK
nameStringNunique`yfinance
kindStringY`rest
base_urlStringY
metaJSONY

Edges between dataset versions and entities (instruments, series).

ColumnTypeNullFKNotes
idUUIDNPK
dataset_version_idUUIDN→ dataset_versions.id (cascade)
source_idUUIDY→ data_sources.id
instrument_idUUIDY→ instruments.id
entity_kindStringN`instrument
entity_idStringN
coverage_start, coverage_endDateTimeY
row_countIntegerY
metaJSONY
ColumnTypeNullFKNotes
idUUIDNPK
instrument_idUUIDY→ instruments.id
source_idUUIDY→ data_sources.id
identifier_kindStringN`cik
identifier_valueStringN

split_plans, split_artifacts, pipeline_recipes, experiment_plans, model_versions, model_deployments

ML lineage tables. See full column lists in models.py (search for the class name). One-liner summary:

TablePurpose
split_plansTrain/val/test split design (method, segments, FK to dataset_version)
split_artifactsMaterialised fold boundaries + index sets per split plan
pipeline_recipesPreprocessing recipes (shared/learn/infer processors)
experiment_plansTies dataset_version + split + recipe + model config + status
model_versionsOne row per trained MLflow registry version
model_deploymentsActive inference deployments (one model_version may have many)

6. Agentic — models.py

agent_runs

ColumnTypeNullFKNotes
idUUIDNPK
session_idUUIDY→ sessions.id
task_idString(120)Yindexed
crewString(120)N
statusString(32)N
promptTextN
resultJSONY
errorTextY
llm_modelString(120)Y
token_usageJSONY

crew_runs

Lightweight index for the Crew Trace UI.

ColumnTypeNullFKNotes
idUUIDNPK
task_idString(120)Nunique
crew_nameString(120)Ndefault "research"
crew_typeString(32)Nindexed`research
statusString(32)Nindexed
promptTextN
session_idString(36)Yindexed
agent_run_idUUIDY→ agent_runs.id
result, eventsJSONY
errorTextY
cost_usdFloatNdefault 0

agent_decisions, debate_turns, agent_backtests, agent_judge_reports, agent_replay_runs, backtest_interrupts

The agentic-backtest audit trail.

TablePurpose
agent_decisionsOne row per long/short/flat decision (links backtest, strategy, crew_run)
debate_turnsMulti-turn debate transcripts under a decision
agent_backtestsCrew-level metrics rolled up per backtest
agent_judge_reportsJudge LLM's evaluation of a backtest
agent_replay_runsReplays of a judged backtest with adjusted prompts
backtest_interruptsUser pause/resume markers during a long backtest

7. Feature sets — models.py

feature_sets

ColumnTypeNullFKNotes
idUUIDNPK
nameStringN
descriptionTextY
kindStringN`composite
specsJSONNlist of indicator/transformation strings
tagsJSONY
default_lookback_daysIntegerY

feature_set_versions

Immutable snapshot keyed on content_hash so the same spec rendered twice deduplicates.

feature_set_usages

Records of which backtests / deployments consumed which feature-set versions (for reverse lineage).


8. Reports + paper — models.py

TablePurposeKey columns
equity_reportsMarkdown equity research reports generated by the report-writer crewvt_symbol, cohort, markdown, cost_usd
paper_trading_runsOne row per paper or live sessionbrokerage, feed, last_heartbeat_at, bars_seen, orders_submitted, fills, state
rl_episodesSnapshot of an RL training episoderun_id, episode, mean_reward, portfolio_value

8a. Bots — models_bots.py

Tables introduced by the Bot Entity Refactor (Alembic 0020_bots). Mirror the proven agent_specs / agent_spec_versions / agent_runs_v2 pattern.

TablePurposeKey columns
botsLogical bot row (latest active version of a named spec inside a project)id, name, slug, kind (`trading
bot_versionsImmutable, hash-locked snapshot of every BotSpec changeid, bot_id FK, version, spec_hash, payload, notes, created_by, (bot_id, spec_hash) UNIQUE, (bot_id, version) UNIQUE
bot_deploymentsOne row per backtest / paper / chat / k8s invocation; references the version that produced itid, bot_id FK, version_id FK, target (`paper_session

All three tables carry ProjectScopedMixin (owner_user_id, workspace_id, project_id).


9. News — models_news.py

TableKey columns
news_itemsurl, source, published_at, headline, body
news_item_entitiesnews_item_id, vt_symbol, entity_kind (`instrument
news_sentimentsnews_item_id, scorer (`finbert

10. Events — models_events.py

corporate_events is the parent; the per-type tables FK back to it.

TableKey columns
corporate_eventsvt_symbol, event_type (`earnings
earnings_event_rowsevent_id, eps_actual, eps_estimate, revenue_actual
dividend_event_rowsevent_id, amount, ex_date, pay_date
split_event_rowsevent_id, ratio
ipo_event_rowsevent_id, offer_price, shares_offered
merger_event_rowsevent_id, acquirer, target, terms
calendar_event_rowsevent_id, event_kind, expected_time
analyst_estimatesvt_symbol, analyst, target_price, forecast_date
price_targetsvt_symbol, analyst, target_price, period
forward_estimatesvt_symbol, analyst, metric, value
regulatory_event_rowsevent_id, regulator, summary
esg_event_rowsevent_id, category, score

11. Fundamentals — models_fundamentals.py

TableKey columns
financial_statementsissuer_id, period (`Q
financial_ratiosissuer_id, period_end, pe, pb, roe, roa, debt_to_equity
key_metricsissuer_id, period_end, revenue, net_income, free_cash_flow
historical_market_capsissuer_id, as_of, market_cap
revenue_breakdownsissuer_id, period_end, segment, region, revenue
earnings_call_transcriptsissuer_id, call_date, content
management_discussion_analysisissuer_id, period_end, mda_text
reported_financialsissuer_id, period_end, xbrl_payload

12. Macro — models_macro.py

TableKey columns
economic_seriesseries_id (FRED:GDP), title, frequency, units, source
economic_observationsseries_id, observation_date, value
cot_reportsreport_date, instrument, positions
bls_seriesseries_id, title, frequency
treasury_ratesdate, rate_3m, rate_2y, rate_10y, rate_30y
yield_curvesdate, tenors
option_seriesinstrument_id, expiry, style
option_chain_snapshotsseries_id, as_of, chain_payload
futures_curvesas_of, front_month, tenor_prices
market_holidaysexchange, date, name
market_status_historyexchange, as_of, status

13. Entities + ownership — models_entities.py + models_ownership.py

TableKey columns
issuersname, lei, country, entity_kind
government_entitiesid (PK_FK), country_code, level
fundsid (PK_FK), fund_family, fund_type
sectorscode, name
industriescode, name, sector_id
industry_classificationsissuer_id, industry_id, as_of
entity_relationshipsparent_id, child_id, kind
locationsissuer_id, country, city
key_executivesissuer_id, name, title
executive_compensationexecutive_id, year, total_comp
insider_transactionsvt_symbol, insider_name, transaction_date, quantity
institutional_holdingsvt_symbol, holder_name, as_of, quantity
form_13f_holdingsfiler_cik, vt_symbol, period_end
short_interestvt_symbol, settlement_date, short_interest
shares_float_snapshotsvt_symbol, as_of, float_shares
politician_tradespolitician, vt_symbol, trade_date, amount
fund_holdingsfund_id, vt_symbol, as_of, position

14. Taxonomy — models_taxonomy.py

TableKey columns
taxonomy_schemesname (`GICS
taxonomy_nodesscheme_id, parent_id, code, label
entity_tagsnode_id, entity_kind, entity_id
entity_crosswalksfrom_kind, from_id, to_kind, to_id

15. External-source indexes — models.py

TablePurposeKey columns
fred_seriesFRED metadata indexseries_id, title, units, frequency
sec_filingsSEC EDGAR filing indexinstrument_id, accession, form, filing_date
gdelt_mentionsGDelt GKG mention indexinstrument_id, mention_time, gkg_payload

Iceberg namespace conventions

Iceberg tables sit alongside the Postgres schema; their identifiers are stored in dataset_catalogs.iceberg_identifier for cross-lookup.

NamespaceSource
alphaswarmGeneric / default (fallback when no --namespace provided)
alphaswarm_smokeSmoke-test namespace (scripts/iceberg_smoke.py)
alphaswarm_cfpbCFPB regulatory ingest
alphaswarm_usptoUSPTO regulatory ingest
alphaswarm_fdaopenFDA regulatory ingest
alphaswarm_secSEC quarterly data sets
alphaswarm_bars(reserved) generic OHLCV cache
alphaswarm_features(reserved) feature-set materialisations

Naming rules:

  • Namespace: aqp_<source>, lower-snake-case, ≤32 chars.
  • Table: lower-snake-case, ≤48 chars, descriptive nouns (hmda_lar, device_event, broker_dealers).
  • The Director (Nemotron) decides the final table name within these rules; identity-plan fallback uses the discovered family name as-is.

Layout:

C:/alphaswarm-warehouse/iceberg/
├── catalog.db # SQLite metadata
└── <namespace>/
└── <table>/
├── data/00000-0-<uuid>.parquet
├── data/00001-0-<uuid>.parquet
└── metadata/
├── 00000-<uuid>.metadata.json
├── 00001-<uuid>.metadata.json
├── <uuid>-m0.avro # manifest list
└── snap-<snap_id>-...avro # snapshot

Snapshots are append-only — every append_arrow produces a new metadata.json revision. Old snapshots can be expired with PyIceberg's Table.expire_snapshots(...) (not exposed via API yet).

Updating this dictionary:

When you add an ORM column or a new table:

  1. Update the corresponding section above.
  2. If you added a table to a per-domain ERD scope, update alphaswarm_docs/erd.md too.
  3. Cross-link the migration that introduced the change.