Skip to main content

Temporal identifier resolution

Status: Phase 1 shipped (Alembic 0039 + 0040). The identifier_links table is now the authoritative source for identifier resolution; the legacy Instrument.identifiers JSON blob is kept for backward compatibility but is no longer authoritative.

Why temporal resolution

Financial identifiers are not stable across time. A non-exhaustive list of why:

EventImpact
Ticker change (M&A, rebranding)FB -> META 2022-06-09
Symbol change (re-listing)ABEV3AMBV4 on B3
CUSIP / ISIN re-issue (corporate action)Stock split issuance may mint a new CUSIP
Index reconstitutionRussell add/drops change tracker constituents
ADR sponsorship upgradeConversion ratio may change

A backtest that resolves META to the modern ID and walks bar data from 2018 will silently introduce survivorship and forward-looking bias because the row didn't yet exist under that ticker. The resolver service fixes this by walking time-versioned identifier_links rows scoped by valid_from <= as_of and (valid_to IS NULL OR valid_to > as_of).

Table shape

The identifier_links table predates Phase 1; Phase 1 promotes it to authoritative status. Its schema:

identifier_links
+--------------------+-----------------------------------------------+
| id | UUID |
| entity_kind | instrument | fred_series | sec_filing | ... |
| entity_id | parent entity id |
| instrument_id | denormalized FK to instruments.id (NULL OK) |
| scheme | ticker | vt_symbol | cik | cusip | isin | |
| | figi | sedol | lei | gvkey | permid | ... |
| value | identifier value |
| valid_from | datetime | NULL ("from the beginning") |
| valid_to | datetime | NULL ("still valid") |
| source_id | FK to data_sources.id |
| confidence | 0.0 - 1.0, defaults 1.0 |
| meta | JSON |
| created_at | datetime |
+--------------------+-----------------------------------------------+

Resolver API

The two public entry points are :class:alphaswarm.data.identity.IdentifierResolver and the matching DataMCP tools.

Python: forward resolution

from datetime import datetime
from alphaswarm.data.identity import resolve

# "What was AAPL's CUSIP on 2018-06-12?"
hit = resolve(
scheme="cusip",
value="037833100",
as_of=datetime(2018, 6, 12),
)
print(hit.value, hit.is_open_ended)

Python: history walk

from alphaswarm.data.identity import history

# Every alias known for Apple
for row in history(entity_kind="instrument", entity_id="aapl-uuid"):
print(row.scheme, row.value, row.valid_from, row.valid_to)

Agent / MCP

data.identity.resolve(scheme="cusip", value="037833100", as_of="2018-06-12")
data.identity.history(entity_kind="instrument", entity_id="aapl-uuid")

The DataMCP layer is the only path agents may use to resolve identifiers (AGENTS rule 22). The Python module is reserved for loaders / pipelines / persistence code; agent code never imports the ORM model directly.

Backfill from legacy JSON blob (migration 0040)

The legacy Instrument.identifiers JSON column is a flat {scheme: value} map. Migration 0040 walks every row, normalises the scheme name (lower-cased, aliases collapsed), and inserts a row into identifier_links with valid_from=valid_to=NULL ("valid for all time the row represents") and confidence=0.7 (so a canonical loader row at confidence=1.0 always wins the resolver tiebreaker).

The legacy JSON column is kept: readers that haven't migrated to the resolver continue to work. New readers MUST go through the resolver so they see corrected validity windows.

Validity-window semantics

valid_fromvalid_toMeaning
NULLNULLValid for all time the row represents
2018-01-01NULLValid from 2018-01-01, still current
NULL2022-06-09Valid up to (and including) 2022-06-09
2010-05-012015-12-31Valid in the closed-open interval

The valid_to is exclusive -- a row with valid_to=2022-06-09 is NOT valid on 2022-06-09. The lookup predicate is therefore valid_to > as_of, not valid_to >= as_of.

Confidence ordering

When multiple rows satisfy the validity predicate, the highest confidence wins. Default loader rows ship with confidence=1.0; the legacy-blob backfill from migration 0040 uses confidence=0.7 so it's overridden the moment a canonical loader populates the same alias.

Heuristic / fuzzy-match loaders should use confidence in the 0.3-0.6 range so they only win when no canonical row exists.