neo4j
The canonical graph store. Holds the ownership graph (Workstream F), the bipartite lineage DAG (Workstream A + B), and the entity-graph service (instruments, companies, datasets, pipeline assets, service metadata). Postgres carries the snapshot rows; Neo4j carries the traversable relationships.
Identity
| Field | Value |
|---|---|
| Service id | neo4j |
| Role | graph |
| Image | neo4j:5-community |
| Port | 7474 (HTTP) + 7687 (Bolt) |
| Storage | 5 Gi PVC (cell-local); managed Neo4j Aura recommended for prod cells |
Deployment surfaces
| Surface | Where |
|---|---|
| Compose | service neo4j in alphaswarm_platform/compose/docker-compose.yml |
| Kustomize | rolled into base-services/ (cell-local StatefulSet) |
| Terraform | not provisioned by a managed module today; cloud templates run a containerised StatefulSet behind the cell's storage class |
Dependencies
Upstream: none.
Downstream:
- `alphaswarm-core`: ownership graph reads via the `data.ownership.*` MCP tool; lineage relay writes through the OpenLineage adapter.
- `alphaswarm-worker`: sync tasks that mirror Postgres rows into Neo4j edges.
Sync semantics
- Postgres remains the canonical source of truth for entity attributes; Neo4j holds the relationships.
- Sync is event-driven via the `lineage` queue family; backfills run through `data.lineage.replay` Celery tasks.
- Read paths go through the `data.ownership.*` and `data.lineage.*` DataMCP tools; the agentic plane MUST NOT speak Bolt directly.
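Because sync events can be redelivered and backfills replay history, the writes that mirror Postgres rows into Neo4j need to be idempotent. The following is a minimal sketch of that idea, not the worker's actual code: the `OwnershipRow` shape, the `Entity` label, the `OWNS` relationship type, and the `snapshot_id` property are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OwnershipRow:
    """Hypothetical shape of one Postgres ownership snapshot row."""
    snapshot_id: int
    owner_id: str
    asset_id: str


# MERGE matches an existing node or edge before creating one, so replaying
# the same row updates the edge in place instead of duplicating it.
# Nodes carry only an id here; entity attributes stay in Postgres.
UPSERT_OWNS_EDGE = """
MERGE (o:Entity {id: $owner_id})
MERGE (a:Entity {id: $asset_id})
MERGE (o)-[r:OWNS]->(a)
SET r.snapshot_id = $snapshot_id
""".strip()


def row_to_statement(row: OwnershipRow) -> tuple[str, dict[str, Any]]:
    """Translate one row into a parameterised Cypher statement.

    Values travel as parameters, never as string interpolation, so the
    statement text is identical for every row.
    """
    params = {
        "owner_id": row.owner_id,
        "asset_id": row.asset_id,
        "snapshot_id": row.snapshot_id,
    }
    return UPSERT_OWNS_EDGE, params
```

Inside a sync task, the returned pair could be passed to the official Neo4j Python driver, for example `session.run(statement, params)`. Keeping the statement constant and the values in parameters also fits the rule that Cypher is stored, not composed ad hoc.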
Operations
- Auth: username/password via ExternalSecret; Bolt TLS through Linkerd mTLS.
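- Backups: native `neo4j-admin database backup` cron to MinIO/S3. Note that the `backup` subcommand requires Neo4j Enterprise Edition; on the `neo4j:5-community` image the available equivalent is an offline `neo4j-admin database dump`.
- Cypher style: queries are stored under `alphaswarm/data/sources/graph/queries/`; ad-hoc Cypher in agent prompts is forbidden.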
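One way to enforce the stored-query rule is to resolve Cypher by name from the queries directory and reject anything else. The sketch below assumes one file per query with a `.cypher` extension; the extension, the `load_query` helper, and the `QueryNotFound` error are hypothetical and not part of the documented layout.

```python
from pathlib import Path


class QueryNotFound(KeyError):
    """Raised when no stored query exists under the given name."""


def load_query(name: str, root: Path) -> str:
    """Return the Cypher text stored as <root>/<name>.cypher.

    Names are restricted to letters, digits, and underscores, so a caller
    cannot escape the queries directory or smuggle in raw Cypher.
    """
    if not name.replace("_", "").isalnum():
        raise ValueError(f"invalid query name: {name!r}")
    path = root / f"{name}.cypher"
    if not path.is_file():
        raise QueryNotFound(name)
    return path.read_text(encoding="utf-8").strip()
```

A caller would pass the repository's queries directory as `root`, for example `load_query("owners_of", Path("alphaswarm/data/sources/graph/queries"))`, where `owners_of` is an invented query name.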
See also
- `ownership-graph`: ownership graph contract (Workstream F).
- `lineage-graph`: bipartite lineage DAG + OpenLineage relay (Workstream A + B).
- `entity-graph-services.md`: entity registry + service control via Neo4j.