neo4j

The canonical graph store. Holds the ownership graph (Workstream F), the bipartite lineage DAG (Workstream A + B), and the entity-graph service (instruments, companies, datasets, pipeline assets, service metadata). Postgres carries the snapshot rows; Neo4j carries the traversable relationships.

Identity

| Field | Value |
| --- | --- |
| Service id | neo4j |
| Role | graph |
| Image | neo4j:5-community |
| Port | 7474 (HTTP) + 7687 (Bolt) |
| Storage | 5 Gi PVC (cell-local); managed Neo4j Aura recommended for prod cells |

Deployment surfaces

| Surface | Where |
| --- | --- |
| Compose | service neo4j in alphaswarm_platform/compose/docker-compose.yml |
| Kustomize | rolled into base-services/ (cell-local StatefulSet) |
| Terraform | not provisioned by a managed module today; cloud templates run a containerised StatefulSet behind the cell's storage class |

Dependencies

Upstream: none.

Downstream:

  • alphaswarm-core: reads the ownership graph via the data.ownership.* MCP tools; the lineage relay writes through the OpenLineage adapter.
  • alphaswarm-worker: runs the sync tasks that mirror Postgres rows into Neo4j edges.

Sync semantics

  • Postgres remains the canonical source of truth for entity attributes; Neo4j holds the relationships.
  • Sync is event-driven via the lineage queue family; backfills run through data.lineage.replay Celery tasks.
  • Read paths go through the data.ownership.* and data.lineage.* DataMCP tools — the agentic plane MUST NOT speak Bolt directly.
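
The mirroring step described above can be sketched as an idempotent edge upsert. This is a minimal illustration, not the worker's actual task: the `OwnershipRow` shape, the `Owner`/`Asset` labels, the `OWNS` relationship type, and the `mirror_rows` helper are all hypothetical names chosen for the example. The point it shows is why `MERGE` suits event-driven sync with replay-based backfills: applying the same row twice leaves the graph unchanged.

```python
from dataclasses import dataclass


# Hypothetical shape of a Postgres ownership snapshot row;
# the real schema is not described in this page.
@dataclass(frozen=True)
class OwnershipRow:
    owner_id: str
    asset_id: str
    as_of: str  # ISO date of the snapshot


# MERGE matches existing nodes and edges before creating them, so replaying
# a row (for example during a backfill) does not duplicate anything.
# Labels and relationship type are assumptions for illustration.
UPSERT_OWNS_EDGE = """
MERGE (o:Owner {id: $owner_id})
MERGE (a:Asset {id: $asset_id})
MERGE (o)-[r:OWNS]->(a)
SET r.as_of = $as_of
"""


def edge_params(row):
    """Map one snapshot row to the parameters of the upsert query."""
    return {"owner_id": row.owner_id, "asset_id": row.asset_id, "as_of": row.as_of}


def mirror_rows(rows, run_query):
    """Apply each row as an edge upsert through run_query(cypher, params).

    run_query is injected so the function can be exercised without a live
    database. Returns the number of rows applied.
    """
    count = 0
    for row in rows:
        run_query(UPSERT_OWNS_EDGE, edge_params(row))
        count += 1
    return count
```

With the official Neo4j Python driver, `run_query` could be a thin wrapper around `session.run(cypher, params)`. Values are passed as query parameters rather than interpolated into the Cypher string.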

Operations

  • Auth: username/password supplied via ExternalSecret; Bolt traffic is encrypted in transit by Linkerd mTLS.
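  • Backups: native neo4j-admin database backup cron to MinIO/S3. Note that in Neo4j 5 the online backup command is an Enterprise Edition feature; on the neo4j:5-community image the available alternative is an offline neo4j-admin database dump.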
  • Cypher style: queries are stored under alphaswarm/data/sources/graph/queries/; ad-hoc Cypher in agent prompts is forbidden.
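
One way to enforce the stored-query rule is to resolve queries by name from the queries directory and reject anything that is not a plain identifier. The sketch below assumes one query per file named `<name>.cypher`; that layout, the `load_query` helper, and the naming pattern are assumptions for illustration, not the repository's actual convention.

```python
import re
from pathlib import Path

# Assumption: query names are lowercase identifiers, one file per query.
_NAME_RE = re.compile(r"[a-z0-9_]+")


def load_query(queries_dir, name):
    """Return the stored Cypher text for `name`, or raise if it is not on file.

    Rejecting names that are not plain identifiers keeps callers from
    reaching files outside the queries directory.
    """
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid query name: {name!r}")
    path = Path(queries_dir) / f"{name}.cypher"
    if not path.is_file():
        raise FileNotFoundError(f"no stored query named {name!r}")
    return path.read_text(encoding="utf-8").strip()
```

A caller would pass the queries directory and a query name, then run the returned text with parameters. Because only named files can be loaded, Cypher composed inside an agent prompt has no way in.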

See also