Knowledge Graphs, Glass Box, Enterprise AI

Source Traceability: From Answer Back to Passage

Every answer Anatypical generates is anchored to specific document passages and entities via persistent Neo4j graph edges — surviving re-ingestion, entity merges, and session restarts.

Dawson Bauer

May 21, 2026

Overview

Every answer Anatypical generates is anchored to the specific document passages and entities that informed it. This traceability chain is stored persistently in Neo4j as graph edges — surviving document re-ingestion, entity merges, and session restarts.

Each answer is stored as a MemoryNode with two kinds of links. An ABOUT edge connects it to each entity that was discussed, and each of those entities is in turn linked — via HAS_ENTITY — to the chunk it was extracted from. A SOURCED_FROM edge connects the answer to the source document, which is linked via HAS_CHUNK to the exact passage used.

Chunk-Level Attribution

HAS_ENTITY Edges (Ingestion)

During ingestion, extract_graph_task links every entity that GLiNER extracts not just to its parent TextNode but to the specific ChunkNode it came from: the system matches the chunk at the corresponding index and merges a HAS_ENTITY edge between that chunk and the entity.
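The chunk-level link could be written with an idempotent Cypher MERGE along the following lines. This is a minimal sketch, not Anatypical's code: the node labels come from this article, but the property names (`id`, `text_node_id`, `index`), the Cypher text, and the helper `build_has_entity_params` are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical Cypher: find the chunk by parent document and index,
# then merge a HAS_ENTITY edge to the entity (MERGE keeps re-runs idempotent).
HAS_ENTITY_CYPHER = """
UNWIND $rows AS row
MATCH (c:ChunkNode {text_node_id: row.text_node_id, index: row.chunk_index})
MATCH (e:Entity {id: row.entity_id})
MERGE (c)-[:HAS_ENTITY]->(e)
"""


@dataclass(frozen=True)
class ExtractedEntity:
    entity_id: str    # assumed to be the deterministic entity ID
    chunk_index: int  # index of the chunk the entity was extracted from


def build_has_entity_params(text_node_id: str, entities: list[ExtractedEntity]) -> dict:
    """Build the parameter map for HAS_ENTITY_CYPHER, one row per (chunk, entity) pair."""
    seen = set()
    rows = []
    for ent in entities:
        key = (ent.chunk_index, ent.entity_id)
        if key in seen:  # skip repeated mentions of one entity in one chunk
            continue
        seen.add(key)
        rows.append({
            "text_node_id": text_node_id,
            "chunk_index": ent.chunk_index,
            "entity_id": ent.entity_id,
        })
    return {"rows": rows}
```

Batching the rows through UNWIND keeps the write to one round trip per document, which is a design choice of this sketch rather than something the article specifies.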

Compression Robustness

When compress_entities_llm() merges alias entities into a canonical entity, HAS_ENTITY edges from ChunkNodes are redirected to the canonical entity before aliases are deleted. ChunkNode attribution survives entity compression.
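The redirect step can be modeled in memory to show why attribution survives the merge. The sketch below is an illustration only: `redirect_has_entity_edges` and the alias map are hypothetical names, and the real compress_entities_llm() operates on Neo4j edges rather than Python sets.

```python
def redirect_has_entity_edges(
    edges: set[tuple[str, str]],
    alias_to_canonical: dict[str, str],
) -> set[tuple[str, str]]:
    """Return HAS_ENTITY edges with alias targets replaced by canonical IDs.

    Each edge is a (chunk_id, entity_id) pair. Using a set collapses the
    duplicates that appear when two aliases in one chunk share a canonical.
    """
    return {
        (chunk_id, alias_to_canonical.get(entity_id, entity_id))
        for chunk_id, entity_id in edges
    }


def dangling_edges(edges: set[tuple[str, str]], deleted: set[str]) -> set[tuple[str, str]]:
    """Edges that would be lost if the given entity IDs were deleted now."""
    return {edge for edge in edges if edge[1] in deleted}
```

Running the redirect before deleting the aliases leaves `dangling_edges` empty, so every chunk that mentioned an alias still points at an entity afterwards.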

Retrieval-Time Traceability

At retrieval time, _retrieve_user_entities() returns explicit source identifiers alongside every result: the IDs of all entities involved (both the seed entities and those reached through canonical-relation expansion), the parent document IDs of every returned chunk, and the chunks themselves with their text and relevance scores.
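One way to picture the shape of that result is a small pair of dataclasses. The class and field names below (`RetrievalResult`, `seed_entity_ids`, `expanded_entity_ids`, and so on) are assumptions for illustration; the article does not give the actual return type of _retrieve_user_entities().

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    chunk_id: str
    text_node_id: str  # parent document of this chunk
    text: str
    score: float       # relevance score assigned at retrieval time


@dataclass
class RetrievalResult:
    seed_entity_ids: list[str]
    expanded_entity_ids: list[str]  # reached via canonical-relation expansion
    chunks: list[RetrievedChunk] = field(default_factory=list)

    @property
    def entity_ids(self) -> list[str]:
        """Seed plus expanded entity IDs, deduplicated, first-seen order kept."""
        return list(dict.fromkeys(self.seed_entity_ids + self.expanded_entity_ids))

    @property
    def text_node_ids(self) -> list[str]:
        """Parent document IDs of the returned chunks, deduplicated."""
        return list(dict.fromkeys(c.text_node_id for c in self.chunks))
```

Deriving the ID lists from the chunks and entities, rather than storing them separately, keeps the provenance fields from drifting out of sync with the results they describe.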

These fields travel through the LangGraph workflow and are consumed by generate_answer_node to create the MemoryNode.

MemoryNode Anchoring

After generating an answer, generate_answer_node writes a MemoryNode with two types of provenance edges:

ABOUT → Entity

For provenance, the answer's MemoryNode is linked to each contributing entity with an ABOUT edge.

ABOUT edges use deterministic entity IDs (MD5 of name+label+space_id). They survive entity re-extraction, surface-form merges, and document re-ingestion.
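A deterministic ID of this kind can be sketched with Python's hashlib. The article only says the ID is an MD5 of name+label+space_id; the plain concatenation, the UTF-8 encoding, and the absence of any normalization below are assumptions.

```python
import hashlib


def entity_id(name: str, label: str, space_id: str) -> str:
    """Derive a stable entity ID from its identifying fields.

    Assumption: the three fields are concatenated as-is, in this order,
    and encoded as UTF-8 before hashing.
    """
    key = f"{name}{label}{space_id}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Because the ID depends only on these three fields, re-extracting the same entity from a re-ingested document yields the same node key, so existing ABOUT edges keep pointing at it.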

SOURCED_FROM → TextNode

The MemoryNode is also linked to each source document (TextNode) with a SOURCED_FROM edge.

SOURCED_FROM edges use UUID4 TextNode IDs (stable across re-ingestion). If the TextNode is deleted, the edge is removed but the MemoryNode is preserved — conversation history remains coherent.
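Taken together, the anchoring step amounts to one node write and two edge writes. The sketch below builds those as (Cypher, parameters) pairs. The labels and edge types come from this article; the property names, the UUID4 memory ID, and the helper `memory_write_plan` are assumptions, and generate_answer_node may be structured quite differently.

```python
import uuid

# Hypothetical Cypher statements; property names are assumed.
CREATE_MEMORY = """
MERGE (m:MemoryNode {id: $memory_id})
SET m.query = $query, m.answer = $answer
"""

LINK_ABOUT = """
MATCH (m:MemoryNode {id: $memory_id})
UNWIND $entity_ids AS entity_id
MATCH (e:Entity {id: entity_id})
MERGE (m)-[:ABOUT]->(e)
"""

LINK_SOURCED_FROM = """
MATCH (m:MemoryNode {id: $memory_id})
UNWIND $text_node_ids AS text_node_id
MATCH (t:TextNode {id: text_node_id})
MERGE (m)-[:SOURCED_FROM]->(t)
"""


def memory_write_plan(query, answer, entity_ids, text_node_ids, memory_id=None):
    """Return the ordered (cypher, params) pairs that anchor one answer."""
    memory_id = memory_id or str(uuid.uuid4())
    return [
        (CREATE_MEMORY, {"memory_id": memory_id, "query": query, "answer": answer}),
        (LINK_ABOUT, {"memory_id": memory_id, "entity_ids": list(entity_ids)}),
        (LINK_SOURCED_FROM, {"memory_id": memory_id, "text_node_ids": list(text_node_ids)}),
    ]
```

The edge statements use MATCH rather than MERGE on the Entity and TextNode, so a stale ID simply produces no edge instead of creating an empty placeholder node.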

Graph Traversal: From Memory to Source

To trace an answer back to its sources, the system follows the SOURCED_FROM edges from a memory node to its documents, then the HAS_CHUNK edges down to the exact passages — returning the original query, each source document, and the passage text in order.
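The traversal described above might look like the following query. It is a sketch under stated assumptions: the property names (`id`, `query`, `index`, `text`) are invented for illustration, and `trace_sources` is a hypothetical helper that accepts any session object with a `run(query, **params)` method, such as a session from the Neo4j Python driver.

```python
# Hypothetical Cypher for walking MemoryNode -> TextNode -> ChunkNode.
TRACE_CYPHER = """
MATCH (m:MemoryNode {id: $memory_id})-[:SOURCED_FROM]->(t:TextNode)
MATCH (t)-[:HAS_CHUNK]->(c:ChunkNode)
RETURN m.query AS query,
       t.id AS text_node_id,
       c.index AS chunk_index,
       c.text AS passage
ORDER BY text_node_id, chunk_index
"""


def trace_sources(session, memory_id: str) -> list[dict]:
    """Return one row per passage behind the given answer, in document order."""
    records = session.run(TRACE_CYPHER, memory_id=memory_id)
    return [dict(record) for record in records]
```

Passing the memory ID as a query parameter instead of formatting it into the string lets the database cache the query plan and avoids injection through user-supplied IDs.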

Multi-Turn Traceability

Each MemoryNode in the conversation chain keeps its own provenance set. The first turn's answer might be about Apple Inc. and Tim Cook, sourced from a Q3 report; the next turn's answer might be about Apple Inc. and the iPhone 16, sourced from a product-launch document, with a NEXT_TURN edge chaining the two together.

This records which documents were referenced at each point in the conversation, whether entities shifted between turns, and which turns relied on the same source document.
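Those three questions can be answered from per-turn provenance alone. The in-memory sketch below shows the idea; the `Turn` class and both helper functions are hypothetical, and in the real system the same answers would come from graph queries over ABOUT, SOURCED_FROM, and NEXT_TURN edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    memory_id: str
    entities: frozenset[str]  # targets of this turn's ABOUT edges
    sources: frozenset[str]   # targets of this turn's SOURCED_FROM edges


def entity_shift(prev: Turn, curr: Turn) -> tuple[frozenset[str], frozenset[str]]:
    """Return (entities dropped, entities introduced) between two turns."""
    return prev.entities - curr.entities, curr.entities - prev.entities


def turns_sharing_source(turns: list[Turn], source: str) -> list[str]:
    """Memory IDs of every turn that relied on the given source document."""
    return [t.memory_id for t in turns if source in t.sources]
```

For the two-turn example above, `entity_shift` reports Tim Cook as dropped and the iPhone 16 as introduced, while Apple Inc. carries over unchanged.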

Edge Type Summary

Edge	From	To	Survives
`HAS_ENTITY`	ChunkNode	Entity	Entity compression (edges redirected to canonical)
`HAS_CHUNK`	TextNode	ChunkNode	Always (deleted with TextNode)
`ABOUT`	MemoryNode	Entity	Entity merges, re-extraction
`SOURCED_FROM`	MemoryNode	TextNode	Re-ingestion; removed if TextNode deleted
`NEXT_TURN`	MemoryNode	MemoryNode	Always
