Source Traceability: From Answer Back to Passage
Every answer Anatypical generates is anchored to specific document passages and entities via persistent Neo4j graph edges — surviving re-ingestion, entity merges, and session restarts.
Overview
Each answer is stored as a MemoryNode with two kinds of provenance links. An ABOUT edge connects it to each entity that was discussed, and each of those entities is in turn linked via HAS_ENTITY to the chunk it was extracted from. A SOURCED_FROM edge connects the answer to each source document, which is linked via HAS_CHUNK to the exact passage used.
Chunk-Level Attribution
HAS_ENTITY Edges (Ingestion)
During extract_graph_task, every entity extracted by GLiNER is linked not just to its parent TextNode but to the specific ChunkNode it was extracted from. The task matches the ChunkNode at the corresponding chunk index and merges a HAS_ENTITY edge from that chunk to the entity.
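Below is a minimal sketch of that ingestion step, with a toy in-memory graph standing in for Neo4j. Several pieces are assumptions for illustration, not the actual extract_graph_task code:

- the `Graph` class;
- the `doc:chunk:N` chunk-ID scheme;
- the `(chunk_index, entity_id, name, label)` shape of the extraction tuples.

The parent-TextNode link is left out so the chunk edge stays in focus.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy in-memory stand-in for the Neo4j graph (hypothetical)."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: set = field(default_factory=set)    # (source id, edge type, target id)

    def merge_node(self, node_id, **props):
        # Like Cypher MERGE: create the node if absent, then set properties.
        self.nodes.setdefault(node_id, {}).update(props)

    def merge_edge(self, src, rel, dst):
        # Storing edges in a set makes repeated merges idempotent, as MERGE is.
        self.edges.add((src, rel, dst))


def link_entities_to_chunks(graph, text_node_id, chunks, extractions):
    """Create ChunkNodes for a document and attach each entity to its source chunk."""
    chunk_ids = []
    graph.merge_node(text_node_id, kind="TextNode")
    for index, text in enumerate(chunks):
        chunk_id = f"{text_node_id}:chunk:{index}"  # hypothetical ID scheme
        graph.merge_node(chunk_id, kind="ChunkNode", index=index, text=text)
        graph.merge_edge(text_node_id, "HAS_CHUNK", chunk_id)
        chunk_ids.append(chunk_id)
    for index, entity_id, name, label in extractions:
        graph.merge_node(entity_id, kind="Entity", name=name, label=label)
        # Match the chunk at the extraction's index, then MERGE the edge.
        graph.merge_edge(chunk_ids[index], "HAS_ENTITY", entity_id)


g = Graph()
link_entities_to_chunks(
    g,
    "doc-1",
    ["Apple reported Q3 revenue.", "Tim Cook commented on margins."],
    [(0, "e-apple", "Apple Inc.", "ORG"),
     (1, "e-cook", "Tim Cook", "PERSON"),
     (1, "e-cook", "Tim Cook", "PERSON")],  # duplicate mention: merged, not repeated
)
```

Both nodes and edges are merged, so re-running the step on the same extraction leaves the graph unchanged.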
Compression Robustness
When compress_entities_llm() merges alias entities into a canonical entity, HAS_ENTITY edges from ChunkNodes are redirected to the canonical entity before aliases are deleted. ChunkNode attribution survives entity compression.
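The ordering matters. Deleting an alias node together with its relationships would take its HAS_ENTITY edges with it, and the chunk attribution would be lost.

Below is a sketch of redirect-then-delete over a plain set of `(source, type, target)` edge tuples. The function name and data layout are hypothetical. As described above, only HAS_ENTITY edges are redirected; any other edge still touching an alias is dropped along with the alias.

```python
def compress_aliases(nodes, edges, canonical_id, alias_ids):
    """Merge alias entities into canonical_id, preserving chunk attribution.

    nodes: dict of node id -> properties; edges: set of (source, type, target).
    Hypothetical sketch, not the compress_entities_llm() implementation.
    """
    aliases = set(alias_ids) - {canonical_id}
    # Step 1: point every chunk-to-alias HAS_ENTITY edge at the canonical entity.
    for src, rel, dst in list(edges):
        if rel == "HAS_ENTITY" and dst in aliases:
            edges.discard((src, rel, dst))
            edges.add((src, rel, canonical_id))  # the set dedupes existing links
    # Step 2: only now delete the aliases and any edges still touching them.
    for edge in list(edges):
        if edge[0] in aliases or edge[2] in aliases:
            edges.discard(edge)
    for alias in aliases:
        nodes.pop(alias, None)


nodes = {"e-apple-inc": {"name": "Apple Inc."}, "e-apple": {"name": "Apple"}}
edges = {
    ("chunk-0", "HAS_ENTITY", "e-apple"),
    ("chunk-1", "HAS_ENTITY", "e-apple-inc"),
    ("chunk-1", "HAS_ENTITY", "e-apple"),
}
compress_aliases(nodes, edges, "e-apple-inc", ["e-apple"])
```

After the call, both chunks point at the canonical entity, and the chunk that mentioned both surface forms keeps a single edge.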
Retrieval-Time Traceability
_retrieve_user_entities() returns explicit source identifiers alongside every result:

- the IDs of all entities involved, covering both the seed entities and those reached through canonical-relation expansion;
- the parent TextNode IDs of every returned chunk;
- the chunks themselves, with their text and relevance scores.
These fields travel through the LangGraph workflow and are consumed by generate_answer_node to create the MemoryNode.
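One way to picture the payload carried from retrieval to generate_answer_node is as a small dataclass. The class and field names here are hypothetical. The text above specifies what information is returned: entity IDs from seeds and expansion, parent document IDs, and chunks with text and scores. It does not specify the exact shape.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    chunk_id: str
    text_node_id: str  # parent document of this chunk
    text: str
    score: float


@dataclass
class RetrievalResult:
    """Hypothetical provenance payload returned alongside retrieval results."""
    seed_entity_ids: list
    expanded_entity_ids: list  # reached via canonical-relation expansion
    chunks: list = field(default_factory=list)

    @property
    def entity_ids(self):
        # Every entity involved, seeds first, duplicates removed in order.
        return list(dict.fromkeys(self.seed_entity_ids + self.expanded_entity_ids))

    @property
    def text_node_ids(self):
        # Parent document of every returned chunk, duplicates removed in order.
        return list(dict.fromkeys(c.text_node_id for c in self.chunks))


result = RetrievalResult(
    seed_entity_ids=["e-apple"],
    expanded_entity_ids=["e-cook", "e-apple"],
    chunks=[
        RetrievedChunk("doc-1:chunk:0", "doc-1", "Apple reported Q3 revenue.", 0.91),
        RetrievedChunk("doc-1:chunk:1", "doc-1", "Tim Cook commented.", 0.84),
    ],
)
```

In this sketch the de-duplicated ID lists are derived from the stored data rather than kept separately, so they cannot drift out of sync. Whether the real workflow state does the same is not stated.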
MemoryNode Anchoring
After generating an answer, generate_answer_node writes a MemoryNode with two types of provenance edges:
ABOUT → Entity
For provenance, the answer's MemoryNode is linked to each contributing entity with an ABOUT edge.
ABOUT edges use deterministic entity IDs (MD5 of name+label+space_id). They survive entity re-extraction, surface-form merges, and document re-ingestion.
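Below is a sketch of such a deterministic ID using Python's standard `hashlib`. The `|` separator and the absence of any name normalization are assumptions; the text states only that the ID is an MD5 of name, label, and space_id. Whatever normalization the real code applies before hashing decides which surface forms collapse to the same ID.

```python
import hashlib


def entity_id(name: str, label: str, space_id: str) -> str:
    """Deterministic entity ID: MD5 hex digest of name, label and space_id.

    The "|" separator is an assumption; without one, ("ab", "c") and
    ("a", "bc") would produce the same key before hashing.
    """
    key = f"{name}|{label}|{space_id}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


first = entity_id("Apple Inc.", "ORG", "space-1")
again = entity_id("Apple Inc.", "ORG", "space-1")  # e.g. after re-extraction
other = entity_id("Apple Inc.", "ORG", "space-2")  # same name, different space
```

The ID depends only on these three values. Re-extracting the same entity therefore reproduces the same ID, and existing ABOUT edges still resolve.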
SOURCED_FROM → TextNode
The MemoryNode is also linked to each source TextNode with a SOURCED_FROM edge.
SOURCED_FROM edges use UUID4 TextNode IDs, which are stable across re-ingestion. If a TextNode is deleted, its SOURCED_FROM edges are removed, but the MemoryNode is preserved, so conversation history remains coherent.
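The two provenance edge types can be sketched together with the deletion behavior. Plain dict and set structures stand in for Neo4j. The function names, the UUID4 MemoryNode IDs, and the property names are hypothetical. Deleting a TextNode here mirrors Cypher's `DETACH DELETE`, which removes a node together with all of its relationships.

```python
import uuid


def write_memory_node(nodes, edges, query, answer, entity_ids, text_node_ids):
    """Create a MemoryNode plus its ABOUT and SOURCED_FROM provenance edges."""
    memory_id = str(uuid.uuid4())  # MemoryNode ID scheme is an assumption
    nodes[memory_id] = {"kind": "MemoryNode", "query": query, "answer": answer}
    for eid in entity_ids:
        edges.add((memory_id, "ABOUT", eid))
    for tid in text_node_ids:
        edges.add((memory_id, "SOURCED_FROM", tid))
    return memory_id


def delete_text_node(nodes, edges, text_node_id):
    """Remove a document and every edge touching it; MemoryNodes stay in place."""
    nodes.pop(text_node_id, None)
    for edge in list(edges):
        if text_node_id in (edge[0], edge[2]):
            edges.discard(edge)


nodes = {"doc-1": {"kind": "TextNode"}, "e-apple": {"kind": "Entity"}}
edges = set()
memory_id = write_memory_node(
    nodes, edges, "What was Q3 revenue?", "(answer text)", ["e-apple"], ["doc-1"]
)
delete_text_node(nodes, edges, "doc-1")
```

After the deletion, the MemoryNode and its ABOUT edge remain; only the SOURCED_FROM edge to the removed document is gone.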
Graph Traversal: From Memory to Source
To trace an answer back to its sources, the system follows SOURCED_FROM edges from a MemoryNode to its documents, then HAS_CHUNK edges down to the exact passages. It returns the original query, each source document, and the passage text in order.
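Below is a sketch of that traversal over in-memory structures. In Cypher, the path would be expressed roughly as:

`MATCH (m:MemoryNode {id: $id})-[:SOURCED_FROM]->(t:TextNode)-[:HAS_CHUNK]->(c:ChunkNode) RETURN m.query, t.id, c.text ORDER BY t.id, c.index`

The `id`, `query`, `text`, and `index` property names are assumptions. The sketch returns the chunks reachable through HAS_CHUNK from each source document, ordered by document ID and then by chunk index.

```python
def trace_sources(nodes, edges, memory_id):
    """Return (query, text_node_id, chunk_text) rows for one MemoryNode."""

    def targets(src, rel):
        # Targets of all edges of type rel leaving src.
        return [d for s, r, d in edges if s == src and r == rel]

    query = nodes[memory_id]["query"]
    rows = []
    for doc_id in sorted(targets(memory_id, "SOURCED_FROM")):
        chunks = sorted(targets(doc_id, "HAS_CHUNK"), key=lambda c: nodes[c]["index"])
        for chunk_id in chunks:
            rows.append((query, doc_id, nodes[chunk_id]["text"]))
    return rows


nodes = {
    "m-1": {"query": "What was Q3 revenue?"},
    "doc-1": {},
    "doc-1:chunk:0": {"index": 0, "text": "Apple reported Q3 revenue."},
    "doc-1:chunk:1": {"index": 1, "text": "Tim Cook commented on margins."},
}
edges = {
    ("m-1", "SOURCED_FROM", "doc-1"),
    ("doc-1", "HAS_CHUNK", "doc-1:chunk:1"),
    ("doc-1", "HAS_CHUNK", "doc-1:chunk:0"),
}
rows = trace_sources(nodes, edges, "m-1")
```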
Multi-Turn Traceability
Each MemoryNode in the conversation chain has its own provenance set. For example, the first turn's answer might be about Apple Inc. and Tim Cook, sourced from a Q3 report. The next turn's answer might be about Apple Inc. and the iPhone 16, sourced from a product-launch document. A NEXT_TURN edge chains the two MemoryNodes together.
Together, these edges record which documents were referenced at each point in the conversation, whether entities shifted between turns, and which turns relied on the same source document.
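Those three questions can be answered by walking the NEXT_TURN chain and comparing each turn's ABOUT and SOURCED_FROM targets. The sketch below works over in-memory edge tuples. It uses the Apple example above plus a hypothetical third turn that returns to the Q3 report. Function names and IDs are illustrative, and entity names stand in for entity IDs.

```python
def turn_provenance(edges, first_turn):
    """Follow NEXT_TURN from first_turn, collecting each turn's entities and sources."""
    next_turn = {s: d for s, r, d in edges if r == "NEXT_TURN"}
    turns, current, seen = [], first_turn, set()
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        turns.append({
            "id": current,
            "entities": {d for s, r, d in edges if s == current and r == "ABOUT"},
            "sources": {d for s, r, d in edges if s == current and r == "SOURCED_FROM"},
        })
        current = next_turn.get(current)
    return turns


def entity_shifts(turns):
    """(gained, dropped) entity sets for each pair of consecutive turns."""
    return [(b["entities"] - a["entities"], a["entities"] - b["entities"])
            for a, b in zip(turns, turns[1:])]


def shared_sources(turns):
    """Source documents used by more than one turn, with the turns that used them."""
    users = {}
    for turn in turns:
        for source in turn["sources"]:
            users.setdefault(source, []).append(turn["id"])
    return {src: ids for src, ids in users.items() if len(ids) > 1}


edges = {
    ("turn-1", "ABOUT", "Apple Inc."), ("turn-1", "ABOUT", "Tim Cook"),
    ("turn-1", "SOURCED_FROM", "q3-report"),
    ("turn-2", "ABOUT", "Apple Inc."), ("turn-2", "ABOUT", "iPhone 16"),
    ("turn-2", "SOURCED_FROM", "product-launch"),
    ("turn-3", "ABOUT", "Tim Cook"), ("turn-3", "SOURCED_FROM", "q3-report"),
    ("turn-1", "NEXT_TURN", "turn-2"), ("turn-2", "NEXT_TURN", "turn-3"),
}
turns = turn_provenance(edges, "turn-1")
```

Here `entity_shifts(turns)[0]` reports that the second turn gained the iPhone 16 and dropped Tim Cook. `shared_sources(turns)` shows that turns 1 and 3 both relied on the Q3 report.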
Edge Type Summary
| Edge | From | To | Survives |
|---|---|---|---|
| HAS_ENTITY | ChunkNode | Entity | Entity compression (edges redirected to canonical) |
| HAS_CHUNK | TextNode | ChunkNode | Everything except TextNode deletion (deleted with TextNode) |
| ABOUT | MemoryNode | Entity | Entity merges, re-extraction, re-ingestion |
| SOURCED_FROM | MemoryNode | TextNode | Re-ingestion; removed if TextNode deleted |
| NEXT_TURN | MemoryNode | MemoryNode | Always |