Tribrid RAG: Three-Signal Retrieval with MMR Fusion
Barnyard combines entity search (BM25 + vector), topic cluster retrieval, and knowledge graph expansion into a single ranked passage pool using Maximum Marginal Relevance fusion.
Overview
Most retrieval-augmented generation systems use a single retrieval signal — either dense vector search or sparse keyword search. Barnyard uses a tribrid approach that combines three independent signals, each optimised for a different type of relevance, then merges them with Maximum Marginal Relevance (MMR) to produce a single ranked, deduplicated passage pool.
At a high level, a query fans out along two paths. The entity path runs BM25 and vector search to find seed entities, expands them by one hop through canonical relations in the graph, and pulls the chunks those entities were extracted from. The chunk path searches topic clusters to find relevant text nodes and their chunks. The two pools are then merged with Maximum Marginal Relevance into a single ranked, deduplicated set of final passages.
Strategy Selection
Before retrieval begins, classify_query_node routes the query to one of three strategies:
| Strategy | Trigger | Paths active |
|---|---|---|
| "entities" | Query asks about specific named things, people, or organisations | Entity path only |
| "chunks" | Query asks about topics, themes, or document-level context | Chunk path only |
| "both" | Query mixes entity-specific and thematic elements | Both paths; MMR merge |
The classifier uses a structured LLM output with a short reasoning chain. Strategy "both" is the default when the classifier is uncertain.
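The fallback behaviour can be sketched in a few lines. This is only an illustration: the `Classification` shape, its `confidence` field, and the 0.5 threshold are assumptions made for the example, since the text says only that the output is structured, carries a short reasoning chain, and defaults to "both" under uncertainty.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_STRATEGIES = {"entities", "chunks", "both"}


@dataclass
class Classification:
    """Hypothetical shape of the structured LLM output (not from the source)."""
    reasoning: str               # short reasoning chain
    strategy: Optional[str]      # "entities", "chunks", "both", or missing
    confidence: float = 1.0      # assumed field used to model "uncertain"


def resolve_strategy(result: Classification, min_confidence: float = 0.5) -> str:
    """Return the strategy to run, falling back to "both" when uncertain."""
    if result.strategy not in VALID_STRATEGIES:
        return "both"
    if result.confidence < min_confidence:
        return "both"
    return result.strategy


def active_paths(strategy: str) -> List[str]:
    """Map a strategy to the retrieval paths it activates (see the table above)."""
    return {
        "entities": ["entity"],
        "chunks": ["chunk"],
        "both": ["entity", "chunk"],
    }[strategy]
```

Falling back to "both" trades some extra retrieval work for recall: a misrouted query still reaches the path that would have answered it.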
Path 1: Entity Path
Stage 1a — Hybrid Entity Search (BM25 + Vector)
Two searches run concurrently:
- BM25 keyword search: Neo4j full-text index on Entity.name (exact/lexical match)
- Vector search: ColBERT embeddings against the Entity_name Qdrant collection
Results are fused with Reciprocal Rank Fusion (RRF): each entity's score is the sum of 1 / (k + rank) over the keyword list and the vector list, where k is a constant and rank is the entity's position in that list.
k=60 dampens the impact of top-rank outliers. Entities appearing highly in both lists score highest.
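A minimal sketch of this fusion step follows. The function name `rrf_fuse` and the entity identifiers are illustrative, and two details are assumptions: ranks are taken as 1-based, and an entity missing from one list contributes nothing from that list.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(bm25_ranked: List[str], vector_ranked: List[str], k: int = 60) -> Dict[str, float]:
    """Fuse two ranked lists of entity ids with Reciprocal Rank Fusion.

    Each list is ordered best-first. Ranks are assumed to start at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked in (bm25_ranked, vector_ranked):
        for rank, entity_id in enumerate(ranked, start=1):
            scores[entity_id] += 1.0 / (k + rank)
    return dict(scores)


# Illustrative input: "tim_cook" is ranked first by both searches.
bm25 = ["tim_cook", "apple_inc", "cook_islands"]
vector = ["tim_cook", "steve_jobs", "apple_inc"]
fused = rrf_fuse(bm25, vector)
best_first = sorted(fused, key=fused.get, reverse=True)
```

With k=60 and 1-based ranks, an entity ranked first in both lists scores 2/61 ≈ 0.033, while an entity that appears only third in one list scores 1/63 ≈ 0.016.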
Stage 1b — Score Normalisation
RRF scores (~0.013–0.033) sit on a much smaller scale than chunk path cosine similarity scores (0–1), so the two cannot be compared directly. Before MMR, entity scores are min-max normalised to [0, 1]: the lowest RRF score in the set is subtracted from each score, and the result is divided by the spread between the highest and lowest scores.
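The normalisation can be sketched as below. The handling of the degenerate cases (an empty set, or all scores equal so the spread is zero) is an assumption made for the example; the source does not say how those cases are treated.

```python
from typing import Dict


def min_max_normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores into [0, 1] by subtracting the minimum and dividing by the spread."""
    if not scores:
        return {}
    lowest = min(scores.values())
    highest = max(scores.values())
    spread = highest - lowest
    if spread == 0:
        # Assumption: when every score is identical, treat all as equally relevant.
        return {key: 1.0 for key in scores}
    return {key: (value - lowest) / spread for key, value in scores.items()}


normalised = min_max_normalise({"a": 0.033, "b": 0.013, "c": 0.023})
```

Note that min-max scaling always maps the weakest seed entity to 0 and the strongest to 1, regardless of how good the matches were in absolute terms.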
Stage 1c — CanonicalRelation Graph Expansion
Top-K seed entities are expanded by one hop via CanonicalRelation. Querying "Tim Cook" also retrieves "Apple Inc." and "Steve Jobs" if they share CanonicalRelation edges. Expanded entities receive a fixed relevance score of 0.7.
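As a rough illustration, one-hop expansion can be modelled with an in-memory adjacency map standing in for the CanonicalRelation edges in Neo4j. The function name and data layout are hypothetical, and the sketch assumes that a seed entity keeps its own normalised score if it also appears as a neighbour of another seed.

```python
from typing import Dict, Set


def expand_one_hop(
    seed_scores: Dict[str, float],
    canonical_relations: Dict[str, Set[str]],
    related_score: float = 0.7,
) -> Dict[str, float]:
    """Add direct neighbours of each seed entity with a fixed relevance score.

    Only neighbours of the original seeds are added, so expansion stops at one hop.
    """
    expanded = dict(seed_scores)
    for seed in seed_scores:
        for neighbour in canonical_relations.get(seed, set()):
            # Assumption: existing (seed) scores are never overwritten.
            expanded.setdefault(neighbour, related_score)
    return expanded


# Stand-in for CanonicalRelation edges; the Pixar edge is two hops from the seed.
relations = {
    "Tim Cook": {"Apple Inc.", "Steve Jobs"},
    "Steve Jobs": {"Pixar"},
}
entities = expand_one_hop({"Tim Cook": 1.0}, relations)
```

Because the loop iterates over the original seeds only, "Pixar" is not pulled in even though it is connected to an expanded entity.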
Stage 1d — ChunkNode Retrieval via HAS_ENTITY
Each seed and expanded entity is then used to look up the chunks it came from: the system follows the HAS_ENTITY edges from each entity to its chunk nodes, scoped to the current space, and returns those chunks along with their parent text nodes.
The HAS_ENTITY edges are written at ingestion time — each entity is linked directly to the specific chunk(s) from which it was extracted by GLiNER.
Path 2: Chunk Path
Stage 2a — TopicCluster Coarse Filter
The query is encoded with MPNet (768-dim) and searched against the TopicCluster_summary Qdrant collection. Up to chunk_inner_top_k (default: 30) TopicClusters are retrieved and post-filtered by user_id/space_ids.
TopicClusters are LLM-generated semantic summaries — denser and more semantically coherent than raw chunk text, improving recall for thematic queries.
Stage 2b — ChunkNode Expansion via Neo4j
From the matching topic clusters, the system walks the graph to the text nodes tagged with each cluster and on to their chunks, returning those chunks in document order.
Each ChunkNode is scored by its parent TopicCluster's cosine similarity — the same [0,1] range as entity path scores, making them directly comparable in MMR.
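The score inheritance can be sketched as follows, using plain dictionaries in place of the Qdrant results and the Neo4j traversal. The names are illustrative, and one rule is an assumption: a chunk reachable from several matching clusters keeps the highest of their scores.

```python
from typing import Dict, List


def score_chunks(
    cluster_scores: Dict[str, float],
    cluster_chunks: Dict[str, List[str]],
) -> Dict[str, float]:
    """Give each chunk the cosine similarity of its parent topic cluster."""
    chunk_scores: Dict[str, float] = {}
    for cluster_id, similarity in cluster_scores.items():
        for chunk_id in cluster_chunks.get(cluster_id, []):
            # Assumption: keep the best score when a chunk has several parents.
            if chunk_id not in chunk_scores or similarity > chunk_scores[chunk_id]:
                chunk_scores[chunk_id] = similarity
    return chunk_scores


chunk_scores = score_chunks(
    {"cluster_a": 0.82, "cluster_b": 0.55},
    {"cluster_a": ["chunk_1", "chunk_2"], "cluster_b": ["chunk_2", "chunk_3"]},
)
```

One consequence of this scheme is that all chunks under the same cluster tie on relevance, so the diversity term in MMR does most of the work of choosing among them.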
MMR Fusion
When strategy is "both", merge_context_node receives both ChunkNode pools and passes them through Maximum Marginal Relevance (MMR).
MMR scores each candidate as its relevance to the query, weighted by a factor lambda, minus its highest similarity to any passage already selected, weighted by one minus lambda. The result rewards passages that are both relevant and non-redundant.
By default, MMR keeps the top 8 passages with lambda set to 0.6 (leaning toward relevance) and uses a near-duplicate threshold of 0.72 on Jaccard similarity.
Jaccard similarity is computed on word sets. Candidates above the threshold are treated as near-duplicates of an already-selected passage and skipped regardless of relevance score.
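A compact sketch of greedy MMR selection with the defaults quoted above is shown here. It is not the code behind merge_context_node. Two choices are assumptions: word sets are built by lowercasing and splitting on whitespace, and the same Jaccard measure serves both as the redundancy term and as the near-duplicate test.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity on word sets (lowercased, whitespace-split: an assumption)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mmr_select(
    candidates: Dict[str, float],   # passage text -> relevance in [0, 1]
    top_k: int = 8,
    lam: float = 0.6,
    dup_threshold: float = 0.72,
) -> List[str]:
    """Greedily pick passages that are relevant and not redundant."""
    remaining = dict(candidates)
    selected: List[str] = []
    while remaining and len(selected) < top_k:
        best_text, best_score = None, float("-inf")
        for text, relevance in list(remaining.items()):
            max_sim = max((jaccard(text, s) for s in selected), default=0.0)
            if max_sim > dup_threshold:
                # Near-duplicate of a selected passage: drop it outright.
                del remaining[text]
                continue
            score = lam * relevance - (1.0 - lam) * max_sim
            if score > best_score:
                best_text, best_score = text, score
        if best_text is None:
            break
        selected.append(best_text)
        del remaining[best_text]
    return selected


passages = {
    "Tim Cook is the CEO of Apple": 0.9,
    "Tim Cook is the CEO of Apple Inc": 0.85,
    "Apple reported record quarterly revenue": 0.6,
}
final = mmr_select(passages)
```

In this example the second passage shares 7 of 8 distinct words with the first (Jaccard 0.875), so it is skipped despite its high relevance, and the lower-scored revenue passage is kept instead.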
Score Compatibility
A key design constraint: entity path and chunk path scores must be comparable for MMR to work. Without normalisation, raw RRF scores (~0.013–0.033) would always lose to cosine similarity scores (0.3–0.9) — the entity path would be effectively muted.
The min-max normalisation in Stage 1b resolves this. Related entity chunks (expanded via CanonicalRelation) receive a fixed score of 0.7 — tunable via retrieval.related_entity_chunk_score.
Configuration
The retrieval and fusion behaviour is fully configurable — the number of seed entities, the RRF constant, the fixed score given to related-entity chunks, the various top-K limits for clusters and chunks, and the MMR settings (how many passages to keep, the relevance-versus-diversity balance, and the near-duplicate threshold) are all tunable parameters.
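For orientation, the tunables mentioned in this article could be grouped as in the sketch below. Only `related_entity_chunk_score` and `chunk_inner_top_k` are names that appear in the text; every other field name is hypothetical. Defaults are the values quoted above, and the seed-entity count is left without a default because the text does not give one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Illustrative grouping of retrieval and fusion settings (not the real schema)."""
    seed_entity_top_k: int                   # hypothetical name; no default stated
    rrf_k: int = 60                          # hypothetical name; RRF constant
    related_entity_chunk_score: float = 0.7  # named in the text
    chunk_inner_top_k: int = 30              # named in the text
    mmr_top_k: int = 8                       # hypothetical name; passages kept
    mmr_lambda: float = 0.6                  # hypothetical name; relevance weight
    mmr_dup_threshold: float = 0.72          # hypothetical name; Jaccard cut-off

    def __post_init__(self) -> None:
        # Basic sanity checks so a bad value fails early.
        if not 0.0 <= self.mmr_lambda <= 1.0:
            raise ValueError("mmr_lambda must be in [0, 1]")
        if not 0.0 <= self.mmr_dup_threshold <= 1.0:
            raise ValueError("mmr_dup_threshold must be in [0, 1]")
        if not 0.0 <= self.related_entity_chunk_score <= 1.0:
            raise ValueError("related_entity_chunk_score must be in [0, 1]")


config = RetrievalConfig(seed_entity_top_k=10)  # 10 is an arbitrary example value
```

Keeping `related_entity_chunk_score` inside [0, 1] matters because it is compared directly against normalised entity scores and cosine similarities during MMR.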