Perplexity Gate: Adaptive Retrieval Routing
The perplexity gate decides whether a query needs document retrieval — using either a structured LLM classifier or token log-probabilities — before routing to the retrieval pipeline.
Overview
Not every query benefits from knowledge graph retrieval. Factual, well-known questions are answered reliably by the LLM's parametric memory. Retrieval adds latency and can introduce irrelevant passages that confuse generation.
The perplexity gate (check_knowledge_node) runs first on every query, before any entity search or chunk lookup.
It evaluates the query and branches: if retrieval isn't needed, the query goes straight to answer generation from the LLM's own parametric memory; if it is needed, the query passes on to strategy classification and the full retrieval pipeline.
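The branch described above can be sketched as a small routing function. This is a minimal illustration, not the project's actual code: the step names `generate_answer` and `classify_strategy` and the `needs_retrieval` callable are hypothetical stand-ins for whatever the pipeline really uses.

```python
from typing import Callable


def route_query(query: str, needs_retrieval: Callable[[str], bool]) -> str:
    """Return the name of the next pipeline step for a query.

    `needs_retrieval` stands in for the gate's decision, whichever mode
    (LLM classifier or logprobs) produced it. Step names are hypothetical.
    """
    if needs_retrieval(query):
        # Retrieval needed: continue to strategy classification
        # and then the full retrieval pipeline.
        return "classify_strategy"
    # Retrieval not needed: answer directly from parametric memory.
    return "generate_answer"
```

Passing the decision in as a callable keeps the routing logic independent of which gate mode is active.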
Two Modes
Mode 1: Structured LLM Classifier (default)
When perplexity_gate.enabled = false (the default), the gate makes a structured LLM call that returns two things: a boolean for whether retrieval is needed, and a short reasoning string. The system prompt frames the model as a routing assistant deciding whether the query requires specialised retrieval from a private knowledge base.
The LLM considers whether the query is about general world knowledge, private documents, or prior conversation context.
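One way to picture the structured output is as a two-field record parsed from the model's JSON reply. The sketch below assumes the reply is a JSON object; the class name `GateDecision`, the helper `parse_gate_decision`, and the `reasoning` key are illustrative names, not confirmed parts of the system.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    """Hypothetical shape of the classifier's structured reply."""
    needs_retrieval: bool  # True when the query should go to retrieval
    reasoning: str         # short explanation from the model


def parse_gate_decision(raw: str) -> GateDecision:
    """Parse a JSON reply and reject a non-boolean decision."""
    data = json.loads(raw)
    needs = data["needs_retrieval"]
    if not isinstance(needs, bool):
        # A string like "yes" would be truthy, so insist on a real boolean.
        raise ValueError("needs_retrieval must be a JSON boolean")
    return GateDecision(needs_retrieval=needs,
                        reasoning=str(data.get("reasoning", "")))
```

Validating the boolean strictly matters here because any non-empty string would otherwise be treated as "retrieve".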
This mode is better at recognising KG-specific query patterns. The logprobs mode tends to score KG-specific queries too low, because the LLM produces fluent generic answers for almost anything, and that makes the threshold hard to calibrate.
Mode 2: Logprobs Perplexity (optional)
When perplexity_gate.enabled = true, the gate measures the model's uncertainty using token log-probabilities. Perplexity is the exponential of the negative mean log-probability across the generated tokens, a standard measure of how uncertain the model is about what it just produced.
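The definition above translates directly into a few lines of code. This is a minimal sketch that assumes the log-probabilities are natural logarithms, one per generated token; the function name `perplexity` is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Compute exp(-mean(log p)) over the generated tokens.

    Assumes natural-log probabilities, one value per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)
```

For example, if every token was generated with probability 0.5, the perplexity is 2; if every token had probability 1, it is 1, the lowest possible value.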
The model generates a short speculative answer (max_tokens: 80). If the model is confident, its tokens have high log-probability and perplexity is low. If it is uncertain, log-probabilities drop and perplexity rises.
If perplexity meets or exceeds the threshold, the model is judged uncertain and retrieval is triggered; below the threshold, the model is confident enough to answer directly.
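The decision rule is a single comparison, and the boundary case is worth making explicit: a score exactly equal to the threshold triggers retrieval. The function name below is illustrative, and the threshold is passed in rather than hard-coded because no default value is stated here.

```python
def needs_retrieval(perplexity_score: float, threshold: float) -> bool:
    """Return True when the model is judged too uncertain to answer alone.

    "Meets or exceeds" means the comparison is >=, not >.
    """
    return perplexity_score >= threshold
```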
Fallback chain: If logprobs are unsupported, the gate falls back to the structured LLM classifier. If that also fails, needs_retrieval=True is the safe default.
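The fallback chain can be sketched as two nested attempts with a safe default. How the real system detects missing logprobs support is not specified, so the `LogprobsUnsupported` exception, the function names, and the two stub gates below are all hypothetical.

```python
from typing import Callable


class LogprobsUnsupported(Exception):
    """Hypothetical signal that the backend returns no log-probabilities."""


def gate_with_fallback(
    query: str,
    logprobs_gate: Callable[[str], bool],
    classifier_gate: Callable[[str], bool],
) -> bool:
    """Try logprobs first, then the classifier, then default to retrieval."""
    try:
        return logprobs_gate(query)
    except LogprobsUnsupported:
        pass  # fall through to the structured LLM classifier
    try:
        return classifier_gate(query)
    except Exception:
        # Both gates failed: retrieving is the safe default.
        return True


# Stub gates used only to demonstrate the chain.
def unsupported(query: str) -> bool:
    raise LogprobsUnsupported()


def broken_classifier(query: str) -> bool:
    raise RuntimeError("classifier call failed")
```

Defaulting to retrieval costs some latency but avoids answering from parametric memory when nothing could confirm that this was safe.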
Memory Context Loading
check_knowledge_node is also where conversation history is loaded from Neo4j. If memory_node_id is present, the full chain of prior turns is fetched and injected into conversation_history.
This matters for routing: a follow-up question like "What about the margins?", asked in the context of a prior retrieved answer, may not need a fresh retrieval.
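To illustrate what "fetching the full chain" could look like, the sketch below walks a backward-linked chain of turns and returns them oldest first. It uses an in-memory dict as a stand-in for Neo4j, and the node layout (`role`, `content`, `previous_id`) is an assumption made for this example, not the actual graph schema.

```python
from typing import Dict, List, Optional, TypedDict


class MemoryNode(TypedDict):
    """Hypothetical node layout; the real graph schema may differ."""
    role: str
    content: str
    previous_id: Optional[str]


def load_history(nodes: Dict[str, MemoryNode],
                 memory_node_id: Optional[str]) -> List[dict]:
    """Follow previous_id links from the latest turn back to the first."""
    history: List[dict] = []
    seen = set()
    current = memory_node_id
    # Stop at the chain's start, a missing node, or a cycle.
    while current is not None and current in nodes and current not in seen:
        seen.add(current)
        node = nodes[current]
        history.append({"role": node["role"], "content": node["content"]})
        current = node["previous_id"]
    history.reverse()  # collected newest first, returned oldest first
    return history
```

When no memory node id is given, the function returns an empty history and the gate sees the query on its own.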
Configuration
The gate is configurable: a threshold sets how high perplexity must be to trigger retrieval, a token cap limits the speculative answer used for the logprobs measurement, and an enabled flag switches between the logprobs mode and the default LLM-classifier mode. The LLM-classifier mode is the recommended setting.
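The three settings could be modelled as a small validated config object. In this sketch, `enabled` and the 80-token cap come from the text above, while the class name, the field names `threshold` and `max_tokens`, and the validation rules are assumptions. The threshold has no default because none is stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerplexityGateConfig:
    """Illustrative config; field names other than `enabled` are assumed."""
    threshold: float       # perplexity at or above this triggers retrieval
    max_tokens: int = 80   # cap on the speculative answer in logprobs mode
    enabled: bool = False  # False selects the default LLM-classifier mode

    def __post_init__(self) -> None:
        if self.threshold <= 0:
            raise ValueError("threshold must be positive")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
```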
Threshold Calibration (logprobs mode)
| Query type | Typical perplexity | Decision |
|---|---|---|
| "What is 2+2?" | 1.5–3.0 | Skip retrieval |
| "Who founded Apple?" | 3.0–6.0 | Borderline |
| "What did the Q3 report say about margins?" | 25–80 | Retrieve |
| "Summarise the uploaded contract" | 40–100 | Retrieve |
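Raising the threshold to 10–15 triggers retrieval less often: only queries whose perplexity reaches the threshold are retrieved, so borderline general-knowledge questions such as "Who founded Apple?" (3.0–6.0) are answered directly.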
Output State Fields
| Field | Type | Description |
|---|---|---|
| needs_retrieval | bool | Whether the retrieval pipeline should run |
| perplexity_score | float \| None | Perplexity measured in logprobs mode, or None when no score was computed |
| conversation_history | list | Loaded from MemoryNode chain |
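The three output fields can be written down as a typed dictionary. The field names and types follow the table; the class name `GateOutput`, the element type of the history list, and the helper `classifier_mode_output` (which assumes no perplexity score is produced in classifier mode) are illustrative assumptions.

```python
from typing import List, Optional, TypedDict


class GateOutput(TypedDict):
    """State fields written by the gate, as listed in the table."""
    needs_retrieval: bool
    perplexity_score: Optional[float]
    conversation_history: List[dict]  # element type is an assumption


def classifier_mode_output(needs_retrieval: bool,
                           history: List[dict]) -> GateOutput:
    """Build the output for a run where no perplexity was computed."""
    return {
        "needs_retrieval": needs_retrieval,
        "perplexity_score": None,  # assumed: no score outside logprobs mode
        "conversation_history": history,
    }
```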