Knowledge GraphsEnterprise AI

Perplexity Gate: Adaptive Retrieval Routing

The perplexity gate decides whether a query needs document retrieval — using either a structured LLM classifier or token log-probabilities — before routing to the retrieval pipeline.

Dawson Bauer

May 20, 2026

Overview

Not every query benefits from knowledge graph retrieval. Factual, well-known questions are answered reliably by the LLM's parametric memory. Retrieval adds latency and can introduce irrelevant passages that confuse generation.

The perplexity gate (check_knowledge_node) decides whether a query needs document retrieval before routing to the retrieval pipeline. It runs first in every query, before any entity search or chunk lookup.

The gate runs first on every query. It evaluates the query and branches: if retrieval isn't needed, the query goes straight to answer generation using the LLM's own parametric memory; if it is needed, the query passes on to strategy classification and the full retrieval pipeline.

Two Modes

Mode 1: Structured LLM Classifier (default)

When perplexity_gate.enabled = false (the default), the gate uses a structured LLM call with a boolean output:

In this mode the gate makes a structured LLM call that returns two things: a boolean for whether retrieval is needed, and a short reasoning string. The system prompt frames the model as a routing assistant deciding whether the query requires specialised retrieval from a private knowledge base.

The LLM considers whether the query is about general world knowledge, private documents, or prior conversation context.

This mode is smarter at recognising KG-specific query patterns. The logprobs mode tends to score KG-specific queries too low — the LLM produces fluent generic answers for almost anything — making the threshold hard to calibrate.

Mode 2: Logprobs Perplexity (optional)

When perplexity_gate.enabled = true, the gate measures the model's uncertainty using token log-probabilities:

Perplexity is the exponential of the negative mean log-probability across the generated tokens — a standard measure of how uncertain the model is about what it just produced.

The model generates a short speculative answer (max_tokens: 80). If confident, tokens have high log-probability → low perplexity. If uncertain, log-probabilities drop → high perplexity.

If perplexity meets or exceeds the threshold, the model is judged uncertain and retrieval is triggered; below the threshold, the model is confident enough to answer directly.

Fallback chain: If logprobs are unsupported, the gate falls back to the structured LLM classifier. If that also fails, needs_retrieval=True is the safe default.

Memory Context Loading

check_knowledge_node is also where conversation history is loaded from Neo4j. If memory_node_id is present, the full chain of prior turns is fetched and injected into conversation_history:

When a memory node is present, the full chain of prior turns is fetched from Neo4j and injected into the conversation history.

A follow-up question like "What about the margins?" — in context of a prior retrieved answer — may not need a fresh retrieval.

Configuration

The gate is configurable: a threshold controls how high perplexity must be to trigger retrieval, a token cap limits the speculative answer used for the logprobs measurement, and an enabled flag switches between the logprobs mode and the default LLM-classifier mode (the recommended setting).

Threshold Calibration (logprobs mode)

Query type	Typical perplexity	Decision
"What is 2+2?"	1.5–3.0	Skip retrieval
"Who founded Apple?"	3.0–6.0	Borderline
"What did the Q3 report say about margins?"	25–80	Retrieve
"Summarise the uploaded contract"	40–100	Retrieve

Raise threshold to 10–15 to filter more aggressively.

Output State Fields

Field	Type	Description
`needs_retrieval`	bool	Whether the retrieval pipeline should run
`perplexity_score`	float \	None
`conversation_history`	list	Loaded from MemoryNode chain

Keep Reading

Knowledge GraphsEnterprise AI

Entity and Relation Extraction & Compression

A deep dive into Anatypical's two-phase pipeline: GLiNER for zero-shot NER and a single-pass LLM for relation triplets, followed by cross-document deduplication and pre-materialized RelationStar summaries.

June 4, 2026

Knowledge GraphsEnterprise AI

Branching Memory: Persistent Conversational Context in GraphRAG

Anatypical stores conversation turns as a persistent graph in Neo4j, enabling durable context, branching threads, and provenance tracking that survives session restarts.

June 3, 2026

Knowledge GraphsGlass Box

Source Traceability: From Answer Back to Passage

Every answer Anatypical generates is anchored to specific document passages and entities via persistent Neo4j graph edges — surviving re-ingestion, entity merges, and session restarts.

May 21, 2026