Vadalog Rules Frontend: Per-Space Normalization UI

A point-and-click UI replacing the curl-only workflow for editing per-space normalization rules — aliases, label maps, predicate maps, and CSV field maps — with dry-run validation and a full round-trip contract.

Dawson Bauer

What it does

Per-space normalization rules (.vada facts) used to be editable only via curl: users had to write Datalog-style syntax by hand and POST it to /api/v1/graph/rules. This article documents the end-to-end frontend stack that replaces that workflow with a point-and-click UI, plus the backend patches that made it possible.

The work ships in three layers:

  • Layer 1 — Per-space rules panel. A modal in ThoughtSpace (header dropdown → "Space Rules") with tab editors for aliases, label maps, predicate maps, and CSV field maps.
  • Layer 2 — Datasource onboarding wizard. Three-step flow (Connect → Map → Ingest) for registering external CSV/JSON feeds. Supports remote URL sources and browser CSV uploads.
  • Layer 3 — Raw .vada editor. Plain monospaced textarea for power users, backed by a new POST /rules?dry_run=true flag that validates without persisting and returns a per-line error list.

Architecture

The stack spans three services. The ThoughtSpace frontend, built on Next.js, holds the rules-panel modal, the datasource wizard, the raw editor, and the Vadalog renderer. It talks over HTTP to Orchard, a FastAPI service that handles the rules routes and proxies requests onward. Orchard forwards to Barnyard — also FastAPI, with Celery for background work — which runs the graph routes and the Vadalog rule engine. Each space's rules are persisted as text on a SpaceConfig node in Neo4j, and any uploaded datasource files are stored in S3.


Endpoint Reference

Rules (save / get / delete)

Method  Orchard path                 Notes
GET     /rules?space_id=             Returns parsed_summary and parsed_rules
POST    /rules {space_id, content}   ?dry_run=true validates only
DELETE  /rules?space_id=             Clears space rules (reverts to global)
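
As an illustration of the dry-run call, here is a minimal sketch that builds (but does not send) a POST /rules request. The base URL, the JSON content type, and the helper name build_rules_request are assumptions made for this example; only the path, the {space_id, content} body, and the dry_run=true flag come from the table above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; the article only specifies the /rules path.
ORCHARD_BASE_URL = "http://localhost:8000"


def build_rules_request(space_id: str, content: str, dry_run: bool = False) -> Request:
    """Build a POST /rules request without sending it (illustrative helper)."""
    url = f"{ORCHARD_BASE_URL}/rules"
    if dry_run:
        # dry_run=true asks the backend to validate without persisting.
        url += "?" + urlencode({"dry_run": "true"})
    # Assumption: the body is sent as JSON with the two documented fields.
    body = json.dumps({"space_id": space_id, "content": content}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_rules_request(
    "space-123",
    'canonical("nyc", "New York City").',
    dry_run=True,
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. The response format is not shown here because the article only describes it as a per-line error list.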

Datasource wizard

Method  Orchard path                Purpose
POST    /rules/datasource/preview   Fetch first N rows
POST    /rules/datasource/upload    Browser CSV → S3 + datasource() rule
POST    /rules/datasource/ingest    Queue datasource_ingest_task

.vada Syntax Reference

Every rule is a single-line literal fact terminated by a closing parenthesis and a period. Arguments are CSV-parsed (RFC 4180). Lines starting with % are comments.

Rule             Signature                                      Behavior
canonical        canonical("alias", "Canonical").               Merge alias → canonical entity name
label_map        label_map("surface", "canonical").             Normalize GLiNER surface labels
predicate_map    predicate_map("raw phrase", "CANONICAL").      Rewrite relation predicates
field_map        field_map("ExtCol", "text", "entity_label").   Map CSV/JSON column → TextNode field
datasource       datasource("name", "csv|json", "url").         Register an external CSV/JSON feed
datasource_auth  datasource_auth("name", "Header", "value").    HTTP header for fetching source
group_by         group_by("field", "label", 20).                Bucket datasource rows by column
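
To make the syntax concrete, the following sketch parses facts of this shape with Python's csv module. It is not the Barnyard parser: the regular expression, the function name parse_vada, and the (line number, message) error format are assumptions chosen for illustration, loosely modeled on the per-line error list that the dry-run flag is described as returning.

```python
import csv
import re

# Assumed shape of a fact: name(args). on a single line.
FACT_RE = re.compile(r"^([a-z_]+)\((.*)\)\.$")


def parse_vada(text: str):
    """Return (facts, errors) for a block of .vada text (illustrative only)."""
    facts, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # blank lines and % comments carry no facts
        match = FACT_RE.match(stripped)
        if match is None:
            errors.append((lineno, "expected name(args)."))
            continue
        name, raw_args = match.groups()
        # RFC 4180-style parsing: quoted fields, "" as an escaped quote.
        args = next(csv.reader([raw_args], skipinitialspace=True), [])
        facts.append((name, args))
    return facts, errors


sample = '''% aliases
canonical("nyc", "New York City").
predicate_map("said ""hello"" to", "GREETED").
group_by("field", "label", 20).
this is not a rule
'''
facts, errors = parse_vada(sample)
for fact in facts:
    print(fact)
print(errors)
```

CSV parsing returns every argument as a string, so a bare 20 and a quoted "20" both parse to the same value.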

Round-Trip Contract

The UI edits a structured state tree, not raw text; saves go through renderVada(state): string. The renderer's core invariant is that a round trip must be lossless: parsing a rules fixture, rendering it back to text, and parsing that result again must produce exactly the same structure as the original.

This is checked in Barnyard/unit_tests/test_vadalog_roundtrip.py. Any change to the TS renderer must update the Python mirror in the test.

Three specific invariants:

  1. CSV-quote escaping. Values with " inside must round-trip via RFC 4180 doubling.
  2. Key casing. The parser lowercases the first arg of canonical, label_map, and predicate_map.
  3. Numbers are CSV-quoted. The renderer emits group_by("field", "label", "20"), not a bare numeric token.
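
The round-trip property and the three invariants can be sketched in a few lines of Python. This is a simplified stand-in, not the contents of test_vadalog_roundtrip.py or the TS renderer: parse, render, and quote are hypothetical names, and the parser skips malformed lines instead of reporting them.

```python
import csv
import re

# Rules whose first argument the parser lowercases (invariant 2).
LOWERCASED_KEYS = {"canonical", "label_map", "predicate_map"}
FACT_RE = re.compile(r"^([a-z_]+)\((.*)\)\.$")


def parse(text):
    """Parse .vada text into a list of (name, args) tuples."""
    facts = []
    for line in text.splitlines():
        match = FACT_RE.match(line.strip())
        if not match:
            continue  # blank lines, % comments, malformed lines
        name, raw = match.groups()
        args = next(csv.reader([raw], skipinitialspace=True), [])
        if name in LOWERCASED_KEYS and args:
            args[0] = args[0].lower()
        facts.append((name, args))
    return facts


def quote(value):
    # Invariant 1: wrap in double quotes and double any embedded quote.
    # Invariant 3: numbers go through the same path, so they are quoted too.
    return '"' + str(value).replace('"', '""') + '"'


def render(facts):
    """Render (name, args) tuples back to .vada text, one fact per line."""
    return "\n".join(
        f"{name}({', '.join(quote(a) for a in args)})." for name, args in facts
    )


fixture = (
    'canonical("NYC", "New York ""Big Apple"" City").\n'
    'group_by("field", "label", 20).'
)
once = parse(fixture)
rendered = render(once)
assert parse(rendered) == once  # parse -> render -> parse is lossless
print(rendered)
```

The check compares parsed structures rather than raw text, so the fixture's uppercase key and bare number may differ textually from the rendered output while the round trip still holds.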

Notable Fixes

  • Pre-existing ingestion gap was patched. Before this work, .vada rules saved via the API were silently ignored during regular file uploads. The fix added load_space_rules_sync and wired it into both preprocess_text_node and _extract_relations.
  • Per-upload compression moved out of memify. extract_graph_task now calls compress_entities_llm directly on every ingestion.
  • Datasource preview URL validation. fetch_datasource_rows runs _validate_datasource_url, which rejects SSRF targets (private IPs, loopback, link-local, metadata endpoints).
  • Recluster button semantics. Now wired to /clustering/recluster (memify only), not /clustering/full_recluster.
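
For readers unfamiliar with the SSRF check mentioned in the datasource preview fix, here is a rough sketch of that kind of validation using Python's ipaddress module. It is not the actual _validate_datasource_url: the function name is_datasource_url_allowed and the exact rule set are assumptions, and the sketch only inspects IP literals and the localhost name without resolving DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def is_datasource_url_allowed(url: str) -> bool:
    """Illustrative SSRF filter; returns False for obviously internal targets."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A production validator would resolve the name
        # and apply the same checks to every resolved address.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # includes 169.254.169.254 metadata endpoints
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


for candidate in (
    "https://example.com/feed.csv",
    "http://10.0.0.5/data.csv",
    "http://169.254.169.254/latest/meta-data/",
):
    print(candidate, is_datasource_url_allowed(candidate))
```

Because hostnames are passed through unresolved, this sketch alone would not stop a public name that points at a private address; resolving before checking closes that gap.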