Neuro-Symbolic Financial Reasoning via Deterministic Fact Ledgers
P. Agand (2026). “Neuro-Symbolic Financial Reasoning via Deterministic Fact Ledgers.” arXiv.
VeNRA (Verifiable Numerical Reasoning Agent) is the architecture at the core of Fact AI Lab. This paper presents the full technical design and evaluation results.
Abstract
Large language models deployed in financial workflows suffer from three compounding failure modes: arithmetic errors, mis-grounded retrieval, and inconsistency across long reasoning chains. We present VeNRA, a neuro-symbolic agent that addresses all three by separating linguistic reasoning from numerical computation. VeNRA retrieves strongly-typed financial facts via a Universal Fact Ledger (UFL) rather than text chunks, executes all arithmetic via deterministic Python, and audits reasoning traces with VeNRA Sentinel — a compact SLM optimized for forensic consistency checking under tight latency budgets. On a benchmark of 500 financial analysis tasks spanning 10-K filings, earnings calls, and regulatory disclosures, VeNRA achieves a hallucination rate of under 0.3% versus 12–18% for RAG-based baselines, with median latency under 800ms.
The Problem with Standard RAG in Finance
Financial analysis requires precision that text generation cannot reliably provide. Consider a simple calculation: gross margin from an income statement. Even when a well-prompted LLM retrieves the right paragraph and reads the right numbers, its output distribution is over tokens, not numbers, so it may still introduce rounding errors, use the wrong fiscal period, or conflate GAAP and non-GAAP figures.
These aren't rare edge cases. They're structural properties of how LLMs generate text. The solution isn't better prompting. It's a different architecture.
VeNRA Architecture
Universal Fact Ledger (UFL): A typed, versioned store of financial facts extracted from source documents. Every entry has: a type tag (Revenue, GrossMargin, ShareCount), a value, a period (FY2024-Q3), a source document reference, and an extraction timestamp. The LLM never handles raw numbers — it queries the ledger by type and period.
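As a rough illustration of the ledger entry described above, here is a minimal in-memory sketch. The names `Fact`, `FactLedger`, `add`, and `get` are hypothetical, as are the field names, the source reference format, and the sample value; the paper summary does not specify VeNRA's actual schema or storage layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Fact:
    """One ledger entry. Field names are illustrative, not VeNRA's schema."""
    fact_type: str          # type tag, e.g. "Revenue"
    value: Decimal          # exact decimal rather than a binary float
    period: str             # e.g. "FY2024-Q3"
    source_ref: str         # pointer to the source document passage
    extracted_at: datetime  # extraction timestamp


class FactLedger:
    """In-memory stand-in for a typed, versioned fact store."""

    def __init__(self):
        # Maps (fact_type, period) to a list of versions, oldest first.
        self._facts = {}

    def add(self, fact):
        # Appending instead of overwriting keeps earlier versions of a fact.
        self._facts.setdefault((fact.fact_type, fact.period), []).append(fact)

    def get(self, fact_type, period):
        """Return the latest version, or raise KeyError if none is recorded."""
        versions = self._facts.get((fact_type, period))
        if not versions:
            raise KeyError(f"no {fact_type} fact for {period}")
        return versions[-1]


# Made-up sample data, for illustration only.
ledger = FactLedger()
ledger.add(Fact("Revenue", Decimal("1250.0"), "FY2024-Q3",
                "10-Q:item1:p4", datetime(2024, 11, 1, tzinfo=timezone.utc)))
revenue = ledger.get("Revenue", "FY2024-Q3")
```

Querying by type and period means a request for the wrong fiscal period fails loudly with a `KeyError` instead of silently returning a nearby number.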
Double-Lock Grounding: Two verification stages before any numerical claim enters the output:
- Ledger grounding — does the claimed fact exist in the UFL with the correct type and period?
- Arithmetic verification — if the output is a derived figure, can it be reproduced by executing the stated formula against the UFL in deterministic Python?
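The two stages above could be sketched roughly as follows. This is an assumption-laden illustration, not VeNRA's implementation: the `LEDGER` and `FORMULAS` tables, the function names, the tolerance, and all sample values are hypothetical.

```python
from decimal import Decimal

# Hypothetical ledger contents keyed by (type, period); the values are made up.
LEDGER = {
    ("Revenue", "FY2024-Q3"): Decimal("1250.0"),
    ("CostOfRevenue", "FY2024-Q3"): Decimal("750.0"),
}

# Hypothetical whitelist of formulas: input fact types plus a pure function.
FORMULAS = {
    "GrossMargin": (
        ("Revenue", "CostOfRevenue"),
        lambda revenue, cost: (revenue - cost) / revenue,
    ),
}


def ledger_grounded(fact_type, period, claimed):
    """Lock 1: the claimed fact exists with this type and period, and matches."""
    return LEDGER.get((fact_type, period)) == claimed


def arithmetic_verified(name, period, claimed, tolerance=Decimal("0.0001")):
    """Lock 2: recompute the derived figure from ledger inputs and compare."""
    if name not in FORMULAS:
        return False
    input_types, formula = FORMULAS[name]
    try:
        inputs = [LEDGER[(t, period)] for t in input_types]
    except KeyError:
        return False  # an input is missing, so the claim cannot be reproduced
    return abs(formula(*inputs) - claimed) <= tolerance
```

With these sample values, `arithmetic_verified("GrossMargin", "FY2024-Q3", Decimal("0.40"))` passes because (1250 - 750) / 1250 is exactly 0.4, while a claimed margin of 0.42, or the same claim against a period with no ledger entries, is rejected.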
VeNRA Sentinel: A 3B-parameter SLM fine-tuned for forensic audit of financial reasoning traces. Sentinel checks consistency between the reasoning chain and the UFL entries, flags mismatches, and provides a structured audit log. Target latency: under 100ms at p95.
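Sentinel itself is a fine-tuned model, so it cannot be reproduced in a few lines. The sketch below only illustrates what a structured audit log over a reasoning trace might look like, using a rule-based stand-in for the check. The trace format, the status labels, the `audit_trace` function, and the sample data are all assumptions made for this example.

```python
import json
from decimal import Decimal

# Hypothetical ledger snapshot keyed by (type, period); the value is made up.
LEDGER = {("Revenue", "FY2024-Q3"): Decimal("1250.0")}


def audit_trace(trace):
    """Rule-based stand-in for the audit step: one log record per trace step."""
    log = []
    for index, step in enumerate(trace):
        recorded = LEDGER.get((step["type"], step["period"]))
        if recorded is None:
            status = "missing"    # no ledger entry for this type and period
        elif recorded != Decimal(step["value"]):
            status = "mismatch"   # entry exists but the cited value differs
        else:
            status = "ok"
        log.append({"step": index, "type": step["type"],
                    "period": step["period"], "status": status})
    return log


trace = [
    {"type": "Revenue", "period": "FY2024-Q3", "value": "1250.0"},
    {"type": "Revenue", "period": "FY2024-Q2", "value": "1250.0"},  # wrong period
]
audit_log = audit_trace(trace)
print(json.dumps(audit_log, indent=2))
```

Here the second step cites a period with no ledger entry, so it is logged as `missing` while the first step is logged as `ok`.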
Key Results
- Hallucination rate: < 0.3% on the benchmark (vs. 12–18% for RAG baselines)
- Source attribution accuracy: 99.7% (share of output numbers that trace to a specific document passage)
- Median end-to-end latency: 780ms (including Sentinel audit)
- Coverage: 10-K/10-Q filings, earnings call transcripts, and regulatory disclosures
Why This Matters
The core architectural insight: the LLM is good at understanding intent, structuring analysis, and generating readable narrative. It's unreliable at precise arithmetic and source attribution. VeNRA gives language tasks to the LLM and deterministic tasks to Python. The result is a system that can be audited — where every number in the output has a traceable path back to the source.
This is not a marginal improvement over RAG. It's a structural change in how financial AI agents should be built.