
VeNRA: Verifiable Numerical Reasoning Agent

VeNRA, Python, LLM Agents, Financial AI, HuggingFace

Open-source neuro-symbolic agent for hallucination-free financial reasoning. Deterministic Python execution, Universal Fact Ledger, and Sentinel hallucination detection.

VeNRA is an open-source neuro-symbolic agent that eliminates hallucinations in financial reasoning by separating linguistic tasks (LLM) from numerical tasks (deterministic Python). The full technical paper is on arXiv - this page covers the engineering side.

Architecture

VeNRA has three components:

Ingestion Engine - Parses 10-K/10-Q filings, earnings call transcripts, and regulatory disclosures into a typed Universal Fact Ledger (UFL). Every extracted fact carries: a type tag (Revenue, GrossMargin, ShareCount), a value, a fiscal period, a source document reference, and an extraction timestamp. The LLM never handles raw numbers directly - it queries the ledger by type and period.

Runtime Agent - Orchestrates financial analysis tasks. When a numerical claim is required, the agent queries the UFL and executes all arithmetic via deterministic Python rather than letting the LLM generate the result. This eliminates the structural arithmetic errors that occur when language models generate tokens over a probability distribution rather than computing values.

Sentinel Service - A compact SLM (3B parameters) fine-tuned for forensic audit of financial reasoning traces. Sentinel checks consistency between the agent's reasoning chain and the UFL entries, flags mismatches, and produces a structured audit log. Target latency: under 100ms at p95.
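To make the division of labour concrete, here is a minimal sketch of how the three pieces could fit together. It is not VeNRA's actual code: the `Fact` and `FactLedger` classes, the field names, the `yoy_growth` helper, the source references, and the revenue figures are all invented for illustration. The `claim_matches` function is a simple rule-based stand-in for a consistency check, not the fine-tuned Sentinel model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Fact:
    """One ledger entry; field names are illustrative, not VeNRA's schema."""
    fact_type: str        # e.g. "Revenue", "GrossMargin", "ShareCount"
    value: Decimal
    fiscal_period: str    # e.g. "FY2023"
    source_ref: str       # pointer to the source document passage
    extracted_at: datetime


class FactLedger:
    """In-memory stand-in for the Universal Fact Ledger."""

    def __init__(self):
        self._facts = {}

    def add(self, fact):
        self._facts[(fact.fact_type, fact.fiscal_period)] = fact

    def get(self, fact_type, fiscal_period):
        # Exact-match lookup by type and period; raises KeyError if missing.
        return self._facts[(fact_type, fiscal_period)]


def yoy_growth(ledger, fact_type, current, prior):
    """Compute year-over-year growth in Python and record what was used."""
    cur = ledger.get(fact_type, current)
    prev = ledger.get(fact_type, prior)
    result = (cur.value - prev.value) / prev.value
    trace = {
        "facts_used": [cur.source_ref, prev.source_ref],
        "expression": f"({cur.value} - {prev.value}) / {prev.value}",
        "result": result,
    }
    return result, trace


def claim_matches(claimed, trace, tolerance=Decimal("0.0001")):
    """Rule-based stand-in for a consistency check (not the Sentinel SLM)."""
    return abs(Decimal(claimed) - trace["result"]) <= tolerance


# Sample data: values and source references are made up for this sketch.
now = datetime.now(timezone.utc)
ledger = FactLedger()
ledger.add(Fact("Revenue", Decimal("1200"), "FY2023", "10-K-2023#p41", now))
ledger.add(Fact("Revenue", Decimal("1000"), "FY2022", "10-K-2022#p39", now))

growth, trace = yoy_growth(ledger, "Revenue", "FY2023", "FY2022")
print(growth, trace["expression"], claim_matches("0.2", trace))
```

The language model's job in this picture is limited to deciding which fact types and periods to request. The number itself comes from the ledger and the Python expression, and the recorded trace is what a downstream audit step would check a generated claim against.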

Key Features

Deterministic reasoning - arithmetic runs in Python, not in the LLM's generation. No LLM-generated rounding errors, no period confusion.
Universal Fact Ledger - typed, versioned fact store with source attribution. Every number traces back to a specific document passage.
Traceability - every output includes a full audit trail: which UFL entries were used, which Python expressions were evaluated, and Sentinel's consistency verdict.
Hybrid retrieval - the ingestion pipeline uses both dense embeddings (semantic similarity) and exact-match ledger queries, combining the strengths of both.
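As a rough illustration of the hybrid retrieval idea, the sketch below pairs an exact-match lookup with a dense similarity ranking. How VeNRA actually combines the two is not described on this page, so treat this as one plausible arrangement. The `hybrid_retrieve` function, the index layout, the passage IDs, and the three-dimensional toy vectors are all assumptions; a real system would use embeddings from a model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_key, query_vec, exact_index, passages, top_k=2):
    """Return (exact_hit, ranked_passage_ids).

    exact_index: dict mapping (fact_type, fiscal_period) -> ledger entry
    passages:    list of (passage_id, embedding) pairs
    """
    # Exact-match path: None when the ledger has no entry for this key.
    exact_hit = exact_index.get(query_key)
    # Dense path: rank passages by similarity to the query embedding.
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return exact_hit, [pid for pid, _ in ranked[:top_k]]


# Toy data, invented for this sketch.
exact_index = {("Revenue", "FY2023"): {"value": "1200", "source": "10-K-2023#p41"}}
passages = [
    ("p41", [0.9, 0.1, 0.0]),
    ("p07", [0.1, 0.9, 0.0]),
    ("p88", [0.7, 0.3, 0.1]),
]

hit, context = hybrid_retrieve(
    ("Revenue", "FY2023"), [1.0, 0.0, 0.0], exact_index, passages
)
print(hit, context)
```

In this arrangement the exact hit supplies the number and its source, while the dense results supply surrounding passages for context. Keeping the two paths separate means a fuzzy match is never mistaken for a ledger value.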

Results

On a benchmark of 500 financial analysis tasks across 10-K filings, earnings calls, and regulatory disclosures:

Hallucination rate: < 0.3% (vs. 12–18% for RAG baselines)
Source attribution accuracy: 99.7%
Median end-to-end latency: 780ms (including Sentinel audit)

Try It

The hallucination detector is live on HuggingFace Spaces. The dataset and fine-tuned model weights are also public. See the links above.

For the full technical design and evaluation methodology, see the arXiv paper.
