“Every technique for making AI trustworthy is a variation on one humble instruction: do not answer from memory—go and check.”
One Move, Three Names
The field has a confusing vocabulary for a simple idea. Retrieval-augmented generation, tool use, function calling, the Model Context Protocol—the names multiply, but they are all the same move underneath: give the model a source of truth outside itself, and make it reach for that source instead of its own probabilities.
I argued why this matters in Why LLMs Hallucinate, and laid out the architecture in Compute, Then Interpret. This piece is the practical companion: a field guide to the actual techniques, in plain language, with a real system you can connect to and inspect.
Retrieval-Augmented Generation (RAG)
The first and most common technique. Instead of letting the model answer from its training, you first retrieve relevant documents—from a database, a knowledge base, a set of files—and hand them to the model as context. The model then answers grounded in those documents.
Retrieval-Augmented Generation (RAG) (technology): A method where a system searches an external source for relevant information and supplies it to a language model, which then generates its answer from that supplied material rather than from memory alone. It anchors responses to real documents, reducing fabrication in knowledge-heavy tasks.
RAG is the right tool when truth lives in text—policies, manuals, papers, a body of writing. The model still does the interpreting, but it interprets something real that was placed in front of it, rather than confabulating from half-remembered patterns.
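To make the retrieve-then-answer flow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three-document knowledge base, the word-overlap scoring (a real system would more likely use embeddings or a search index), and the prompt wording. It stops at building the prompt, because the model call itself depends on whichever provider you use.

```python
import re

# Hypothetical knowledge base; in practice this would be a database or index.
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping-policy": "Standard shipping takes 3 to 5 business days.",
    "warranty": "The warranty covers manufacturing defects for 2 years.",
}

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(body)), doc_id) for doc_id, body in documents.items()]
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]

def build_prompt(question, documents, doc_ids):
    """Place the retrieved text in front of the model, with source labels."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

question = "How many days do refunds take?"
doc_ids = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, DOCUMENTS, doc_ids)
print(prompt)
```

Labelling each source with an id in the prompt is a deliberate choice: it gives the model something to cite, which is what makes the answer traceable later.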
Tool Use and Function Calling
RAG hands the model documents. Tool use hands it capabilities. You give the model access to functions it can call—a calculator, a search, a database query, a computation engine—and it invokes them to get exact results, then interprets those results.
This is the move for anything that must be computed rather than recalled. A model asked to do arithmetic will approximate; a model given a calculator tool will call it and get the exact answer. The difference between those two behaviours is the difference between a plausible number and a correct one.
A language model doing math in its head is a party trick. A language model that knows to reach for a calculator is an engineer. The intelligence is in the reaching, not the guessing.
This is exactly how Eternal Evals works: the deterministic astrology engine is exposed as tools an AI can call—compute the chart, read a section—so the model gets real computed values instead of inventing them.
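The mechanics of a tool call can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular vendor or Eternal Evals implements it: the `calculator` tool, the `TOOLS` registry, and the JSON shape of the call are all hypothetical, and the model's output is a hard-coded string standing in for a real model response.

```python
import json
import operator

def calculator(a, op, b):
    """Deterministic arithmetic: the exact result, not an approximation."""
    operations = {
        "+": operator.add,
        "-": operator.sub,
        "*": operator.mul,
        "/": operator.truediv,
    }
    return operations[op](a, b)

# Registry of functions the model is allowed to call (names are illustrative).
TOOLS = {"calculator": calculator}

def dispatch(tool_call_json):
    """Run the tool the model asked for and return its result as structured data."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"tool": name, "error": "unknown tool"}
    result = TOOLS[name](**call["arguments"])
    return {"tool": name, "result": result}

# Stand-in for what a model would emit instead of guessing the product.
model_output = '{"name": "calculator", "arguments": {"a": 48271, "op": "*", "b": 9973}}'
tool_result = dispatch(model_output)
print(tool_result)
```

The result goes back to the model as data, and the model's remaining job is to put it into words.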
The Model Context Protocol (MCP)
Tool use used to be bespoke—every app wired its tools to its model in its own way. MCP is the emerging open standard that makes tools portable: a tool exposed via MCP can be used by any compatible AI client—Claude, Cursor, and others—without custom integration.
This is why I published the Eternal Evals engine as an API and an MCP connector, and wrote a full guide to using it across ChatGPT, Claude and Cursor. It means an assistant does not guess at a chart; it reaches, through a standard protocol, for a tool that computes one—and interprets only what returns. The computation–interpretation boundary, made portable across the whole tool ecosystem.
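Under the hood, MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below builds such a request and reads a result in the shape the protocol describes. The tool name `compute_chart`, its arguments, and the response are hypothetical placeholders for illustration; they are not the Eternal Evals connector's actual interface, and a real client library would also handle transport and session setup.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def extract_text(response):
    """Collect the text items from a tools/call result, refusing failed calls."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return [item["text"] for item in result["content"] if item.get("type") == "text"]

# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(
    1, "compute_chart", {"date": "2000-01-01", "time": "12:00", "place": "London"}
)
print(json.dumps(request, indent=2))

# Hand-written placeholder response, not output from a real server.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "(computed values would appear here)"}],
        "isError": False,
    },
}
print(extract_text(response))
```

Because the request shape is the same for every tool and every client, the same server can sit behind any compatible assistant.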
Grounding Is Not Enough Without Discipline
Here is the caveat the hype skips: these techniques enable honesty, they do not guarantee it. Grounding done carelessly fails in quiet ways.
- Garbage retrieval. If RAG surfaces the wrong or low-quality documents, the model faithfully grounds itself in nonsense. Retrieval quality is everything.
- Ignored tools. A model with a calculator can still answer from memory if the system lets it. The tool has to be the required path, not an optional one.
- Ungoverned interpretation. Even with perfect facts, a model can over-reach in how it interprets them. The boundary has to be enforced, not hoped for.
Grounding is necessary, not sufficient. The discipline of computation-first design—making the tool the mandatory source of every checkable fact—is what turns the technique into a guarantee.
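One way to make the tool the required path is a gate between the model and the user. The sketch below is a deliberately crude version, with hypothetical function names: it refuses to release an answer containing any number that does not appear in a tool result. It compares numbers as strings, so it would reject `1554.0` against `1554`, and it says nothing about over-reaching interpretation; treat it as an illustration of enforcement, not a complete policy.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def ungrounded_numbers(answer, tool_results):
    """Return numbers stated in the answer that no tool result contains."""
    allowed = set()
    for result in tool_results:
        allowed.update(NUMBER.findall(str(result)))
    return [n for n in NUMBER.findall(answer) if n not in allowed]

def release(answer, tool_results):
    """Release an answer only if every number in it came from a tool."""
    missing = ungrounded_numbers(answer, tool_results)
    if missing:
        raise ValueError(f"numbers not backed by any tool result: {missing}")
    return answer

# Illustrative tool output recorded while answering.
tool_results = [{"tool": "calculator", "result": 1554}]
print(release("The total is 1554.", tool_results))
```

A draft such as "The total is about 1500." would raise instead of reaching the user, which is the point: the memory-based answer is blocked, not merely discouraged.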
A Worked Stack
Put concretely, an honest system has three layers you can actually point at:
- A deterministic core that owns the facts—the computation engine, the retrieval index, the database. Tested like infrastructure.
- An exposure layer—tools, functions, an MCP server—through which the model reaches the core.
- An interpretation layer—the language model—that receives real results and does only linguistic work on them.
Eternal Evals is exactly this: a Swiss-Ephemeris engine (the core), exposed as an API and MCP tools (the exposure), consumed by a language model that interprets the computed chart (interpretation). Nothing checkable is ever left to the guesser.
How to Tell If It Is Actually Grounded
Finally, the test—because a system can claim to be grounded and not be. The check is the one from Evals, Not Vibes: can you trace every factual claim back to its source?
Ask the system where a claim came from. In a truly grounded system, every fact resolves to a document it retrieved or a value it computed—you can follow the thread to the source. In a merely fluent one, the thread ends in a confident sentence hovering over nothing. That traceability is the whole difference, and it is checkable in five minutes.
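If a system records what it retrieved and computed, the trace can be automated. The following is a sketch under assumptions: the `Claim` structure, the source-id scheme, and the evidence log are all hypothetical, and it presumes the system already attaches a source id to each claim. It checks only that a source exists, not that the source actually supports the claim; that second step still needs a human or a separate evaluation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Claim:
    text: str
    source_id: Optional[str]  # None means the system offered no source

def trace(claims, evidence):
    """Split claims into those that resolve to logged evidence and those that do not."""
    grounded, dangling = [], []
    for claim in claims:
        if claim.source_id is not None and claim.source_id in evidence:
            grounded.append(claim)
        else:
            dangling.append(claim)
    return grounded, dangling

# Hypothetical evidence log: everything retrieved or computed while answering.
evidence = {
    "doc:refund-policy": "Refunds are issued within 14 days of purchase with a receipt.",
    "tool:calculator#1": 1554,
}
claims = [
    Claim("Refunds take up to 14 days.", "doc:refund-policy"),
    Claim("The total is 1554.", "tool:calculator#1"),
    Claim("Most customers are satisfied.", None),
]
grounded, dangling = trace(claims, evidence)
for claim in dangling:
    print("no source:", claim.text)
```

Any claim that lands in `dangling` is the confident sentence hovering over nothing.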
The toolbox is real and it is available now. What it asks of us is not brilliance but discipline: to build systems that go and check, and to refuse the fluent shortcut of answering from memory. You can connect to a working example and trace its claims yourself.