TerranSoul in five scenes — how LLMs, RAG and memory become a useful daily assistant.
If you are getting lost between a model, a chatbot, RAG, memory and TerranSoul, you are not alone. They are not five names for the same thing. Here is a non-technical walkthrough — five everyday scenes — that shows what each layer is for, and why TerranSoul is more than choosing free, paid or local AI.
Open
Why people keep confusing these
Sit at a coffee shop long enough and you will hear someone ask, with a slightly exhausted face, "so what's the difference between ChatGPT and Claude and that other thing my coworker keeps talking about?" The honest answer is that they are not in the same category at all. We just call them all "AI" because the marketing is louder than the design.
Here is the simplest version of the map. Most AI conversations mix together four layers that do different jobs.
What confuses people is not the logos. It is the fact that "AI assistant" hides several different layers behind one word: a base model that can talk, RAG that can look up your documents, memory that can remember your life, and TerranSoul that keeps all of that fresh, private and yours. Five everyday scenes is the fastest way to feel the difference.
The five scenes below follow this progression: base LLM → personal context → RAG look-up → connected memory → TerranSoul in daily use.
Scene 01 · Base LLM
A smart stranger knocks on your study door.
Imagine you are working from home and a very clever stranger knocks on your study door. They speak well, know a lot about the world, and can write a decent first draft of almost anything. But they have never been inside your house.
A base LLM is that stranger. Ask about Roman history, meal ideas, a polite email, or the meaning of a legal phrase, and it can be wonderful. Ask "where did I leave the kids' passports?" and it has no real way to know. If it answers anyway, it is guessing with nice grammar.
This first layer is not about whether the model is free, paid, or local. Those are delivery choices. The bigger question for a non-technical person is simpler: does the assistant only know general world knowledge, or can it safely use my own information?
A plain model can sound personal without being personal. It can say "I remember" in a sentence, but unless something stored your facts and brought them back at the right time, it is still only the stranger at the door.
So Scene 1 is the baseline: smart language, broad knowledge, no dependable memory of you.
Under the hood
The model adapter sits behind a single BrainGateway interface, so chat, avatar, memory and persona do not have to be rebuilt when the chosen model changes. That separation lets the model answer with context from the same memory store, whether the model is hosted or local. The user-facing promise is simple: changing the engine should not make the assistant forget you.
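The separation described above can be sketched in a few lines. This is a minimal illustration, not TerranSoul's actual code: `ModelAdapter`, `Gateway`, `swap_model` and the two echo models are hypothetical names invented here, and the "models" only echo their prompt so the example runs without any AI service.

```python
from typing import Protocol


class ModelAdapter(Protocol):
    """Hypothetical adapter shape: anything that can complete a prompt."""

    def complete(self, prompt: str) -> str: ...


class EchoHostedModel:
    """Stand-in for a hosted model; it only echoes the prompt."""

    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


class EchoLocalModel:
    """Stand-in for a local model; it only echoes the prompt."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class Gateway:
    """Illustrative stand-in for a single gateway in front of the model."""

    def __init__(self, model: ModelAdapter, memories: list[str]) -> None:
        self.model = model
        self.memories = memories  # shared store, independent of the model

    def swap_model(self, model: ModelAdapter) -> None:
        # Only the engine changes; the memory list is untouched.
        self.model = model

    def ask(self, question: str) -> str:
        context = "; ".join(self.memories)
        return self.model.complete(f"Context: {context} | Question: {question}")


gateway = Gateway(EchoHostedModel(), ["kid is allergic to peanuts"])
before = gateway.ask("What should I cook?")
gateway.swap_model(EchoLocalModel())
after = gateway.ask("What should I cook?")
```

Both `before` and `after` carry the same remembered fact, which is the point of keeping memory outside the model adapter.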
Scene 02 · Personal context
Friday night, what should I cook?
It is 5:40 on a Friday. You open the fridge, sigh, close it, and ask your assistant "what should I cook?"
A generic chatbot answers that question by Googling "easy weekday dinners". You get a beautiful list of pad thai, peanut noodles and Thai green curry. Two of those have peanuts. Your kid is allergic to peanuts. You had pasta on Tuesday. You are trying to eat fewer carbs. Your partner works late on Fridays so dinner needs to keep on the stove. The chatbot did not know any of that, because nobody told it.
A useful assistant — the kind that earns its place in your kitchen — does not just match the words "weekday dinner". It quietly carries a small, lived-in picture of you: who eats here, what you tried last week, what you said you wanted to cut down on, what your week looks like right now. The recipe list is just the last 10% of the answer. The first 90% is knowing what to leave out.
This is the scene where most people first notice the difference between "a chatbot" and "an assistant". A chatbot is the search bar with a friendlier mouth. An assistant has a sense of you.
Under the hood
Each memory row carries a cognitive_kind tag — episodic ("what I did"), semantic ("what is true"), procedural ("how to do it"), judgment ("what I decided"). Retrieval reranks differently for each: a "what should I cook" turn weights episodic recency (Tuesday's pasta) and semantic facts (kid's allergy) very differently from "how do I deglaze a pan" (procedural). Privacy tags travel with every row — anything marked private never leaves the device, even when other layers happily sync.
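As a rough sketch of kind-aware reranking and privacy filtering: the `cognitive_kind` values come from the text, but the `Memory` class, the weight table, the query-class names and the `syncable` helper are assumptions made for illustration. Real weights would be tuned, not hand-picked.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    cognitive_kind: str  # "episodic", "semantic", "procedural" or "judgment"
    similarity: float    # base retrieval score in [0, 1]
    private: bool = False


# Hypothetical per-query-class weights, chosen only for this example.
KIND_WEIGHTS = {
    "meal_planning": {"episodic": 1.5, "semantic": 1.5, "procedural": 0.5, "judgment": 1.0},
    "how_to": {"episodic": 0.5, "semantic": 1.0, "procedural": 2.0, "judgment": 0.5},
}


def rerank(memories, query_class):
    """Order memories by base similarity scaled by the weight of their kind."""
    weights = KIND_WEIGHTS[query_class]
    return sorted(
        memories,
        key=lambda m: m.similarity * weights[m.cognitive_kind],
        reverse=True,
    )


def syncable(memories):
    """Rows marked private are never offered to a sync layer."""
    return [m for m in memories if not m.private]


memories = [
    Memory("Had pasta on Tuesday", "episodic", 0.60),
    Memory("Kid is allergic to peanuts", "semantic", 0.70, private=True),
    Memory("Deglaze: add liquid to the hot pan and scrape", "procedural", 0.65),
]

meal_order = [m.text for m in rerank(memories, "meal_planning")]
howto_order = [m.text for m in rerank(memories, "how_to")]
shareable = [m.text for m in syncable(memories)]
```

With these made-up numbers the allergy leads the meal-planning ranking and the deglazing note leads the how-to ranking, while the private allergy row stays out of `shareable`.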
Scene 03 · RAG
Studying for the driving test with the right page open.
You moved countries last year. You have a driving test next month. The road code is 320 pages and most of it is irrelevant to the seven questions you actually have. You'd love to be able to ask "can I turn right on a red light?" and get back the exact sentence that decides it, with a page number.
This is exactly what people mean when they say "RAG". The acronym is ugly — retrieval-augmented generation — but the idea is calm and ordinary. You hand the assistant a stack of documents. The assistant pre-reads them: slicing each one into paragraph-sized chunks, and turning every chunk into a kind of numeric fingerprint of its meaning. Later, when you ask a question, the assistant turns your question into the same kind of fingerprint and finds the chunks whose fingerprint is nearest. Those chunks get pulled into the answer.
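The chunk, fingerprint and nearest-match steps can be sketched with a toy example. Everything here is an assumption made for illustration: the `embed` function is a hand-built stand-in that maps a few words onto four "concept" dimensions (a real embedding model learns this from data), and the three road-code paragraphs are invented.

```python
import math

# Toy stand-in for an embedding model: a few words mapped by hand to
# four concept dimensions. A real embedder learns this from data.
CONCEPTS = {
    "turn": 0, "u-turn": 0, "turning": 0, "manoeuvres": 0,
    "red": 1, "light": 1, "controlled": 1, "intersections": 1,
    "parking": 2, "park": 2,
    "speed": 3, "limit": 3,
}


def embed(text):
    """Turn text into a 4-number fingerprint by counting concept words."""
    vec = [0.0] * 4
    cleaned = text.lower().replace(".", " ").replace(",", " ").replace("?", " ")
    for word in cleaned.split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chunk(document):
    """Paragraph-sized chunks: split on blank lines."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


# Invented road-code text, for illustration only.
road_code = """Prohibited turning manoeuvres at controlled intersections are listed here.

Parking is not allowed within six metres of a crossing.

The speed limit in school zones is lower during drop-off hours."""

# Pre-read step: fingerprint every chunk once.
index = [(c, embed(c)) for c in chunk(road_code)]


def search(question, top_k=1):
    """Fingerprint the question and return the nearest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


best = search("u-turn at a red light")[0]
```

The question and the winning paragraph barely share a word, yet their fingerprints point the same way, which is why the turning-manoeuvres paragraph comes back first.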
The difference from Scene 1 is enormous. Suddenly, the helper at your door is allowed to quote your own bookshelf. They cannot make up an article number, because if they did you would notice. And because they searched by meaning rather than by keyword, you can ask "u-turn at a red light" and still find the paragraph that uses the words "prohibited turning manoeuvres at controlled intersections". The fingerprints match. The keywords don't.
This is where many products say they have "memory". RAG is real and useful, but it is closer to a good open-book exam than a friend who knows your week. That is why Scene 4 exists.
Each chunk is embedded with mxbai-embed-large, and the resulting vectors are stored in a per-shard HNSW index. Asking a question runs the same embedder over your sentence and finds the nearest chunks, and the answer comes back with the source filenames attached.
Under the hood
TerranSoul's RAG path is not just vector search. Every retrieval also runs a keyword (FTS5) pass with rare-term and acronym weighting, because semantic search routinely misses exact identifiers like Section 14.2 or HS-2019. The two streams are merged with Reciprocal Rank Fusion at k=60, then optionally sharpened by a small LLM-as-judge re-ranker. On the LongMemEval-S retrieval slice this hits R@10 99.6%, which is what you want when the cost of missing the right paragraph is a wrong answer on a driving test.
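Reciprocal Rank Fusion itself is short enough to show in full. The sketch below uses the standard formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in, with k=60 as stated above. The function name and the document ids are illustrative, not taken from TerranSoul.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) over all lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative ids: one list from vector search, one from keyword search.
vector_hits = ["turning-rules", "intersections", "section-14.2"]
keyword_hits = ["section-14.2", "turning-rules", "parking"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["turning-rules", "section-14.2", "intersections", "parking"]
```

Documents that appear in both lists rise above documents that rank well in only one, which is how an exact identifier found by the keyword pass survives next to the semantic hits.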
Scene 04 · Connected memory
Planning a family trip with memories connected.
It is February. Spring break is in six weeks. You ask your assistant: "plan a five-day trip for the four of us, somewhere we haven't been in the last three years, under four thousand dollars total, no flights longer than three hours, and please nothing too cold because Sam gets sick."
This is the scene where pure vector search — the bookshelf trick from Scene 3 — quietly stops working. The assistant can find a thousand articles that look like family travel writing. None of them know that you went to Aspen last year, that Sam got sick when it dropped below freezing, that four thousand dollars has to cover four flights plus a hotel, that "long flight" for a six-year-old means something different from "long flight" for you.
To answer this well, the assistant needs to follow relationships, not just match text. It needs a small private map: these places, these trips you took, these people who came with you, this budget, these things you said about each one when you came back. The fact that Sam is your child is one edge. The fact that Sam got sick in Aspen is a second edge attached to the first. The fact that the trip cost was $4,820 is a third edge. A useful answer is a small subgraph with the receipts attached — not a paragraph that vaguely sounds right.
This is connected memory. Technical people may call it "GraphRAG" or a "knowledge graph", but the everyday idea is simple: the assistant keeps a private map of thing → relationship → thing, with notes about when each one was last true. Once you have that, a six-line question with five constraints becomes a five-step walk through the map. The answer comes back as Charleston (we last went in 2019, average March temp 68°F, return flights from Boston about $1,200 for four, hotel budget allows three nights at this price) — with each fact pointing back at where in your own history it came from.
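A tiny in-memory version of that map shows how constraints become a walk over edges. This is a sketch under stated assumptions: the edge list, the `got_sick_in` relationship name, the years, the flight times, Aspen's temperature and the thresholds are all invented for illustration. Only Charleston's 68°F and the 2019 visit come from the example above.

```python
# (subject, relationship, object, year the fact was last true)
EDGES = [
    ("family", "visited", "Aspen", 2023),
    ("family", "visited", "Charleston", 2019),
    ("Sam", "child_of", "family", 2023),
    ("Sam", "got_sick_in", "Aspen", 2023),
]

# Hypothetical destination notes; flight hours and Aspen's temperature are made up.
PLACES = {
    "Aspen": {"march_temp_f": 30, "flight_hours": 4.5},
    "Charleston": {"march_temp_f": 68, "flight_hours": 2.5},
}


def check(place, this_year, max_flight_hours=3.0, min_temp_f=50, no_repeat_years=3):
    """Return the reasons a place fails the constraints (empty list = passes)."""
    reasons = []
    # Edge walk 1: family -> visited -> place, with a recency check.
    for s, r, o, year in EDGES:
        if s == "family" and r == "visited" and o == place:
            if this_year - year <= no_repeat_years:
                reasons.append(f"visited in {year}")
    # Edge walk 2: child -> child_of -> family, then child -> got_sick_in -> place.
    children = [s for s, r, o, _ in EDGES if r == "child_of" and o == "family"]
    for child in children:
        if any(s == child and r == "got_sick_in" and o == place for s, r, o, _ in EDGES):
            reasons.append(f"{child} got sick there")
    # Plain attribute checks on the destination itself.
    if PLACES[place]["march_temp_f"] < min_temp_f:
        reasons.append("too cold")
    if PLACES[place]["flight_hours"] > max_flight_hours:
        reasons.append("flight too long")
    return reasons


aspen = check("Aspen", this_year=2024)
charleston = check("Charleston", this_year=2024)
```

Each rejection reason points at a specific edge or attribute, which is what lets an answer cite where in your own history it came from.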
Edges carry relationship types such as visited, brought, cost, child_of, happened_in, and so on. They are extracted from your own chats and documents as they come in, then resolved across name variants ("Sam" / "my son" / "the 6-year-old"). At query time, the retriever takes vector hits and walks one or two hops out, so an answer can satisfy "child-friendly", "under budget" and "not last visit" all at once and still cite the source notes.
Under the hood
The graph lives in plain SQLite (memory_edges: src_id, dst_id, rel_type, confidence) — no separate graph database to host. Multi-hop retrieval is exposed as multi_hop_search_memories, and the MCP tool brain_kg_neighbors lets coding agents pull a subgraph too. Conflicts ("we said Aspen was great" vs "Sam got sick there") are kept as both rows with append-only versioning, and an LLM-as-judge step resolves which fact is current rather than averaging them into mush.
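To make the storage idea concrete, here is a minimal sketch of a one-or-two-hop walk over an edge table in SQLite. The table and column names follow the text; the column types, the text ids, the sample rows and the `neighbors` helper are assumptions, and this is not the implementation of multi_hop_search_memories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names from the text; the types are an assumption for this sketch.
conn.execute(
    "CREATE TABLE memory_edges "
    "(src_id TEXT, dst_id TEXT, rel_type TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO memory_edges VALUES (?, ?, ?, ?)",
    [
        ("sam", "family", "child_of", 0.99),
        ("family", "aspen_trip", "visited", 0.95),
        ("aspen_trip", "aspen", "happened_in", 0.95),
        ("aspen_trip", "usd_4820", "cost", 0.90),
    ],
)


def neighbors(start, max_hops=2, min_confidence=0.5):
    """Follow outgoing edges up to max_hops; return (node, hops) pairs."""
    query = """
    WITH RECURSIVE walk(node, hops) AS (
        SELECT ?, 0
        UNION
        SELECT e.dst_id, w.hops + 1
        FROM walk AS w
        JOIN memory_edges AS e ON e.src_id = w.node
        WHERE w.hops < ? AND e.confidence >= ?
    )
    SELECT node, MIN(hops) FROM walk
    WHERE hops > 0
    GROUP BY node
    ORDER BY MIN(hops), node
    """
    return conn.execute(query, (start, max_hops, min_confidence)).fetchall()


result = neighbors("family")
# result == [("aspen_trip", 1), ("aspen", 2), ("usd_4820", 2)]
```

A recursive common table expression keeps the walk inside SQLite, so a small graph like this needs no separate graph database.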
Scene 05 · TerranSoul in daily use
A normal Tuesday with TerranSoul holding the brain.
It is a Tuesday. You do not want to think about "base LLM", "RAG", "memory" or "graph". You want to ask the companion on your screen what changed, have your documents searched, have your preferences remembered, and have your other tools see the same decisions without re-explaining everything.
The thing that makes Tuesday actually work is invisible: the companion, your document search and your other tools are all reading from the same brain. When you said on Monday "we are deprecating the v1 auth flow, do not propose patches against it", you told that to the companion. On Tuesday morning the coding agent, a completely different process run by a different vendor, knows it. Not because it called you. Because the companion's brain is also its brain, mounted as a tool.
This is Scene 5. Not "pick a free model". Not "pay for a better model". Not even "build a RAG over my notes". The grown-up version of personal AI is a brain you own, and a companion that can lend it to the apps you already use. The apps can change over time. The brain stays.
For this to actually work on a normal Tuesday — not in a demo — four things have to be true at once:
| What has to hold | Why it matters | How TerranSoul does it |
|---|---|---|
| Fast | If the companion takes 6 seconds to look up "what did we deprecate?", you stop using it. | Sharded vector index, RRF fusion, query-class HyDE, hot caches. Sub-second on millions of memories. |
| Consistent | The IDE agent and the 3D companion must agree on what is current. Otherwise you fix the same bug twice. | Append-only versioning + LLM-as-judge conflict resolution + per-memory CAP profile (consensus on legal/financial, eventual on scratch). |
| Yours | If the brain only lives in someone else's cloud, it isn't really your memory — it is theirs, on loan. | Local-first by default. CRDT sync to your own devices. Optional encrypted relay for partners or teammates. Per-memory privacy ACL. |
| Measured | "It feels like it remembers" is not enough. You need to know when it doesn't. | Public benchmark harness (LongMemEval-S, LoCoMo MTEB, agent-memory token-efficiency), telemetry on retrieval health, uptime SLO. |
Under the hood
The MCP server runs on 127.0.0.1:7423 by default (or :7421 when the desktop app is open). Agents talk to it over a small set of tools: brain_search, brain_kg_neighbors, brain_ingest_lesson, code_query, code_impact, plus 30 more. Bearer-token authentication; tokens auto-rotate per session. The same surface works for an isolated CI container — useful when you want a coding agent to share a brain without opening the 3D app at all.
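As a hedged illustration of what an agent's request might look like, the sketch below builds a Model Context Protocol `tools/call` message with a bearer token. The tool name and default address come from the text; the `query` argument name, the token value and the `build_tool_call` helper are assumptions. The transport path is not specified here, so the example only constructs the request and does not send it.

```python
import json

MCP_ADDRESS = "127.0.0.1:7423"  # default address from the text


def build_tool_call(tool_name, arguments, token, request_id=1):
    """Build headers and a JSON-RPC 2.0 body for an MCP tools/call request."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return headers, json.dumps(body)


# The "query" argument name is hypothetical; check the tool's published schema.
headers, payload = build_tool_call(
    "brain_search",
    {"query": "what did we deprecate?"},
    token="example-token",
)
decoded = json.loads(payload)
```

Because every agent sends the same kind of message to the same local server, the coding agent and the companion end up reading one shared store.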
Conclusion
So which one do I actually need?
For one-off work — drafting an email, summarising a PDF, kicking around an idea — a plain chatbot or hosted AI platform can be genuinely excellent. Open it, do the thing, close the tab. That is a real category, and not the one this article is about.
The moment any of the following is true, you have moved beyond "just a model" and into TerranSoul territory:
- You catch yourself re-explaining the same project context to a chatbot for the third time this month.
- You have documents that should never leave your machine, but you still want to ask questions of them.
- You use a coding agent (Copilot, Cursor, Claude Code, Codex, OpenClaw) and you want it to remember decisions across sessions, not start fresh every morning.
- You want your assistant to know things like "kid's allergy", "trip last year", "we deprecated v1 on Monday" without you re-typing them every time.
- You want one brain across your laptop, your desktop, and your partner's machine — without uploading everything to a vendor.
And the place to start is small. The Brain + RAG tutorial walks you through the exact flow from Scene 3 — drop a few documents, ask a question in your language, get an answer with citations — in about ten minutes. Once that works, personal memory and connected memory become easier to understand because they build on the same idea: the model answers better when the right context is brought to it.
How the friends fit in
This part is worth saying out loud, because the cast in the intro is genuinely confusing. The short version:
| App | What it's great at | How TerranSoul composes with it |
|---|---|---|
| ChatGPT | A broad AI platform — chat, voice, Canvas, Codex for engineering, custom GPTs, Operator for agentic tasks, and built-in Memory. | Independent and powerful on its own. TerranSoul can call OpenAI models as a brain backend and can share notes/decisions with ChatGPT workflows through MCP-style connectors. |
| Claude | A full ecosystem: Claude chat, Claude Code (terminal/IDE coding), Cowork shared spaces, Artifacts, Projects, and a strong MCP story. | Independent and powerful on its own. TerranSoul speaks MCP, so Claude Code can mount the TerranSoul brain as a tool when you want shared project memory across machines. |
| OpenClaw | An open coding-agent UX with messaging bridges — reads files, edits files, runs commands, and can be reached from chat apps and webhooks. | Composes naturally. OpenClaw can plug into TerranSoul's MCP brain for project memory, semantic search and code intelligence. |
| Hermes Agent | A self-improving CLI agent with sessions, skills, scheduling, 14 toolsets, 16 messaging gateways, and a desktop GUI. | Composes naturally — TerranSoul is mounted as a first-class MCP server in Hermes's config when you want shared memory. |
| TerranSoul | One specific job: a brain, a persona, a 3D companion, plus the privacy and sync layer that lets the brain travel with you. | Lends itself to all of the above, so a decision you made yesterday in Claude Code is visible to OpenClaw or Hermes tomorrow. |
None of these tools is competing for the same square inch of your desk. ChatGPT and Claude are remarkable AI platforms. Claude Code and OpenClaw are great at writing and editing software. Hermes is great at long, scheduled, agentic work. TerranSoul is doing one specific job they all leave open: turning base LLMs, RAG and memory into a structured brain that remains yours past month three. The interesting bit is not whether the model is free, paid or local — it is the discipline around what to remember, what to forget, and who is allowed to see it.