LLMs Are Not Control Planes
TL;DR
- Popular patterns in 2026 treat the LLM (or a thin wrapper around it) as the thing that maintains the system: the workflow, the memory model, the policy, the long-term truth.
- These patterns are not early versions of the correct architecture. They are a different architecture that is excellent at short-term coherence and poor at everything that compounds.
- OpenClaw-style "skills in markdown + long-running agent loop" and Karpathy-style LLM wikis are the clearest current examples. Both are directionally interesting. Both are missing the deterministic layer by construction.
- The rot is not a bug you will prompt away. It is what happens when you ask a stochastic system to perform the job of a type system, a transaction boundary, a policy engine, and an audit log.
The most common architectures for "serious agentic work" right now have a striking property in common: the model is doing the systems programming.
You see it in different flavors.
The Markdown Operating System
One popular family looks roughly like this:
- A long-running process (often Node) that maintains connections to channels.
- A collection of "skills" or "workflows" written primarily in markdown.
- An agentic loop that loads relevant markdown, stuffs it into context along with history and tool results, lets the model decide what to do next, and repeats.
- Some amount of human approval for risky actions, usually implemented as "pause and ask" rather than as a typed gate the system can enforce.
The appeal is obvious. You can iterate on behaviour by editing text files. The model is flexible. New capabilities feel like they can be added by writing another well-commented markdown document and registering it somewhere.
The problem is that the flow itself — the thing that is supposed to be reliable across time, across different inputs, across changes in the underlying tools — lives in the model's ability to interpret the comments, remember the conventions, and not drift on the tenth iteration of a long task.
This is not orchestration. This is vibe-based control flow with a very smart intern.
When the markdown says "after you have the design, update the ticket and then open the PR," the system has no first-class representation of "design complete," "ticket update," or "PR opened" that exists independently of the model emitting the right tokens in the right order. There is no contract on the ticket update tool that the rest of the system can check before or after the call. The approval, if it exists, is a conversation the model participates in rather than a decision the surrounding system records and enforces.
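To make the contrast concrete, here is a minimal sketch of what a first-class representation of those three milestones could look like when the surrounding system owns it. Everything in it is an assumption for illustration: `Step`, `Workflow`, `ApprovalRecord`, and the choice to require approval only for opening the PR are hypothetical names and rules, not taken from any framework discussed here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    DESIGN_COMPLETE = "design_complete"
    TICKET_UPDATED = "ticket_updated"
    PR_OPENED = "pr_opened"


# Each step lists the steps that must already be recorded before it may run.
PREREQUISITES = {
    Step.DESIGN_COMPLETE: set(),
    Step.TICKET_UPDATED: {Step.DESIGN_COMPLETE},
    Step.PR_OPENED: {Step.TICKET_UPDATED},
}

# Steps that need a recorded human decision (hypothetical rule).
REQUIRES_APPROVAL = {Step.PR_OPENED}


@dataclass(frozen=True)
class ApprovalRecord:
    step: Step
    approver: str
    approved: bool


@dataclass
class Workflow:
    completed: set = field(default_factory=set)
    approvals: list = field(default_factory=list)

    def record_approval(self, record: ApprovalRecord) -> None:
        # The decision is stored by the system, not inferred from chat history.
        self.approvals.append(record)

    def can_run(self, step: Step) -> bool:
        if not PREREQUISITES[step] <= self.completed:
            return False
        if step in REQUIRES_APPROVAL:
            return any(a.step == step and a.approved for a in self.approvals)
        return True

    def complete(self, step: Step) -> None:
        # The model may propose a step; only this method records it as done.
        if not self.can_run(step):
            raise PermissionError(f"{step.value} is not allowed yet")
        self.completed.add(step)
```

In this shape the model can still decide what the ticket update should say. What it cannot do is open the PR early: that ordering is checked by code that does not depend on the model's recent context.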
This is exactly what people mean when they say a system "worked yesterday." The model was in the right headspace with the right recent context. The markdown had not yet rotted relative to the actual tool surface or the actual policy.
The LLM Wiki
Another pattern that feels more sophisticated is the accumulating knowledge base powered primarily by the model:
- Everything the system touches gets written down or embedded.
- Later work retrieves from that growing corpus.
- The model is expected to synthesize, notice contradictions, update old entries when they are wrong, and generally act as both the writer and the librarian of the institutional memory.
This is closer to the right direction than pure chat. Continuous ingestion and some form of long-term memory are necessary.
It still collapses the deterministic requirements into the stochastic component.
There is usually no typed distinction between "this is a raw observation from a run," "this is a human-ratified decision," and "this is a fact we have decided is true under a specific applicability scope." Provenance is often "the model said it came from these sources" rather than a preserved chain that can be mechanically audited. Contradictions are resolved when the model feels like surfacing them. Applicability (for example, a claim that was true on the auth-service refactor branch before the merge, or a package constraint that only holds for v2.x consumers) is either absent or treated as another fact the model should remember to check.
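A rough sketch of the missing typed distinction might look like the following. The names `Kind`, `Scope`, `MemoryEntry`, and `applicable`, and the two scope dimensions (branch and major version), are hypothetical and chosen only to mirror the examples in the paragraph above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    OBSERVATION = "observation"    # raw output from a run
    DECISION = "decision"          # ratified by a human
    SCOPED_FACT = "scoped_fact"    # held true only within a scope


@dataclass(frozen=True)
class Scope:
    # None means "no restriction on this dimension".
    branch: Optional[str] = None
    major_version: Optional[int] = None

    def covers(self, branch: str, major_version: int) -> bool:
        branch_ok = self.branch is None or self.branch == branch
        version_ok = (
            self.major_version is None or self.major_version == major_version
        )
        return branch_ok and version_ok


@dataclass(frozen=True)
class MemoryEntry:
    kind: Kind
    text: str
    source_run_ids: tuple  # preserved provenance, not a claim made by the model
    scope: Scope = field(default_factory=Scope)


def applicable(entries, branch: str, major_version: int):
    """Return only the entries whose scope covers the current context."""
    return [e for e in entries if e.scope.covers(branch, major_version)]
```

With this in place, applicability is a mechanical filter applied before retrieval results reach the model, rather than something the model is trusted to remember to check.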
The result is a body of text that looks increasingly authoritative while its actual correspondence to reality decays. The more successful the accumulation, the faster the rot becomes dangerous, because future work trusts the corpus more.
Karpathy's LLM wiki explorations are a clean example of the genre. The impulse — give the model a durable, queryable place to keep what it has learned — is correct. The execution still leaves the integrity, typing, scoping, and verification of that place to the same mechanism that is great at fluent synthesis and poor at remembering that a constraint only applied on a particular commit.
The Deeper Pattern
These are not isolated bad choices. They are what you get when you start from "how do I make the model do more impressive things" rather than "what properties must the system around the model have if the work is going to remain legible and safe six months from now."
The model ends up responsible for:
- Control flow and sequencing (instead of explicit, bounded jobs with dependencies and readiness rules).
- Memory model integrity (instead of typed memory with promotion gates and applicability).
- Policy evaluation (instead of a shell that can evaluate policy before the model is even shown the option).
- Provenance and audit (instead of an execution graph that records what actually happened independently of what the model claims happened).
When any of these drift (and they will, because the model is stochastic), the entire edifice becomes less trustworthy in exactly the places where trust either compounds value or destroys it.
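As one illustration of the policy point in the list above, the sketch below filters the tool surface before the prompt is assembled, so a disallowed action is never offered to the model at all. `Tool`, `PolicyContext`, `visible_tools`, and the two-level risk label are hypothetical simplifications; a real policy engine would evaluate far richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    risk: str  # "low" or "high"; a hypothetical two-level classification


@dataclass(frozen=True)
class PolicyContext:
    # Set by the surrounding system, never by the model.
    high_risk_allowed: bool


def visible_tools(tools, ctx: PolicyContext):
    """Return the tools the model is allowed to see in this context.

    Filtering happens before prompt assembly, so a disallowed tool is
    never presented as an option in the first place.
    """
    allowed = []
    for tool in tools:
        if tool.risk == "high" and not ctx.high_risk_allowed:
            continue
        allowed.append(tool)
    return allowed
```

The design choice is that policy is an input to context assembly, not an instruction inside the context that the model may or may not follow.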
What the Alternative Looks Like
The deterministic shell does not remove the model from the interesting parts. It removes the model from the parts it should never have been asked to own.
In the systems we have been building, the model still does the synthesis, the proposal, the extraction under ambiguity, the creative leaps. What it does not do is:
- Define its own units of work.
- Assemble its own context without going through a compilation process that can enforce invariants.
- Execute actions without the surrounding system having already validated the call against a real contract and current policy.
- Promote its own conclusions into durable memory without a gate.
- Act as the source of truth for what was true, when, and under what conditions.
Those responsibilities live in typed aggregates, lowering pipelines with graph-break checks, first-class confirmation and governance decision records, explicit run/step/artefact evidence, and applicability-scoped memory promotion.
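A gated promotion step could be sketched roughly as follows. This is not the implementation referred to above, only an assumed minimal shape: `Candidate`, `Ratification`, `PromotionRejected`, and `promote` are hypothetical names, and the three checks are examples of the kind of gate meant here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    evidence_run_ids: tuple  # runs that support the claim
    scope: Optional[str]     # where the claim applies, e.g. "v2.x consumers"


@dataclass(frozen=True)
class Ratification:
    reviewer: str
    accepted: bool


class PromotionRejected(Exception):
    pass


def promote(
    candidate: Candidate,
    ratification: Optional[Ratification],
    store: list,
) -> None:
    """Append the candidate to durable memory only if every gate passes."""
    if not candidate.evidence_run_ids:
        raise PromotionRejected("no run evidence attached")
    if candidate.scope is None:
        raise PromotionRejected("no applicability scope declared")
    if ratification is None or not ratification.accepted:
        raise PromotionRejected("not ratified")
    # The decision record is stored alongside the promoted entry.
    store.append((candidate, ratification))
```

The model can propose as many candidates as it likes. Nothing reaches the durable store unless it carries evidence, a declared scope, and a recorded decision.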
The markdown file becomes documentation or a high-level intent description, not the executable definition of the flow. The wiki becomes one possible consumer of promoted memory, not the primary store. The loop is a runner over bounded, auditable work items rather than a single long-lived "agent" whose state is its conversation history.
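The "runner over bounded work items" idea can be sketched in a few lines. `Job`, `run_jobs`, and the shape of the log entries are hypothetical; the sketch assumes each job's action is a plain callable that returns an artefact, and it leaves out retries, timeouts, and persistence.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    job_id: str
    depends_on: tuple
    action: Callable[[], str]  # bounded unit of work; returns an artefact


def run_jobs(jobs):
    """Run jobs whose dependencies have finished; return the run log.

    The log is written by the runner, so it records what actually
    executed rather than what a model reports having done.
    """
    pending = {job.job_id: job for job in jobs}
    done = set()
    log = []
    while pending:
        # Readiness rule: every dependency must already be complete.
        ready = [j for j in pending.values() if set(j.depends_on) <= done]
        if not ready:
            raise RuntimeError(f"unsatisfiable dependencies: {sorted(pending)}")
        for job in ready:
            artefact = job.action()
            log.append({"job": job.job_id, "artefact": artefact})
            done.add(job.job_id)
            del pending[job.job_id]
    return log
```

A model call can sit inside any individual `action`. The sequencing, the readiness check, and the evidence of what ran all live outside it.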
This is more work to build. It is also the only shape of system I have seen that does not require the model to stay in exactly the right mood for the properties you actually care about to hold.
Next: the two layers of governance that most current designs conflate, and why keeping them separate is foundational.
Part 2 of "The Deterministic Shell."
The concrete counter-examples are the context compilation kernel (where governance of evidence happens at compile time, with provenance and sensitivity preserved) and the control plane, which treats Jobs as bounded executable units, Runs as evidence, and memory promotion as an explicit, gated act rather than an ambient side effect of the model writing things down.