Context as a Compiled Artefact
TL;DR
- The dominant context strategies (chunk + embed + top-k + stuff, or "the model has a long context so just give it the files") both treat context assembly as a retrieval problem whose output is a bag of tokens.
- That approach throws away the structure required for any of the deterministic properties we actually need: provenance chains, policy enforcement that can see combinations, contradiction visibility, and safe reuse across time and requesters.
- A proper context compiler lowers through explicit intermediate representations (L0–L3) with invariant checks at every boundary. The artefact is the compiled bundle + portable envelope (ContextAbi), not the prompt or the rendered report.
- "The prompt is not the artefact" is the central design rule. Everything else (caching, governance, portability, audit) follows from treating the compiled form as primary.
The phrase "context engineering" is having a moment. Most of what travels under that label is still retrieval engineering with extra steps.
The Two Popular Failure Modes
Mode 1: Classical RAG. Chunk the corpus. Embed. Retrieve top-k by similarity (or hybrid keyword). Concatenate. Prompt. The model does its best with a lossy, unordered, de-provenanced bag of fragments.
This works for "find me something like the thing I described." It is structurally incapable of answering "what is the current state of this claim across all the sources that mention it, what are the contradictions, and which of those sources am I even allowed to see for this purpose?"
Mode 2: "The model is the context layer." Give the model access to files, previous outputs, a wiki, a long conversation, or a vector store and tell it to figure out what matters. Sometimes with a bit of summarization or "memory" compaction.
This moves the loss and the de-provenancing inside the model. The model becomes responsible for maintaining the invariants it cannot see. Over time the context becomes sludge: statements whose original grounding has been lost, claims that were scoped to a previous state of the world, contradictions that were resolved once in the model's head and then forgotten.
Both modes are attractive because they are easy to stand up and they produce fluent output quickly. Both are the wrong abstraction if the goal is work that remains trustworthy when the world (or the policy, or the branch, or the team) changes.
Visual Comparison: RAG vs Compiled Context
To make the difference concrete, here is a quadrant view of where different approaches land on agentic effectiveness versus context rot:
[Diagram not rendered: quadrant view of approaches by agentic effectiveness versus context rot]
And the pipeline view that explains why the positions are so different:
[Diagram not rendered: the RAG pipeline alongside the compiled-context pipeline]
In the RAG path every arrow is a rot vector (lost structure, lost provenance, policy only as an afterthought, contradictions invisible).
In the compiled path those concerns are resolved while the graph is still explicit, recorded in the artefact (the ContextAbi), and the model is handed a much smaller, much more trustworthy package.
Compilation as Lowering
The alternative is to treat context assembly as a compilation problem with typed intermediate representations and explicit passes that must preserve or explicitly record what is being lost.
A working model (the one implemented in the kernel):
- L0 (Source IR): Lossless (as practical) typed ingestion of the raw material: documents, tool outputs, previous compiled artefacts, web captures, etc., as SourceArtifacts. Structure that can be mechanically recovered later is kept.
- L1 (Semantic IR): A graph of claims, entities, and relationships. Every node carries a SourceSpan back to the L0 material that grounds it. Contradictions are explicit edges, not silently merged text. This is where most of the expensive extraction happens.
- L2 (Task IR): The semantic material conditioned on a specific intent. Evidence packs are assembled, mandatory nodes surfaced, unresolved questions called out, and, crucially, governance is applied here. The governance pass runs on the explicit graph before the packs are finalized. Redactions are typed and provenance-preserving. The result is not "the same graph with some things deleted from the prompt"; it is a task-specific view that still knows what was filtered and why.
- L3 (Execution IR): The model-ready form. Ordered blocks, output contract, optional tool plan. This is what gets rendered into a prompt or a ReportDocument. The render is below the artefact boundary.
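As a rough illustration of the L1 to L2 step, here is a minimal Python sketch of a governance pass that runs on explicit claim nodes and leaves behind typed, provenance-preserving redactions. The names SourceSpan and the four levels come from the text; everything else (Claim, Redaction, TaskView, governance_pass, the classification labels) is hypothetical and is not the kernel's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceSpan:
    """Pointer back into L0 material (assumed shape: artefact id + offsets)."""
    artifact_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Claim:
    """An L1 node: a claim plus the span that grounds it."""
    claim_id: str
    text: str
    span: SourceSpan
    classification: str  # assumed labels, e.g. "public" or "confidential"


@dataclass(frozen=True)
class Redaction:
    """Typed record of a filtered node: keeps provenance, drops the content."""
    claim_id: str
    span: SourceSpan
    reason: str


@dataclass
class TaskView:
    """An L2 view: what survived governance, plus what was filtered and why."""
    intent: str
    evidence: list[Claim] = field(default_factory=list)
    redactions: list[Redaction] = field(default_factory=list)


def governance_pass(claims: list[Claim], intent: str, allowed: set[str]) -> TaskView:
    """Apply policy on the explicit graph, before any prompt is assembled."""
    view = TaskView(intent=intent)
    for claim in claims:
        if claim.classification in allowed:
            view.evidence.append(claim)
        else:
            reason = f"classification={claim.classification}"
            view.redactions.append(Redaction(claim.claim_id, claim.span, reason))
    return view


claims = [
    Claim("c1", "Latency target is 200 ms.", SourceSpan("doc-a", 0, 25), "public"),
    Claim("c2", "Vendor contract ends in Q3.", SourceSpan("doc-b", 10, 37), "confidential"),
]
view = governance_pass(claims, intent="status report", allowed={"public"})
```

The point of the sketch is the return type: the task view carries the redaction records alongside the evidence, so a later audit can see that c2 existed, where it came from, and why it was withheld, without seeing its text.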
Between levels (and within levels at pass boundaries) the compiler runs invariant checks. The taxonomy of breaks is useful: provenance break, structure break, reattachment break, modality break, policy_timing break. A policy_timing break is exactly what you create when you apply governance after the context has already been assembled for the model.
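Two of those checks are simple enough to sketch. The Python below is an assumption-laden illustration, not the kernel's code: Node, InvariantBreak, break_kind, and the pass names "governance" and "assemble_execution" are all hypothetical. Only the break kinds "provenance" and "policy_timing" come from the taxonomy above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    source_artifact: Optional[str]  # id of the grounding L0 artefact, if any


class InvariantBreak(Exception):
    """Raised at a pass boundary; `kind` names the break in the taxonomy."""

    def __init__(self, kind: str, detail: str) -> None:
        super().__init__(f"{kind} break: {detail}")
        self.kind = kind


def check_provenance(nodes: list[Node], known_artifacts: set[str]) -> None:
    """Every node must point at an L0 artefact the compiler actually ingested."""
    for node in nodes:
        if node.source_artifact not in known_artifacts:
            raise InvariantBreak("provenance", f"{node.node_id} is not grounded in L0")


def check_policy_timing(pass_order: list[str]) -> None:
    """Governance must have run before the model-ready form is assembled."""
    if "assemble_execution" not in pass_order:
        return
    assembled_at = pass_order.index("assemble_execution")
    if "governance" not in pass_order[:assembled_at]:
        raise InvariantBreak("policy_timing", "governance did not run before assembly")


def break_kind(check: Callable[..., None], *args) -> Optional[str]:
    """Run a check and return the break kind, or None if the invariant holds."""
    try:
        check(*args)
    except InvariantBreak as exc:
        return exc.kind
    return None
```

For example, break_kind(check_policy_timing, ["extract", "assemble_execution", "governance"]) reports a policy_timing break, because the filter ran only after the model-ready form existed.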
The output of a successful compilation is not a prompt. It is a CompiledContextBundle wrapped in a ContextAbi envelope that carries the Header, the Policy block (the self-describing record of what governance was performed), the Evidence, the Execution contract, and an Integrity block (content hash over the canonical serialization, plus pass audit trail).
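A minimal sketch of what such an envelope could look like in Python follows. The five block names mirror the ones listed above; the rest is assumed for illustration: canonical JSON (sorted keys, fixed separators) as the serialization, SHA-256 as the hash, the hash covering the four non-integrity blocks only, and the function names seal and verify. The real ContextAbi format may differ on every one of these points.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically so equal content always hashes the same."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def seal(header: dict, policy: dict, evidence: list, execution: dict, pass_audit: list) -> dict:
    """Wrap the compiled blocks in an envelope with an integrity block."""
    body = {
        "header": header,
        "policy": policy,
        "evidence": evidence,
        "execution": execution,
    }
    digest = hashlib.sha256(canonical_bytes(body)).hexdigest()
    # Assumption: the audit trail travels in the integrity block, outside the hash.
    return {**body, "integrity": {"content_hash": digest, "pass_audit": pass_audit}}


def verify(envelope: dict) -> bool:
    """Recompute the hash over everything except the integrity block."""
    body = {key: value for key, value in envelope.items() if key != "integrity"}
    expected = envelope["integrity"]["content_hash"]
    return hashlib.sha256(canonical_bytes(body)).hexdigest() == expected
```

Because the serialization is canonical, two envelopes built from the same content in a different key order produce the same hash, and any change to the evidence or policy block after sealing makes verify return False.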
Prompts and human-readable reports are renders of L3 under a specific OutputContract. They are not the thing you cache, port, or audit against.
Why This Shape
Several properties only become possible once you accept the compilation model:
- Governance of evidence can be deterministic and auditable rather than "the model tried not to mention the confidential parts."
- Caching and reuse can be safe when you include applicability (entitlement class + policy version + sources + spec) in the identity. The content hash is an integrity fingerprint of one assembled instance, not a reuse key.
- Portability across models, sessions, and even products becomes a real contract (the ABI) instead of "we both call the same embedding API."
- Audit and provenance can be mechanical. A later action can name the exact ContextAbi hash it was based on. Future compilation can see the chain.
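The distinction between a reuse key and a content hash in the caching point can be made concrete. In this Python sketch the four applicability fields are the ones named above (entitlement class, policy version, sources, spec); the Applicability class, the reuse_key function, and the choice of SHA-256 over sorted JSON are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicability:
    """Who may reuse a compiled bundle, and under which conditions (assumed shape)."""
    entitlement_class: str
    policy_version: str
    source_ids: tuple[str, ...]
    spec_id: str


def reuse_key(app: Applicability) -> str:
    """Cache identity derived from applicability, not from the bundle's content."""
    payload = {
        "entitlement_class": app.entitlement_class,
        "policy_version": app.policy_version,
        "source_ids": sorted(app.source_ids),  # order of sources must not matter
        "spec_id": app.spec_id,
    }
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical cache: reuse key -> content hash of the bundle compiled for it.
cache: dict[str, str] = {}

analyst = Applicability("analyst", "p7", ("doc-a", "doc-b"), "status-report")
contractor = Applicability("contractor", "p7", ("doc-a", "doc-b"), "status-report")

cache[reuse_key(analyst)] = "content-hash-of-analyst-bundle"
hit_for_contractor = reuse_key(contractor) in cache
```

Here hit_for_contractor is False even though the sources and spec are identical: a different entitlement class is a different identity, so the contractor never receives a bundle that was governed for the analyst.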
None of these require the model to be worse. They require the model to be given material that has already been through the parts of the problem the model is bad at.
The Current Industry Tells Itself a Story
The story is: "retrieval will get better, context windows will get larger, the model will become better at long-context reasoning and self-critique, so the need for explicit structure will fade."
That story is convenient for people selling bigger context windows and better embedding models. It is also false for the properties that matter once you are doing work whose consequences last longer than one demo.
Better retrieval still produces de-provenanced fragments. Larger context still requires the model to maintain the invariants internally. Self-critique still depends on the model noticing what it has forgotten or what has become invalid since the last time it looked.
The compilation approach does not compete with those improvements. It is the layer that makes the improvements usable inside a system that has to answer for its behaviour later.
Next: once you have reliable compiled context and the two governance layers, the unit of execution itself has to change. Long-running ambient agents with accumulating chat history are the wrong shape for work you want to be able to explain, resume, and improve over time.
Part 4 of "The Deterministic Shell."
The kernel that performs this lowering is deliberately product-neutral so the same substrate can feed both one-off deep research reports and the context packs given to bounded jobs in a delivery control plane. The IR snapshots, the pass records, and the ABI are the durable artefacts. Everything else is a view.