The dominant approach to AI knowledge management—large embedding models that encode everything into high-dimensional vectors—has a fundamental flaw: it's a black box. You can't inspect why the model thinks two concepts are related, verify the accuracy of stored knowledge, or surgically update incorrect information.
There's a better way: incremental knowledge graphs built from atomic knowledge units.
The Embedding Problem
Vector embeddings are powerful but opaque. When you embed a document, you get a list of numbers that capture... something. Semantic similarity? Topical relevance? It's hard to say, and impossible to verify.
```python
# What does this actually mean?
embedding = model.encode("The mitochondria is the powerhouse of the cell")
# [0.023, -0.841, 0.156, ..., 0.492]  # ???
```
This opacity creates problems:
- No auditability: Why did the model retrieve this document?
- No partial updates: Change one fact, re-embed everything
- No confidence scores: How certain is this knowledge?
- No source tracking: Where did this information come from?
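The contrast is easy to see with a toy similarity check (a hypothetical sketch, not tied to any particular embedding model): the score tells you *that* two vectors are close, never *why*.

```typescript
// Cosine similarity: a single opaque score, with no explanation attached.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// The result is one number. There is no way to ask which features made
// the vectors similar, which source produced them, or how to fix one
// wrong dimension without recomputing the whole embedding.
const score = cosineSimilarity([0.023, -0.841, 0.156], [0.025, -0.838, 0.161])
```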
Atomic Knowledge Units
Instead of monolithic embeddings, we build knowledge from atomic units:
```typescript
interface AtomicFact {
  id: string
  subject: Entity
  predicate: Relation
  object: Entity | Literal
  confidence: number
  sources: Citation[]
  extractedAt: Timestamp
  verifications: Verification[]
}
```
Each fact is:
- Individually verifiable: Check against sources
- Independently updatable: Change one fact without affecting others
- Explicitly sourced: Full citation chain
- Confidence-scored: Quantified uncertainty
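Concretely, a single fact might look like this (a hypothetical instance; the type aliases are minimal stand-ins so the snippet is self-contained, not the article's actual definitions):

```typescript
// Minimal stand-in types so the example compiles on its own.
type Entity = string
type Relation = string
type Literal = string | number
type Citation = { url: string; quote: string }
type Timestamp = string

interface AtomicFact {
  id: string
  subject: Entity
  predicate: Relation
  object: Entity | Literal
  confidence: number
  sources: Citation[]
  extractedAt: Timestamp
}

// One inspectable unit of knowledge, carrying its own provenance.
const fact: AtomicFact = {
  id: "fact-001",
  subject: "mitochondrion",
  predicate: "produces",
  object: "ATP",
  confidence: 0.97,
  sources: [{ url: "https://example.com/cell-biology", quote: "Mitochondria synthesize ATP" }],
  extractedAt: "2024-01-15T00:00:00Z",
}
```

Because the fact is a plain record, auditing it is a lookup, not an inference: follow `sources`, read the quote, decide if you agree.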
The Incremental Approach
Traditional knowledge graphs are built in batch: process all documents, extract all entities, build all relationships. This is expensive and doesn't scale.
Incremental knowledge graphs grow organically:
```typescript
async function processDocument(doc: Document): Promise<AtomicFact[]> {
  // Extract candidate facts from a single document.
  const facts = await extractFacts(doc)
  for (const fact of facts) {
    // Reconcile each new fact against what the graph already knows.
    const existing = await findMatchingFacts(fact)
    await reconcile(fact, existing)
  }
  await updateGraph(facts)
  return facts
}
```
Benefits:
- Real-time updates: New knowledge available immediately
- Bounded compute: Process one document at a time
- Progressive refinement: Confidence increases with corroboration
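The "progressive refinement" point can be made precise with a corroboration rule. This is one plausible choice (a noisy-OR update, an assumption of this sketch rather than the article's actual formula): each independent corroborating source with reliability `r` shrinks the remaining uncertainty by a factor of `(1 - r)`.

```typescript
// Noisy-OR corroboration: a source of reliability r removes a fraction r
// of the remaining uncertainty (1 - confidence).
function corroborate(confidence: number, sourceReliability: number): number {
  return 1 - (1 - confidence) * (1 - sourceReliability)
}

// Start at 0.6 and fold in three sources of reliability 0.5 each:
// 0.6 -> 0.8 -> 0.9 -> 0.95. Confidence rises monotonically toward 1,
// but never reaches it, so uncertainty is always quantified.
let confidence = 0.6
for (const r of [0.5, 0.5, 0.5]) {
  confidence = corroborate(confidence, r)
}
```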
Verification and Trust
The killer feature of atomic knowledge is verifiability:
```typescript
async function verifyFact(fact: AtomicFact): Promise<Verification> {
  // Check the fact against its cited sources, then look for
  // independent support and conflicts elsewhere in the graph.
  const sourceCheck = await checkAgainstSources(fact)
  const corroboration = await findCorroboratingFacts(fact)
  const contradictions = await findContradictions(fact)
  return {
    sourceVerified: sourceCheck.passed,
    corroborationScore: corroboration.length / fact.sources.length,
    contradictions,
    confidence: scoreConfidence(sourceCheck, corroboration, contradictions),
  }
}
```
Users can inspect the verification chain and understand why the system believes what it believes.
Hybrid Architecture
In practice, we use a hybrid approach:
- Atomic facts for structured knowledge: Things that can be verified
- Embeddings for fuzzy retrieval: Finding relevant context
- Explicit links between them: Best of both worlds
```
┌─────────────────────────────────────────┐
│           Query Understanding           │
└─────────────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│   Embedding   │       │   Knowledge   │
│   Retrieval   │◄─────►│     Graph     │
└───────────────┘       └───────────────┘
        │                       │
        └───────────┬───────────┘
                    ▼
┌─────────────────────────────────────────┐
│         Unified Knowledge Layer         │
│   (Fuzzy similarity + Verified facts)   │
└─────────────────────────────────────────┘
```
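The hybrid query path can be sketched in a few lines (the interfaces and names here are hypothetical, chosen to illustrate the architecture rather than mirror any real implementation): embeddings supply fuzzy context, the graph supplies verified facts, and the caller combines both.

```typescript
type Fact = { subject: string; predicate: string; object: string; confidence: number }

// Hypothetical interfaces for the two retrieval paths.
interface VectorIndex { search(query: string, k: number): string[] }   // returns doc ids
interface KnowledgeGraph { factsAbout(entity: string): Fact[] }

// Hybrid answer: fuzzy retrieval for relevant context, the graph for
// verified facts, filtered to those above a confidence threshold.
function answer(query: string, index: VectorIndex, graph: KnowledgeGraph, entities: string[]) {
  const context = index.search(query, 5)
  const facts = entities.flatMap((e) => graph.factsAbout(e))
  return { context, facts: facts.filter((f) => f.confidence > 0.8) }
}
```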
Results
Switching to incremental atomic knowledge graphs gave us:
- 90% reduction in knowledge update latency: Real-time vs. batch
- Verifiable answers: Full citation chains for every response
- Surgical corrections: Fix one fact, not the whole model
- User trust: People can see why the system believes things
The future of AI knowledge isn't bigger embedding models—it's structured, verifiable, atomic knowledge that humans can understand and trust.