claudia.keller/openclaw-knowledge-engine

OpenClaw plugin for real-time knowledge extraction

@vainplex/openclaw-knowledge-engine

A real-time knowledge extraction plugin for OpenClaw. Automatically extracts entities, facts, and relationships from conversations — building a persistent, queryable knowledge base that grows with every message.

What it does

Every message your OpenClaw agent processes flows through the Knowledge Engine:

1. Regex Extraction (instant, zero cost): Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
2. LLM Enhancement (optional, batched): Groups messages and sends them to a local LLM for deeper entity and fact extraction
3. Fact Storage: Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
4. Relevance Decay: Automatically decays old facts so recent knowledge surfaces first
5. Vector Sync: Optionally syncs facts to ChromaDB for semantic search
6. Background Maintenance: Prunes low-relevance facts, compacts storage, runs cleanup

User: "We're meeting with Sebastian from Mondo Gate next Tuesday"
  │
  ├─ Regex → entities: [Sebastian (person), Mondo Gate (organization)]
  └─ LLM   → facts:   [Sebastian — works-at — Mondo Gate]
                       [Meeting — scheduled-with — Mondo Gate]
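
The regex stage can be illustrated with a minimal sketch. It covers only emails and URLs, because the heuristics the plugin uses for people and organizations are not documented here. The `Entity` shape, the `extractEntities` function, and both patterns are assumptions for illustration, not the contents of src/patterns.ts.

```typescript
// Hypothetical entity shape; the plugin's real types live in src/types.ts.
interface Entity {
  type: "email" | "url";
  value: string;
}

// Illustrative patterns only, not the ones shipped with the plugin.
const EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
const URL_PATTERN = /https?:\/\/[^\s<>"')]+/g;

function extractEntities(text: string): Entity[] {
  const entities: Entity[] = [];
  // String.prototype.match with a /g regex returns every match (or null)
  // and leaves lastIndex at 0, so the shared patterns carry no state.
  for (const value of text.match(URL_PATTERN) ?? []) {
    entities.push({ type: "url", value });
  }
  for (const value of text.match(EMAIL) ?? []) {
    entities.push({ type: "email", value });
  }
  return entities;
}

console.log(
  extractEntities("Mail sebastian@example.com, docs at https://example.com/docs"),
);
```

Pattern matching like this runs synchronously and needs no network access, which is why the regex stage can run on every message.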

Quick Start

1. Install

cd ~/.openclaw
npm install @vainplex/openclaw-knowledge-engine

2. Sync to extensions

OpenClaw loads plugins from the extensions/ directory:

mkdir -p extensions/openclaw-knowledge-engine
cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/

3. Configure

Add to your openclaw.json:

{
  "plugins": {
    "entries": {
      "openclaw-knowledge-engine": {
        "enabled": true,
        "config": {
          "workspace": "/path/to/your/workspace",
          "extraction": {
            "regex": { "enabled": true },
            "llm": {
              "enabled": true,
              "endpoint": "http://localhost:11434/api/generate",
              "model": "mistral:7b",
              "batchSize": 10,
              "cooldownMs": 30000
            }
          }
        }
      }
    }
  }
}

4. Restart gateway

openclaw gateway restart

Configuration

Key	Type	Default	Description
`enabled`	boolean	`true`	Enable/disable the plugin
`workspace`	string	`~/.clawd/plugins/knowledge-engine`	Storage directory for knowledge files
`extraction.regex.enabled`	boolean	`true`	High-speed regex entity extraction
`extraction.llm.enabled`	boolean	`true`	LLM-based deep extraction
`extraction.llm.model`	string	`"mistral:7b"`	Ollama/OpenAI-compatible model
`extraction.llm.endpoint`	string	`"http://localhost:11434/api/generate"`	LLM API endpoint (HTTP or HTTPS)
`extraction.llm.batchSize`	number	`10`	Messages per LLM batch
`extraction.llm.cooldownMs`	number	`30000`	Milliseconds to wait before sending a batch that has not reached `batchSize`
`decay.enabled`	boolean	`true`	Periodic relevance decay
`decay.intervalHours`	number	`24`	Hours between decay cycles
`decay.rate`	number	`0.02`	Decay rate per interval (2%)
`embeddings.enabled`	boolean	`false`	Sync facts to ChromaDB
`embeddings.endpoint`	string	`"http://localhost:8000/..."`	ChromaDB API endpoint
`embeddings.collectionName`	string	`"openclaw-facts"`	Vector collection name
`embeddings.syncIntervalMinutes`	number	`15`	Minutes between vector syncs
`storage.maxEntities`	number	`5000`	Max entities before pruning
`storage.maxFacts`	number	`10000`	Max facts before pruning
`storage.writeDebounceMs`	number	`15000`	Debounce delay for disk writes
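
To show how the defaults above combine with user overrides, here is a small sketch for the `extraction.llm` block. The default values are copied from the table. The `LlmConfig` interface, the `resolveLlmConfig` helper, and the validation rules are assumptions, not the code in src/config.ts.

```typescript
// Hypothetical type; defaults are copied from the configuration table.
interface LlmConfig {
  enabled: boolean;
  model: string;
  endpoint: string;
  batchSize: number;
  cooldownMs: number;
}

const DEFAULT_LLM_CONFIG: LlmConfig = {
  enabled: true,
  model: "mistral:7b",
  endpoint: "http://localhost:11434/api/generate",
  batchSize: 10,
  cooldownMs: 30000,
};

// Shallow-merge user overrides onto the defaults, then validate.
// The validation rules are assumptions, not documented plugin behavior.
function resolveLlmConfig(overrides: Partial<LlmConfig> = {}): LlmConfig {
  const merged: LlmConfig = { ...DEFAULT_LLM_CONFIG, ...overrides };
  if (merged.batchSize < 1 || merged.batchSize % 1 !== 0) {
    throw new Error("extraction.llm.batchSize must be a positive integer");
  }
  if (merged.cooldownMs < 0) {
    throw new Error("extraction.llm.cooldownMs must not be negative");
  }
  return merged;
}

// Only `enabled` is overridden; every other key keeps its default.
console.log(resolveLlmConfig({ enabled: false }));
```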

Minimal config (regex only, no LLM)

{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "extraction": {
        "llm": { "enabled": false }
      }
    }
  }
}

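This gives you zero-cost entity extraction with no external dependencies. The snippet shows only the plugin entry; it belongs under plugins.entries in openclaw.json, as in the Quick Start.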

Full config (LLM + ChromaDB)

{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "workspace": "~/my-agent/knowledge",
      "extraction": {
        "llm": {
          "enabled": true,
          "endpoint": "http://localhost:11434/api/generate",
          "model": "mistral:7b"
        }
      },
      "embeddings": {
        "enabled": true,
        "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
      },
      "decay": {
        "intervalHours": 12,
        "rate": 0.03
      }
    }
  }
}

How it works

Extraction Pipeline

Message received
      │
      ├──▶ Regex Engine (sync, <1ms)
      │     └─ Extracts: proper nouns, organizations, tech terms,
      │        URLs, emails, monetary amounts, dates
      │
      └──▶ LLM Batch Queue (async, batched)
            └─ Every N messages or after cooldown:
               ├─ Sends batch to local LLM
               ├─ Extracts: entities + fact triples
               └─ Stores in FactStore
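
The "every N messages or after cooldown" rule can be sketched as a small queue. This is an illustration under assumptions: the `BatchQueue` class is hypothetical, and it assumes the cooldown timer starts with the first queued message. The actual logic in src/llm-enhancer.ts may differ.

```typescript
// Hypothetical batcher: flushes when `batchSize` messages are queued,
// or `cooldownMs` after the first queued message, whichever comes first.
class BatchQueue {
  private pending: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly batchSize: number;
  private readonly cooldownMs: number;
  private readonly onFlush: (batch: string[]) => void;

  constructor(batchSize: number, cooldownMs: number, onFlush: (batch: string[]) => void) {
    this.batchSize = batchSize;
    this.cooldownMs = cooldownMs;
    this.onFlush = onFlush;
  }

  push(message: string): void {
    this.pending.push(message);
    if (this.pending.length >= this.batchSize) {
      this.flush(); // size limit reached: send immediately
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.cooldownMs);
    }
  }

  // Sends whatever is queued and cancels any pending cooldown timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}

// With batchSize 2, the second push triggers a flush without waiting.
const demoQueue = new BatchQueue(2, 30000, (batch) => console.log("sending", batch));
demoQueue.push("first message");
demoQueue.push("second message");
```

Calling `flush()` on shutdown sends any partial batch and clears the timer, so nothing is left queued when the process stops.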

Fact Lifecycle

Facts are stored as structured triples:

{
  "id": "f-abc123",
  "subject": "Sebastian",
  "predicate": "works-at",
  "object": "Mondo Gate",
  "source": "extracted-llm",
  "relevance": 0.95,
  "createdAt": 1707123456789,
  "lastAccessedAt": 1707123456789
}

Relevance starts at 1.0 and decays by `decay.rate` every `decay.intervalHours` (2% every 24 hours by default)
Accessed facts get a relevance boost (LRU-style)
Pruning removes facts below the relevance floor when storage limits are hit
A minimum floor (0.1) stops decay from reaching zero, so old facts are never erased by decay alone
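
A rough sketch of the decay and boost rules follows. The field names come from the JSON example above; everything else is an assumption. In particular, it assumes decay is multiplicative (each cycle removes `rate` of the current relevance) and that an access adds a fixed boost capped at 1.0. Neither detail is confirmed here, and `decayFact` and `touchFact` are hypothetical names.

```typescript
// Field names follow the fact example; the functions are hypothetical.
interface Fact {
  id: string;
  subject: string;
  predicate: string;
  object: string;
  source: string;
  relevance: number;
  createdAt: number;
  lastAccessedAt: number;
}

const RELEVANCE_FLOOR = 0.1;

// Assumption: multiplicative decay, clamped at the floor.
function decayFact(fact: Fact, rate: number): Fact {
  const relevance = Math.max(RELEVANCE_FLOOR, fact.relevance * (1 - rate));
  return { ...fact, relevance };
}

// Assumption: an access adds a fixed boost (0.1 here), capped at 1.0.
function touchFact(fact: Fact, now: number, boost: number = 0.1): Fact {
  const relevance = Math.min(1, fact.relevance + boost);
  return { ...fact, relevance, lastAccessedAt: now };
}

const exampleFact: Fact = {
  id: "f-abc123",
  subject: "Sebastian",
  predicate: "works-at",
  object: "Mondo Gate",
  source: "extracted-llm",
  relevance: 1.0,
  createdAt: 1707123456789,
  lastAccessedAt: 1707123456789,
};

// One decay cycle at the default rate of 0.02.
console.log(decayFact(exampleFact, 0.02).relevance);
```

Under these assumptions a fact that is never accessed approaches the floor gradually and then stays there until pruning removes it.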

Storage

All data is persisted as JSON files in your workspace:

workspace/
├── entities.json    # Extracted entities with types and counts
└── facts.json       # Fact triples with relevance scores

Writes use atomic file operations (write to .tmp, then rename) to prevent corruption.
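
The write-then-rename pattern can be sketched with Node.js built-ins only. `writeJsonAtomic` is a hypothetical name, not the function in src/storage.ts, and the debounce step is left out.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write to a sibling .tmp file, then rename it over the target.
// A rename within one filesystem replaces the target in a single step on POSIX,
// so a crash mid-write leaves the previous file intact.
function writeJsonAtomic(filePath: string, data: unknown): void {
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), "utf8");
  fs.renameSync(tmpPath, filePath);
}

// Usage: write a facts file into a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ke-demo-"));
writeJsonAtomic(path.join(demoDir, "facts.json"), [
  { subject: "Sebastian", predicate: "works-at", object: "Mondo Gate" },
]);
```

The temporary file sits next to the target so that both are on the same filesystem, which is what lets the rename replace the target in one step.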

Architecture

index.ts                 → Plugin entry point
src/
├── types.ts             → All TypeScript interfaces
├── config.ts            → Config resolution + validation
├── patterns.ts          → Regex factories (Proxy-based, no /g state bleed)
├── entity-extractor.ts  → Regex-based entity extraction
├── llm-enhancer.ts      → Batched LLM extraction with cooldown
├── fact-store.ts        → In-memory fact store with decay + pruning
├── hooks.ts             → OpenClaw hook registration + orchestration
├── http-client.ts       → Shared HTTP/HTTPS transport
├── embeddings.ts        → ChromaDB vector sync
├── storage.ts           → Atomic JSON I/O with debounce
└── maintenance.ts       → Scheduled background tasks

12 modules, each with a single responsibility
Zero runtime dependencies — Node.js built-ins only
TypeScript strict: no `any` in source code
All functions ≤40 lines
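
The "/g state bleed" mentioned for patterns.ts refers to a standard JavaScript pitfall: a regex with the global flag remembers `lastIndex` between calls. The sketch below shows the problem and a plain factory function as one way around it. The plugin is described as using a Proxy-based approach; this simpler factory is a swapped-in illustration, and `digits` is a hypothetical name.

```typescript
// A shared /g regex keeps `lastIndex` between calls.
const shared = /\d+/g;
const first = shared.test("order 42"); // true, lastIndex is now 8
const second = shared.test("order 42"); // false, the search resumed at index 8

// Hypothetical factory: each call evaluates the literal again,
// producing a fresh RegExp whose lastIndex is 0.
const digits = (): RegExp => /\d+/g;
const third = digits().test("order 42");
const fourth = digits().test("order 42");

console.log({ first, second, third, fourth });
```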

Hooks

Hook	Priority	Description
`session_start`	200	Loads fact store from disk
`message_received`	100	Extracts entities + queues LLM batch
`message_sent`	100	Same extraction on outbound messages
`gateway_stop`	50	Flushes writes, stops timers

Testing

npm test
# Runs 83 tests across 10 test files

Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.

Part of the Darkplex Plugin Suite

#	Plugin	Status	Description
1	@vainplex/nats-eventstore	✅ Published	NATS JetStream event persistence
2	@vainplex/openclaw-cortex	✅ Published	Conversation intelligence (threads, decisions, boot context)
3	@vainplex/openclaw-knowledge-engine	✅ Published	Real-time knowledge extraction (this plugin)
4	@vainplex/openclaw-governance	📋 Planned	Policy enforcement + guardrails
5	@vainplex/openclaw-memory-engine	📋 Planned	Unified memory layer
6	@vainplex/openclaw-health-monitor	📋 Planned	System health + auto-healing

License

MIT