- Full README with: what it does, quick start, config reference, architecture overview, extraction pipeline diagram, fact lifecycle, hook table, testing instructions, plugin suite roadmap - MIT License - Repository URL fixed in package.json |
||
|---|---|---|
| src | ||
| test | ||
| .gitignore | ||
| ARCHITECTURE.md | ||
| index.ts | ||
| LICENSE | ||
| openclaw.plugin.json | ||
| package-lock.json | ||
| package.json | ||
| README.md | ||
| tsconfig.json | ||
@vainplex/openclaw-knowledge-engine
A real-time knowledge extraction plugin for OpenClaw. Automatically extracts entities, facts, and relationships from conversations — building a persistent, queryable knowledge base that grows with every message.
What it does
Every message your OpenClaw agent processes flows through the Knowledge Engine:
- Regex Extraction (instant, zero cost) — Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
- LLM Enhancement (optional, batched) — Groups messages and sends them to a local LLM for deeper entity and fact extraction
- Fact Storage — Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
- Relevance Decay — Automatically decays old facts so recent knowledge surfaces first
- Vector Sync — Optionally syncs facts to ChromaDB for semantic search
- Background Maintenance — Prunes low-relevance facts, compacts storage, runs cleanup
User: "We're meeting with Sebastian from Mondo Gate next Tuesday"
│
├─ Regex → entities: [Sebastian (person), Mondo Gate (organization)]
└─ LLM → facts: [Sebastian — works-at — Mondo Gate]
[Meeting — scheduled-with — Mondo Gate]
Quick Start
1. Install
cd ~/.openclaw
npm install @vainplex/openclaw-knowledge-engine
2. Sync to extensions
OpenClaw loads plugins from the extensions/ directory:
mkdir -p extensions/openclaw-knowledge-engine
cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/
3. Configure
Add to your openclaw.json:
{
"plugins": {
"entries": {
"openclaw-knowledge-engine": {
"enabled": true,
"config": {
"workspace": "/path/to/your/workspace",
"extraction": {
"regex": { "enabled": true },
"llm": {
"enabled": true,
"endpoint": "http://localhost:11434/api/generate",
"model": "mistral:7b",
"batchSize": 10,
"cooldownMs": 30000
}
}
}
}
}
}
}
4. Restart gateway
openclaw gateway restart
Configuration
| Key | Type | Default | Description |
|---|---|---|---|
enabled |
boolean | true |
Enable/disable the plugin |
workspace |
string | ~/.clawd/plugins/knowledge-engine |
Storage directory for knowledge files |
extraction.regex.enabled |
boolean | true |
High-speed regex entity extraction |
extraction.llm.enabled |
boolean | true |
LLM-based deep extraction |
extraction.llm.model |
string | "mistral:7b" |
Ollama/OpenAI-compatible model |
extraction.llm.endpoint |
string | "http://localhost:11434/api/generate" |
LLM API endpoint (HTTP or HTTPS) |
extraction.llm.batchSize |
number | 10 |
Messages per LLM batch |
extraction.llm.cooldownMs |
number | 30000 |
Wait time before sending batch |
decay.enabled |
boolean | true |
Periodic relevance decay |
decay.intervalHours |
number | 24 |
Hours between decay cycles |
decay.rate |
number | 0.02 |
Decay rate per interval (2%) |
embeddings.enabled |
boolean | false |
Sync facts to ChromaDB |
embeddings.endpoint |
string | "http://localhost:8000/..." |
ChromaDB API endpoint |
embeddings.collectionName |
string | "openclaw-facts" |
Vector collection name |
embeddings.syncIntervalMinutes |
number | 15 |
Minutes between vector syncs |
storage.maxEntities |
number | 5000 |
Max entities before pruning |
storage.maxFacts |
number | 10000 |
Max facts before pruning |
storage.writeDebounceMs |
number | 15000 |
Debounce delay for disk writes |
Minimal config (regex only, no LLM)
{
"openclaw-knowledge-engine": {
"enabled": true,
"config": {
"extraction": {
"llm": { "enabled": false }
}
}
}
}
This gives you zero-cost entity extraction with no external dependencies.
Full config (LLM + ChromaDB)
{
"openclaw-knowledge-engine": {
"enabled": true,
"config": {
"workspace": "~/my-agent/knowledge",
"extraction": {
"llm": {
"enabled": true,
"endpoint": "http://localhost:11434/api/generate",
"model": "mistral:7b"
}
},
"embeddings": {
"enabled": true,
"endpoint": "http://localhost:8000/api/v1/collections/facts/add"
},
"decay": {
"intervalHours": 12,
"rate": 0.03
}
}
}
}
How it works
Extraction Pipeline
Message received
│
├──▶ Regex Engine (sync, <1ms)
│ └─ Extracts: proper nouns, organizations, tech terms,
│ URLs, emails, monetary amounts, dates
│
└──▶ LLM Batch Queue (async, batched)
└─ Every N messages or after cooldown:
└─ Sends batch to local LLM
└─ Extracts: entities + fact triples
└─ Stores in FactStore
Fact Lifecycle
Facts are stored as structured triples:
{
"id": "f-abc123",
"subject": "Sebastian",
"predicate": "works-at",
"object": "Mondo Gate",
"source": "extracted-llm",
"relevance": 0.95,
"createdAt": 1707123456789,
"lastAccessedAt": 1707123456789
}
- Relevance starts at 1.0 and decays over time
- Accessed facts get a relevance boost (LRU-style)
- Pruning removes facts below the relevance floor when storage limits are hit
- Minimum floor (0.1) prevents complete decay — old facts never fully disappear
Storage
All data is persisted as JSON files in your workspace:
workspace/
├── entities.json # Extracted entities with types and counts
└── facts.json # Fact triples with relevance scores
Writes use atomic file operations (write to .tmp, then rename) to prevent corruption.
Architecture
index.ts → Plugin entry point
src/
├── types.ts → All TypeScript interfaces
├── config.ts → Config resolution + validation
├── patterns.ts → Regex factories (Proxy-based, no /g state bleed)
├── entity-extractor.ts → Regex-based entity extraction
├── llm-enhancer.ts → Batched LLM extraction with cooldown
├── fact-store.ts → In-memory fact store with decay + pruning
├── hooks.ts → OpenClaw hook registration + orchestration
├── http-client.ts → Shared HTTP/HTTPS transport
├── embeddings.ts → ChromaDB vector sync
├── storage.ts → Atomic JSON I/O with debounce
└── maintenance.ts → Scheduled background tasks
- 12 modules, each with a single responsibility
- Zero runtime dependencies — Node.js built-ins only
- TypeScript strict — no
anyin source code - All functions ≤40 lines
Hooks
| Hook | Priority | Description |
|---|---|---|
session_start |
200 | Loads fact store from disk |
message_received |
100 | Extracts entities + queues LLM batch |
message_sent |
100 | Same extraction on outbound messages |
gateway_stop |
50 | Flushes writes, stops timers |
Testing
npm test
# Runs 83 tests across 10 test files
Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.
Part of the Darkplex Plugin Suite
| # | Plugin | Status | Description |
|---|---|---|---|
| 1 | @vainplex/nats-eventstore | ✅ Published | NATS JetStream event persistence |
| 2 | @vainplex/openclaw-cortex | ✅ Published | Conversation intelligence (threads, decisions, boot context) |
| 3 | @vainplex/openclaw-knowledge-engine | ✅ Published | Real-time knowledge extraction (this plugin) |
| 4 | @vainplex/openclaw-governance | 📋 Planned | Policy enforcement + guardrails |
| 5 | @vainplex/openclaw-memory-engine | 📋 Planned | Unified memory layer |
| 6 | @vainplex/openclaw-health-monitor | 📋 Planned | System health + auto-healing |
License
MIT