OpenClaw plugin for real-time knowledge extraction

@vainplex/openclaw-knowledge-engine

A real-time knowledge extraction plugin for OpenClaw. Automatically extracts entities, facts, and relationships from conversations — building a persistent, queryable knowledge base that grows with every message.

What it does

Every message your OpenClaw agent processes flows through the Knowledge Engine:

  1. Regex Extraction (instant, zero cost) — Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
  2. LLM Enhancement (optional, batched) — Groups messages and sends them to a local LLM for deeper entity and fact extraction
  3. Fact Storage — Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
  4. Relevance Decay — Automatically decays old facts so recent knowledge surfaces first
  5. Vector Sync — Optionally syncs facts to ChromaDB for semantic search
  6. Background Maintenance — Prunes low-relevance facts, compacts storage, runs cleanup
Example:

User: "We're meeting with Alex from Acme Corp next Tuesday"
  │
  ├─ Regex → entities: [Alex (person), Acme Corp (organization)]
  └─ LLM   → facts:   [Alex — works-at — Acme Corp]
                       [Meeting — scheduled-with — Acme Corp]
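
The regex stage above can be sketched roughly as follows. The pattern set is illustrative only; the plugin's real patterns cover more entity types (people, dates, monetary amounts) and live in `src/patterns.ts`:

```typescript
// Hypothetical sketch of the regex extraction stage. Pattern names and
// shapes are illustrative, not the plugin's actual patterns.
type Entity = { value: string; type: "organization" | "email" | "url" };

const PATTERNS: Array<{ type: Entity["type"]; regex: RegExp }> = [
  // Capitalized word followed by a corporate suffix -> organization
  { type: "organization", regex: /\b[A-Z][a-z]+ (?:Corp|Inc|Ltd|GmbH)\b/g },
  { type: "email", regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },
  { type: "url", regex: /https?:\/\/\S+/g },
];

function extractEntities(text: string): Entity[] {
  const found: Entity[] = [];
  for (const { type, regex } of PATTERNS) {
    // Reset lastIndex each call so /g state never bleeds between messages
    regex.lastIndex = 0;
    for (const match of text.matchAll(regex)) {
      found.push({ value: match[0], type });
    }
  }
  return found;
}
```

Because the patterns are plain synchronous regexes, this stage adds effectively zero latency and no external calls.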

Quick Start

1. Install

cd ~/.openclaw
npm install @vainplex/openclaw-knowledge-engine

2. Sync to extensions

OpenClaw loads plugins from the extensions/ directory:

mkdir -p extensions/openclaw-knowledge-engine
cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/

3. Configure

Add to your openclaw.json:

{
  "plugins": {
    "entries": {
      "openclaw-knowledge-engine": {
        "enabled": true,
        "config": {
          "workspace": "/path/to/your/workspace",
          "extraction": {
            "regex": { "enabled": true },
            "llm": {
              "enabled": true,
              "endpoint": "http://localhost:11434/api/generate",
              "model": "mistral:7b",
              "batchSize": 10,
              "cooldownMs": 30000
            }
          }
        }
      }
    }
  }
}

4. Restart gateway

openclaw gateway restart

Configuration

| Key | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable/disable the plugin |
| workspace | string | ~/.clawd/plugins/knowledge-engine | Storage directory for knowledge files |
| extraction.regex.enabled | boolean | true | High-speed regex entity extraction |
| extraction.llm.enabled | boolean | true | LLM-based deep extraction |
| extraction.llm.model | string | "mistral:7b" | Ollama/OpenAI-compatible model |
| extraction.llm.endpoint | string | "http://localhost:11434/api/generate" | LLM API endpoint (HTTP or HTTPS) |
| extraction.llm.batchSize | number | 10 | Messages per LLM batch |
| extraction.llm.cooldownMs | number | 30000 | Wait time before sending batch |
| decay.enabled | boolean | true | Periodic relevance decay |
| decay.intervalHours | number | 24 | Hours between decay cycles |
| decay.rate | number | 0.02 | Decay rate per interval (2%) |
| embeddings.enabled | boolean | false | Sync facts to ChromaDB |
| embeddings.endpoint | string | "http://localhost:8000/..." | ChromaDB API endpoint |
| embeddings.collectionName | string | "openclaw-facts" | Vector collection name |
| embeddings.syncIntervalMinutes | number | 15 | Minutes between vector syncs |
| storage.maxEntities | number | 5000 | Max entities before pruning |
| storage.maxFacts | number | 10000 | Max facts before pruning |
| storage.writeDebounceMs | number | 15000 | Debounce delay for disk writes |
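
How user settings layer over these defaults can be illustrated with a small deep-merge. This is a sketch under the assumption that config resolution works roughly this way; the plugin's actual `resolveConfig` in `src/config.ts` may differ:

```typescript
// Illustrative deep-merge of user config over defaults; not the plugin's
// real resolveConfig implementation.
type Cfg = Record<string, unknown>;

const DEFAULTS: Cfg = {
  enabled: true,
  extraction: {
    regex: { enabled: true },
    llm: { enabled: true, model: "mistral:7b", batchSize: 10, cooldownMs: 30000 },
  },
};

function mergeConfig(defaults: Cfg, user: Cfg): Cfg {
  const out: Cfg = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    const base = defaults[key];
    if (value && base && typeof value === "object" && typeof base === "object"
        && !Array.isArray(value) && !Array.isArray(base)) {
      // Recurse into nested objects so siblings keep their defaults
      out[key] = mergeConfig(base as Cfg, value as Cfg);
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

This is why the minimal config below only needs to name the keys it overrides: everything else falls back to the defaults in the table.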

Minimal config (regex only, no LLM)

{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "extraction": {
        "llm": { "enabled": false }
      }
    }
  }
}

This gives you zero-cost entity extraction with no external dependencies.

Full config (LLM + ChromaDB)

{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "workspace": "~/my-agent/knowledge",
      "extraction": {
        "llm": {
          "enabled": true,
          "endpoint": "http://localhost:11434/api/generate",
          "model": "mistral:7b"
        }
      },
      "embeddings": {
        "enabled": true,
        "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
      },
      "decay": {
        "intervalHours": 12,
        "rate": 0.03
      }
    }
  }
}

How it works

Extraction Pipeline

Message received
      │
      ├──▶ Regex Engine (sync, <1ms)
      │     └─ Extracts: proper nouns, organizations, tech terms,
      │        URLs, emails, monetary amounts, dates
      │
      └──▶ LLM Batch Queue (async, batched)
            └─ Every N messages or after cooldown:
               └─ Sends batch to local LLM
               └─ Extracts: entities + fact triples
               └─ Stores in FactStore

Fact Lifecycle

Facts are stored as structured triples:

{
  "id": "f-abc123",
  "subject": "Alex",
  "predicate": "works-at",
  "object": "Acme Corp",
  "source": "extracted-llm",
  "relevance": 0.95,
  "createdAt": 1707123456789,
  "lastAccessedAt": 1707123456789
}
  • Relevance starts at 1.0 and decays over time
  • Accessed facts get a relevance boost (LRU-style)
  • Pruning removes facts below the relevance floor when storage limits are hit
  • Minimum floor (0.1) prevents complete decay — old facts never fully disappear
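
The lifecycle rules above reduce to a little arithmetic. This sketch assumes a multiplicative decay step and a fixed access bump; only the 2% default rate and the 0.1 floor come from this README:

```typescript
// Sketch of the decay/boost arithmetic described above. The boost size
// and cap are assumptions for illustration.
const FLOOR = 0.1;

function decay(relevance: number, rate = 0.02): number {
  // Multiplicative decay, clamped so old facts never fully disappear
  return Math.max(FLOOR, relevance * (1 - rate));
}

function boostOnAccess(relevance: number, boost = 0.1): number {
  // LRU-style bump on access, capped at the starting relevance of 1.0
  return Math.min(1.0, relevance + boost);
}
```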

Storage

All data is persisted as JSON files in your workspace:

workspace/
├── entities.json    # Extracted entities with types and counts
└── facts.json       # Fact triples with relevance scores

Writes use atomic file operations (write to .tmp, then rename) to prevent corruption.
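
That write-to-tmp-then-rename pattern looks roughly like this in Node; helper names here are illustrative:

```typescript
// Sketch of crash-safe JSON persistence via tmp-file-plus-rename.
import { writeFileSync, renameSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function writeJsonAtomic(path: string, data: unknown): void {
  const tmp = `${path}.tmp`;
  // Write the full payload to a sibling temp file first...
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  // ...then rename, which is atomic on POSIX filesystems, so a reader
  // always sees either the old file or the complete new one.
  renameSync(tmp, path);
}

function readJson(path: string): unknown {
  return JSON.parse(readFileSync(path, "utf8"));
}
```

A crash between the two steps leaves at worst a stale `.tmp` file next to an intact previous version, never a half-written `facts.json`.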

Architecture

index.ts                 → Plugin entry point
src/
├── types.ts             → All TypeScript interfaces
├── config.ts            → Config resolution + validation
├── patterns.ts          → Regex factories (Proxy-based, no /g state bleed)
├── entity-extractor.ts  → Regex-based entity extraction
├── llm-enhancer.ts      → Batched LLM extraction with cooldown
├── fact-store.ts        → In-memory fact store with decay + pruning
├── hooks.ts             → OpenClaw hook registration + orchestration
├── http-client.ts       → Shared HTTP/HTTPS transport
├── embeddings.ts        → ChromaDB vector sync
├── storage.ts           → Atomic JSON I/O with debounce
└── maintenance.ts       → Scheduled background tasks
  • 12 modules, each with a single responsibility
  • Zero runtime dependencies — Node.js built-ins only
  • TypeScript strict — no any in source code
  • All functions ≤40 lines

Hooks

| Hook | Priority | Description |
|---|---|---|
| session_start | 200 | Loads fact store from disk |
| message_received | 100 | Extracts entities + queues LLM batch |
| message_sent | 100 | Same extraction on outbound messages |
| gateway_stop | 50 | Flushes writes, stops timers |
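
Priority-ordered dispatch, as the numbers above imply, can be sketched generically. OpenClaw's real hook registration API is not shown here, so treat these names as placeholders:

```typescript
// Generic sketch of a priority-ordered hook bus: higher priority runs first.
type Hook = { event: string; priority: number; run: (payload: unknown) => void };

class HookBus {
  private hooks: Hook[] = [];

  register(hook: Hook): void {
    this.hooks.push(hook);
    // Keep hooks sorted so higher-priority handlers fire first
    this.hooks.sort((a, b) => b.priority - a.priority);
  }

  emit(event: string, payload: unknown): void {
    for (const h of this.hooks) {
      if (h.event === event) h.run(payload);
    }
  }
}
```

This ordering is why `session_start` (200) can guarantee the fact store is loaded before any `message_received` handler (100) tries to use it.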

Testing

npm test
# Runs 83 tests across 10 test files

Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.

Part of the Vainplex Plugin Suite

| # | Plugin | Status | Description |
|---|---|---|---|
| 1 | @vainplex/nats-eventstore | Published | NATS JetStream event persistence |
| 2 | @vainplex/openclaw-cortex | Published | Conversation intelligence (threads, decisions, boot context) |
| 3 | @vainplex/openclaw-knowledge-engine | Published | Real-time knowledge extraction (this plugin) |
| 4 | @vainplex/openclaw-governance | 📋 Planned | Policy enforcement + guardrails |
| 5 | @vainplex/openclaw-memory-engine | 📋 Planned | Unified memory layer |
| 6 | @vainplex/openclaw-health-monitor | 📋 Planned | System health + auto-healing |

License

MIT