docs: comprehensive README, LICENSE, community-ready

- Full README with: what it does, quick start, config reference, architecture overview, extraction pipeline diagram, fact lifecycle, hook table, testing instructions, plugin suite roadmap
- MIT License
- Repository URL fixed in package.json

parent 45cf11bb73
commit 086f3ff0b3

3 changed files with 273 additions and 1 deletion

LICENSE (Normal file, +21)

@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2026 Vainplex

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md (Normal file, +251)

@@ -0,0 +1,251 @@

# @vainplex/openclaw-knowledge-engine

A real-time knowledge extraction plugin for [OpenClaw](https://github.com/openclaw/openclaw). It automatically extracts entities, facts, and relationships from conversations, building a persistent, queryable knowledge base that grows with every message.

## What it does

Every message your OpenClaw agent processes flows through the Knowledge Engine:

1. **Regex Extraction** (instant, zero cost): detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
2. **LLM Enhancement** (optional, batched): groups messages and sends them to a local LLM for deeper entity and fact extraction
3. **Fact Storage**: stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
4. **Relevance Decay**: automatically decays old facts so recent knowledge surfaces first
5. **Vector Sync**: optionally syncs facts to ChromaDB for semantic search
6. **Background Maintenance**: prunes low-relevance facts, compacts storage, and runs cleanup

```
User: "We're meeting with Sebastian from Mondo Gate next Tuesday"
  │
  ├─ Regex → entities: [Sebastian (person), Mondo Gate (organization)]
  └─ LLM   → facts:    [Sebastian — works-at — Mondo Gate]
                       [Meeting — scheduled-with — Mondo Gate]
```
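
To make the regex stage concrete, here is a minimal TypeScript sketch of zero-cost pattern extraction. The `extractEntities` function, the `Entity` shape, the three patterns and the stopword list are hypothetical illustrations, not the contents of the plugin's `entity-extractor.ts`; the real extractor covers more entity kinds (organizations, tech terms, amounts, dates) and types them more precisely.

```typescript
// Hypothetical entity shape for this sketch; not the plugin's src/types.ts.
type EntityType = "url" | "email" | "proper-noun";

interface Entity {
  text: string;
  type: EntityType;
}

// Illustrative patterns, applied in order. URLs and emails go first and are
// blanked out afterwards, so their pieces are never re-read as names.
const ENTITY_PATTERNS: Array<[EntityType, RegExp]> = [
  ["url", /https?:\/\/[^\s<>"']+/],
  ["email", /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/],
  ["proper-noun", /\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b/],
];

// Capitalized words that should not become entities (deliberately tiny list).
const STOPWORDS = new Set<string>([
  "I", "We", "The", "Monday", "Tuesday", "Wednesday",
  "Thursday", "Friday", "Saturday", "Sunday",
]);

function extractEntities(message: string): Entity[] {
  const found: Entity[] = [];
  let rest = message;
  for (const [type, pattern] of ENTITY_PATTERNS) {
    // A fresh /g RegExp per pass, so no lastIndex state leaks between calls.
    const re = new RegExp(pattern.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(rest)) !== null) {
      const text = match[0];
      if (type === "proper-noun" && STOPWORDS.has(text)) continue;
      found.push({ text, type });
    }
    // Overwrite matched text with spaces of equal length so later patterns skip it.
    rest = rest.replace(re, (m) => " ".repeat(m.length));
  }
  return found;
}
```

On the example sentence above, this sketch returns `Sebastian` and `Mondo Gate`, both labeled only `proper-noun`; telling a person from an organization takes more context than three patterns provide.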

## Quick Start

### 1. Install

```bash
cd ~/.openclaw
npm install @vainplex/openclaw-knowledge-engine
```

### 2. Sync to extensions

OpenClaw loads plugins from the `extensions/` directory:

```bash
mkdir -p extensions/openclaw-knowledge-engine
cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/
```

### 3. Configure

Add to your `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "openclaw-knowledge-engine": {
        "enabled": true,
        "config": {
          "workspace": "/path/to/your/workspace",
          "extraction": {
            "regex": { "enabled": true },
            "llm": {
              "enabled": true,
              "endpoint": "http://localhost:11434/api/generate",
              "model": "mistral:7b",
              "batchSize": 10,
              "cooldownMs": 30000
            }
          }
        }
      }
    }
  }
}
```

### 4. Restart gateway

```bash
openclaw gateway restart
```

## Configuration

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `workspace` | string | `~/.clawd/plugins/knowledge-engine` | Storage directory for knowledge files |
| `extraction.regex.enabled` | boolean | `true` | High-speed regex entity extraction |
| `extraction.llm.enabled` | boolean | `true` | LLM-based deep extraction |
| `extraction.llm.model` | string | `"mistral:7b"` | Ollama/OpenAI-compatible model |
| `extraction.llm.endpoint` | string | `"http://localhost:11434/api/generate"` | LLM API endpoint (HTTP or HTTPS) |
| `extraction.llm.batchSize` | number | `10` | Messages per LLM batch |
| `extraction.llm.cooldownMs` | number | `30000` | Wait time (ms) before sending a batch |
| `decay.enabled` | boolean | `true` | Periodic relevance decay |
| `decay.intervalHours` | number | `24` | Hours between decay cycles |
| `decay.rate` | number | `0.02` | Decay rate per interval (2%) |
| `embeddings.enabled` | boolean | `false` | Sync facts to ChromaDB |
| `embeddings.endpoint` | string | `"http://localhost:8000/..."` | ChromaDB API endpoint |
| `embeddings.collectionName` | string | `"openclaw-facts"` | Vector collection name |
| `embeddings.syncIntervalMinutes` | number | `15` | Minutes between vector syncs |
| `storage.maxEntities` | number | `5000` | Max entities before pruning |
| `storage.maxFacts` | number | `10000` | Max facts before pruning |
| `storage.writeDebounceMs` | number | `15000` | Debounce delay (ms) for disk writes |
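
The defaults in the table can be read as a base object that user settings are merged over. Below is a minimal sketch of that resolution step, assuming a plain deep merge where user values win; `KnowledgeConfig`, `resolveConfig` and the two range checks are hypothetical and are not the plugin's `config.ts`. The `embeddings.*` keys are left out because the table elides the default endpoint.

```typescript
// Hypothetical config shape mirroring most rows of the table above.
interface KnowledgeConfig {
  enabled: boolean;
  workspace: string;
  extraction: {
    regex: { enabled: boolean };
    llm: { enabled: boolean; model: string; endpoint: string; batchSize: number; cooldownMs: number };
  };
  decay: { enabled: boolean; intervalHours: number; rate: number };
  storage: { maxEntities: number; maxFacts: number; writeDebounceMs: number };
}

// Recursively optional version of T, matching what users actually write.
type DeepPartial<T> = { [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K] };

// Defaults copied from the configuration table.
const DEFAULTS: KnowledgeConfig = {
  enabled: true,
  workspace: "~/.clawd/plugins/knowledge-engine",
  extraction: {
    regex: { enabled: true },
    llm: {
      enabled: true,
      model: "mistral:7b",
      endpoint: "http://localhost:11434/api/generate",
      batchSize: 10,
      cooldownMs: 30000,
    },
  },
  decay: { enabled: true, intervalHours: 24, rate: 0.02 },
  storage: { maxEntities: 5000, maxFacts: 10000, writeDebounceMs: 15000 },
};

type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// User values win; keys the user leaves out fall back to the base value.
function mergeDeep(base: PlainObject, override: PlainObject): PlainObject {
  const result: PlainObject = { ...base };
  for (const key of Object.keys(override)) {
    const value = override[key];
    if (value === undefined) continue;
    const current = result[key];
    result[key] = isPlainObject(current) && isPlainObject(value) ? mergeDeep(current, value) : value;
  }
  return result;
}

function resolveConfig(user: DeepPartial<KnowledgeConfig> = {}): KnowledgeConfig {
  // Clone the defaults so no caller can mutate the shared DEFAULTS object.
  const base = JSON.parse(JSON.stringify(DEFAULTS)) as PlainObject;
  const config = mergeDeep(base, user as unknown as PlainObject) as unknown as KnowledgeConfig;
  // Illustrative validation only; the real rules are not documented here.
  if (!(config.decay.rate >= 0 && config.decay.rate < 1)) {
    throw new RangeError("decay.rate must be in [0, 1)");
  }
  if (!Number.isInteger(config.extraction.llm.batchSize) || config.extraction.llm.batchSize < 1) {
    throw new RangeError("extraction.llm.batchSize must be a positive integer");
  }
  return config;
}
```

For example, `resolveConfig({ extraction: { llm: { enabled: false } } })` yields the regex-only setup with every other value at its default.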

The snippets below show only the plugin's entry; in `openclaw.json` it goes under `plugins.entries`.

### Minimal config (regex only, no LLM)

```json
{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "extraction": {
        "llm": { "enabled": false }
      }
    }
  }
}
```

This gives you zero-cost entity extraction with no external dependencies.

### Full config (LLM + ChromaDB)

```json
{
  "openclaw-knowledge-engine": {
    "enabled": true,
    "config": {
      "workspace": "~/my-agent/knowledge",
      "extraction": {
        "llm": {
          "enabled": true,
          "endpoint": "http://localhost:11434/api/generate",
          "model": "mistral:7b"
        }
      },
      "embeddings": {
        "enabled": true,
        "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
      },
      "decay": {
        "intervalHours": 12,
        "rate": 0.03
      }
    }
  }
}
```

## How it works

### Extraction Pipeline

```
Message received
    │
    ├──▶ Regex Engine (sync, <1ms)
    │      └─ Extracts: proper nouns, organizations, tech terms,
    │                   URLs, emails, monetary amounts, dates
    │
    └──▶ LLM Batch Queue (async, batched)
           └─ Every N messages or after cooldown:
                └─ Sends batch to local LLM
                └─ Extracts: entities + fact triples
                └─ Stores in FactStore
```

Here N is `extraction.llm.batchSize` and the cooldown is `extraction.llm.cooldownMs`.
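
The "every N messages or after cooldown" rule is a size-or-time batching policy. Below is a minimal sketch, assuming the cooldown clock starts when the first message of a batch is queued (the README does not say whether it restarts on every message). `BatchQueue` and its callback are hypothetical; the callback stands in for the HTTP request to the LLM endpoint.

```typescript
// Stand-in for the LLM request; a real enhancer would send the batch to the endpoint.
type FlushFn = (batch: string[]) => void;

class BatchQueue {
  private pending: string[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly batchSize: number;
  private readonly cooldownMs: number;
  private readonly onFlush: FlushFn;

  constructor(batchSize: number, cooldownMs: number, onFlush: FlushFn) {
    this.batchSize = batchSize;
    this.cooldownMs = cooldownMs;
    this.onFlush = onFlush;
  }

  enqueue(message: string): void {
    this.pending.push(message);
    if (this.pending.length >= this.batchSize) {
      this.flush(); // size trigger: a full batch goes out immediately
    } else if (this.timer === undefined) {
      // Time trigger (assumed): the first message of a batch starts the clock.
      this.timer = setTimeout(() => this.flush(), this.cooldownMs);
    }
  }

  // Also what a shutdown hook (cf. gateway_stop) would call to drain the queue.
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}
```

Starting the clock on the first message bounds how long any queued message waits to at most `cooldownMs`. Restarting it on every message would instead hold back a partial batch for as long as messages keep arriving inside the cooldown window, until the size trigger fires.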

### Fact Lifecycle

Facts are stored as structured triples:

```json
{
  "id": "f-abc123",
  "subject": "Sebastian",
  "predicate": "works-at",
  "object": "Mondo Gate",
  "source": "extracted-llm",
  "relevance": 0.95,
  "createdAt": 1707123456789,
  "lastAccessedAt": 1707123456789
}
```

- **Relevance** starts at 1.0 and decays over time
- **Accessed facts** get a relevance boost (LRU-style)
- **Pruning** removes the lowest-relevance facts when storage limits (`storage.maxFacts`) are hit
- **Minimum floor** (0.1) prevents complete decay: decay alone never drives an old fact's relevance to zero
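
The lifecycle rules above can be sketched as three small functions. This sketch assumes multiplicative decay (each cycle multiplies relevance by `1 - decay.rate`), a fixed access boost of 0.1 capped at 1.0, and pruning that keeps the `maxFacts` most relevant facts; the README specifies none of these details, and `decayAll`, `touch`, `prune` and `ACCESS_BOOST` are hypothetical names.

```typescript
// Subset of the fact JSON shown above (source and createdAt omitted for brevity).
interface Fact {
  id: string;
  subject: string;
  predicate: string;
  object: string;
  relevance: number;
  lastAccessedAt: number;
}

const RELEVANCE_FLOOR = 0.1; // the minimum floor stated in this README
const ACCESS_BOOST = 0.1; // hypothetical boost size

// One decay cycle. Assumes multiplicative decay, clamped at the floor.
function decayAll(facts: Fact[], rate: number): void {
  for (const fact of facts) {
    fact.relevance = Math.max(RELEVANCE_FLOOR, fact.relevance * (1 - rate));
  }
}

// Reading a fact boosts its relevance (capped at 1.0) and records the access.
function touch(fact: Fact, now: number = Date.now()): void {
  fact.relevance = Math.min(1, fact.relevance + ACCESS_BOOST);
  fact.lastAccessedAt = now;
}

// Over the limit: keep the most relevant facts, breaking ties by recent access.
function prune(facts: Fact[], maxFacts: number): Fact[] {
  if (facts.length <= maxFacts) return facts;
  return facts
    .slice()
    .sort((a, b) => b.relevance - a.relevance || b.lastAccessedAt - a.lastAccessedAt)
    .slice(0, maxFacts);
}
```

Under these assumptions, with the default `decay.rate` of 0.02 and a 24-hour interval, an untouched fact starting at 1.0 is at about 0.55 after 30 days and reaches the 0.1 floor on day 114.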

### Storage

All data is persisted as JSON files in your workspace:

```
workspace/
├── entities.json   # Extracted entities with types and counts
└── facts.json      # Fact triples with relevance scores
```

Writes use atomic file operations (write to `.tmp`, then rename) to prevent corruption.
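
Here is a minimal sketch of the write-then-rename pattern with a debounce in front of it, using only Node's built-in `fs`, `os` and `path` modules. `writeJsonAtomic` and `debouncedWriter` are hypothetical names, not the plugin's `storage.ts`, and the sketch uses synchronous calls for brevity.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write to "<file>.tmp" first, then rename over the target. On POSIX a rename
// within one filesystem replaces the target atomically, so a process crash
// mid-write never leaves a half-written file. (Surviving power loss would also
// need an fsync, which this sketch omits.)
function writeJsonAtomic(file: string, data: unknown): void {
  const tmp = `${file}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2), "utf8");
  renameSync(tmp, file);
}

interface DebouncedWriter {
  schedule(data: unknown): void;
  flush(): void;
}

// Coalesce bursts of updates into one write once `delayMs` passes without a new one.
function debouncedWriter(file: string, delayMs: number): DebouncedWriter {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let latest: unknown;
  const writeNow = (): void => {
    timer = undefined;
    writeJsonAtomic(file, latest);
  };
  return {
    schedule(data: unknown): void {
      latest = data;
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(writeNow, delayMs);
    },
    // For shutdown: write any pending data immediately.
    flush(): void {
      if (timer === undefined) return;
      clearTimeout(timer);
      writeNow();
    },
  };
}

// Usage: a throwaway workspace under the OS temp directory.
const workspace = mkdtempSync(join(tmpdir(), "knowledge-"));
const factsFile = join(workspace, "facts.json");
const writer = debouncedWriter(factsFile, 15000); // the storage.writeDebounceMs default
writer.schedule([{ id: "f-1", relevance: 1 }]);
writer.schedule([{ id: "f-1", relevance: 0.98 }]); // replaces the pending write
writer.flush(); // one disk write, containing the latest data
const onDisk = JSON.parse(readFileSync(factsFile, "utf8")) as Array<{ id: string; relevance: number }>;
```

The debounce trades write volume for a window of risk: many updates inside the delay cost a single write, but anything scheduled and not yet flushed is lost if the process is killed first.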

## Architecture

```
index.ts                 → Plugin entry point
src/
├── types.ts             → All TypeScript interfaces
├── config.ts            → Config resolution + validation
├── patterns.ts          → Regex factories (Proxy-based, no /g state bleed)
├── entity-extractor.ts  → Regex-based entity extraction
├── llm-enhancer.ts      → Batched LLM extraction with cooldown
├── fact-store.ts        → In-memory fact store with decay + pruning
├── hooks.ts             → OpenClaw hook registration + orchestration
├── http-client.ts       → Shared HTTP/HTTPS transport
├── embeddings.ts        → ChromaDB vector sync
├── storage.ts           → Atomic JSON I/O with debounce
└── maintenance.ts       → Scheduled background tasks
```

- **12 modules**, each with a single responsibility
- **Zero runtime dependencies**: Node.js built-ins only
- **TypeScript strict**: no `any` in source code
- **All functions ≤40 lines**
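
The "no /g state bleed" note refers to a real JavaScript pitfall: a regex with the `g` flag keeps `lastIndex` between `test()` and `exec()` calls, so a shared instance can silently skip matches. One way to avoid it is a Proxy that builds a fresh `RegExp` on every property access. The sketch below is a hypothetical illustration of that idea; `PATTERN_SOURCES` and `patterns` are not the actual contents of `patterns.ts`.

```typescript
// Patterns are stored as source + flags, never as shared RegExp objects.
const PATTERN_SOURCES = {
  email: { source: "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+", flags: "g" },
  url: { source: "https?://[^\\s<>\"']+", flags: "g" },
} as const;

type PatternName = keyof typeof PATTERN_SOURCES;

// Every property read builds a new RegExp, so lastIndex never carries over.
const patterns = new Proxy({} as Record<PatternName, RegExp>, {
  get(_target, name: string | symbol): RegExp | undefined {
    if (typeof name !== "string" || !(name in PATTERN_SOURCES)) return undefined;
    const { source, flags } = PATTERN_SOURCES[name as PatternName];
    return new RegExp(source, flags);
  },
});

// The pitfall being avoided: a shared /g regex remembers where it stopped.
const shared = /x/g;
const first = shared.test("x"); // true; lastIndex is now 1
const second = shared.test("x"); // false; the search started at index 1
```

The cost is one `RegExp` construction per access, usually negligible for message-sized inputs, in exchange for patterns that are safe to use from any call site.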

## Hooks

| Hook | Priority | Description |
|------|----------|-------------|
| `session_start` | 200 | Loads fact store from disk |
| `message_received` | 100 | Extracts entities + queues LLM batch |
| `message_sent` | 100 | Same extraction on outbound messages |
| `gateway_stop` | 50 | Flushes writes, stops timers |

## Testing

```bash
npm test
# Runs 83 tests across 10 test files
```

Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, and hook orchestration.

## Part of the Darkplex Plugin Suite

| # | Plugin | Status | Description |
|---|--------|--------|-------------|
| 1 | [@vainplex/nats-eventstore](https://github.com/alberthild/openclaw-nats-eventstore) | ✅ Published | NATS JetStream event persistence |
| 2 | [@vainplex/openclaw-cortex](https://github.com/alberthild/openclaw-cortex) | ✅ Published | Conversation intelligence (threads, decisions, boot context) |
| 3 | **@vainplex/openclaw-knowledge-engine** | ✅ Published | Real-time knowledge extraction (this plugin) |
| 4 | @vainplex/openclaw-governance | 📋 Planned | Policy enforcement + guardrails |
| 5 | @vainplex/openclaw-memory-engine | 📋 Planned | Unified memory layer |
| 6 | @vainplex/openclaw-health-monitor | 📋 Planned | System health + auto-healing |

## License

MIT

package.json (+1 -1)

@@ -21,7 +21,7 @@
   "license": "MIT",
   "repository": {
     "type": "git",
-    "url": "https://github.com/your-repo/openclaw-knowledge-engine.git"
+    "url": "https://github.com/alberthild/openclaw-knowledge-engine.git"
   },
   "openclaw": {
     "id": "@vainplex/openclaw-knowledge-engine"