docs: comprehensive README, LICENSE, community-ready

- Full README with: what it does, quick start, config reference, architecture overview, extraction pipeline diagram, fact lifecycle, hook table, testing instructions, plugin suite roadmap - MIT License - Repository URL fixed in package.json
2026-02-17 16:27:29 +01:00 · 2026-02-17 16:27:29 +01:00 · 086f3ff0b3
commit 086f3ff0b3
parent 45cf11bb73
3 changed files with 273 additions and 1 deletions
--- a/21
+++ b/21
@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Vainplex
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@ -0,0 +1,251 @@
+# @vainplex/openclaw-knowledge-engine
+
+A real-time knowledge extraction plugin for [OpenClaw](https://github.com/openclaw/openclaw). Automatically extracts entities, facts, and relationships from conversations — building a persistent, queryable knowledge base that grows with every message.
+
+## What it does
+
+Every message your OpenClaw agent processes flows through the Knowledge Engine:
+
+1. **Regex Extraction** (instant, zero cost) — Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
+2. **LLM Enhancement** (optional, batched) — Groups messages and sends them to a local LLM for deeper entity and fact extraction
+3. **Fact Storage** — Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
+4. **Relevance Decay** — Automatically decays old facts so recent knowledge surfaces first
+5. **Vector Sync** — Optionally syncs facts to ChromaDB for semantic search
+6. **Background Maintenance** — Prunes low-relevance facts, compacts storage, runs cleanup
+
+```
+User: "We're meeting with Sebastian from Mondo Gate next Tuesday"
+  │
+  ├─ Regex → entities: [Sebastian (person), Mondo Gate (organization)]
+  └─ LLM   → facts:   [Sebastian — works-at — Mondo Gate]
+                       [Meeting — scheduled-with — Mondo Gate]
+```
+
+## Quick Start
+
+### 1. Install
+
+```bash
+cd ~/.openclaw
+npm install @vainplex/openclaw-knowledge-engine
+```
+
+### 2. Sync to extensions
+
+OpenClaw loads plugins from the `extensions/` directory:
+
+```bash
+mkdir -p extensions/openclaw-knowledge-engine
+cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/
+```
+
+### 3. Configure
+
+Add to your `openclaw.json`:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "openclaw-knowledge-engine": {
+        "enabled": true,
+        "config": {
+          "workspace": "/path/to/your/workspace",
+          "extraction": {
+            "regex": { "enabled": true },
+            "llm": {
+              "enabled": true,
+              "endpoint": "http://localhost:11434/api/generate",
+              "model": "mistral:7b",
+              "batchSize": 10,
+              "cooldownMs": 30000
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+### 4. Restart gateway
+
+```bash
+openclaw gateway restart
+```
+
+## Configuration
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `enabled` | boolean | `true` | Enable/disable the plugin |
+| `workspace` | string | `~/.clawd/plugins/knowledge-engine` | Storage directory for knowledge files |
+| `extraction.regex.enabled` | boolean | `true` | High-speed regex entity extraction |
+| `extraction.llm.enabled` | boolean | `true` | LLM-based deep extraction |
+| `extraction.llm.model` | string | `"mistral:7b"` | Ollama/OpenAI-compatible model |
+| `extraction.llm.endpoint` | string | `"http://localhost:11434/api/generate"` | LLM API endpoint (HTTP or HTTPS) |
+| `extraction.llm.batchSize` | number | `10` | Messages per LLM batch |
+| `extraction.llm.cooldownMs` | number | `30000` | Wait time before sending batch |
+| `decay.enabled` | boolean | `true` | Periodic relevance decay |
+| `decay.intervalHours` | number | `24` | Hours between decay cycles |
+| `decay.rate` | number | `0.02` | Decay rate per interval (2%) |
+| `embeddings.enabled` | boolean | `false` | Sync facts to ChromaDB |
+| `embeddings.endpoint` | string | `"http://localhost:8000/..."` | ChromaDB API endpoint |
+| `embeddings.collectionName` | string | `"openclaw-facts"` | Vector collection name |
+| `embeddings.syncIntervalMinutes` | number | `15` | Minutes between vector syncs |
+| `storage.maxEntities` | number | `5000` | Max entities before pruning |
+| `storage.maxFacts` | number | `10000` | Max facts before pruning |
+| `storage.writeDebounceMs` | number | `15000` | Debounce delay for disk writes |
+
+### Minimal config (regex only, no LLM)
+
+```json
+{
+  "openclaw-knowledge-engine": {
+    "enabled": true,
+    "config": {
+      "extraction": {
+        "llm": { "enabled": false }
+      }
+    }
+  }
+}
+```
+
+This gives you zero-cost entity extraction with no external dependencies.
+
+### Full config (LLM + ChromaDB)
+
+```json
+{
+  "openclaw-knowledge-engine": {
+    "enabled": true,
+    "config": {
+      "workspace": "~/my-agent/knowledge",
+      "extraction": {
+        "llm": {
+          "enabled": true,
+          "endpoint": "http://localhost:11434/api/generate",
+          "model": "mistral:7b"
+        }
+      },
+      "embeddings": {
+        "enabled": true,
+        "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
+      },
+      "decay": {
+        "intervalHours": 12,
+        "rate": 0.03
+      }
+    }
+  }
+}
+```
+
+## How it works
+
+### Extraction Pipeline
+
+```
+Message received
+      │
+      ├──▶ Regex Engine (sync, <1ms)
+      │     └─ Extracts: proper nouns, organizations, tech terms,
+      │        URLs, emails, monetary amounts, dates
+      │
+      └──▶ LLM Batch Queue (async, batched)
+            └─ Every N messages or after cooldown:
+               └─ Sends batch to local LLM
+               └─ Extracts: entities + fact triples
+               └─ Stores in FactStore
+```
+
+### Fact Lifecycle
+
+Facts are stored as structured triples:
+
+```json
+{
+  "id": "f-abc123",
+  "subject": "Sebastian",
+  "predicate": "works-at",
+  "object": "Mondo Gate",
+  "source": "extracted-llm",
+  "relevance": 0.95,
+  "createdAt": 1707123456789,
+  "lastAccessedAt": 1707123456789
+}
+```
+
+- **Relevance** starts at 1.0 and decays over time
+- **Accessed facts** get a relevance boost (LRU-style)
+- **Pruning** removes facts below the relevance floor when storage limits are hit
+- **Minimum floor** (0.1) prevents complete decay — old facts never fully disappear
+
+### Storage
+
+All data is persisted as JSON files in your workspace:
+
+```
+workspace/
+├── entities.json    # Extracted entities with types and counts
+└── facts.json       # Fact triples with relevance scores
+```
+
+Writes use atomic file operations (write to `.tmp`, then rename) to prevent corruption.
+
+## Architecture
+
+```
+index.ts                 → Plugin entry point
+src/
+├── types.ts             → All TypeScript interfaces
+├── config.ts            → Config resolution + validation
+├── patterns.ts          → Regex factories (Proxy-based, no /g state bleed)
+├── entity-extractor.ts  → Regex-based entity extraction
+├── llm-enhancer.ts      → Batched LLM extraction with cooldown
+├── fact-store.ts        → In-memory fact store with decay + pruning
+├── hooks.ts             → OpenClaw hook registration + orchestration
+├── http-client.ts       → Shared HTTP/HTTPS transport
+├── embeddings.ts        → ChromaDB vector sync
+├── storage.ts           → Atomic JSON I/O with debounce
+└── maintenance.ts       → Scheduled background tasks
+```
+
+- **12 modules**, each with a single responsibility
+- **Zero runtime dependencies** — Node.js built-ins only
+- **TypeScript strict** — no `any` in source code
+- **All functions ≤40 lines**
+
+## Hooks
+
+| Hook | Priority | Description |
+|------|----------|-------------|
+| `session_start` | 200 | Loads fact store from disk |
+| `message_received` | 100 | Extracts entities + queues LLM batch |
+| `message_sent` | 100 | Same extraction on outbound messages |
+| `gateway_stop` | 50 | Flushes writes, stops timers |
+
+## Testing
+
+```bash
+npm test
+# Runs 83 tests across 10 test files
+```
+
+Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.
+
+## Part of the Darkplex Plugin Suite
+
+| # | Plugin | Status | Description |
+|---|--------|--------|-------------|
+| 1 | [@vainplex/nats-eventstore](https://github.com/alberthild/openclaw-nats-eventstore) | ✅ Published | NATS JetStream event persistence |
+| 2 | [@vainplex/openclaw-cortex](https://github.com/alberthild/openclaw-cortex) | ✅ Published | Conversation intelligence (threads, decisions, boot context) |
+| 3 | **@vainplex/openclaw-knowledge-engine** | ✅ Published | Real-time knowledge extraction (this plugin) |
+| 4 | @vainplex/openclaw-governance | 📋 Planned | Policy enforcement + guardrails |
+| 5 | @vainplex/openclaw-memory-engine | 📋 Planned | Unified memory layer |
+| 6 | @vainplex/openclaw-health-monitor | 📋 Planned | System health + auto-healing |
+
+## License
+
+MIT
--- a/package.json
+++ b/package.json
@ -21,7 +21,7 @@
  "license": "MIT",
  "repository": {
    "type": "git",
-    "url": "https://github.com/your-repo/openclaw-knowledge-engine.git"
+    "url": "https://github.com/alberthild/openclaw-knowledge-engine.git"
  },
  "openclaw": {
    "id": "@vainplex/openclaw-knowledge-engine"