openclaw-vainplex/src
Josh Palmer 7a6c40872d
Agents: add system prompt safety guardrails (#5445)
* 🤖 agents: add system prompt safety guardrails

What:
- add safety guardrails to system prompt
- update system prompt docs
- update prompt tests

Why:
- discourage power-seeking or self-modification behavior
- clarify safety/oversight priority when conflicts arise

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 agents: tighten safety wording for prompt guardrails

What:
- scope safety wording to system prompts/safety/tool policy changes
- document Safety inclusion in minimal prompt mode
- update safety prompt tests

Why:
- avoid blocking normal code changes or PR workflows
- keep prompt mode docs consistent with implementation

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 docs: note safety guardrails are soft

What:
- document system prompt safety guardrails as advisory
- add security note on prompt guardrails vs hard controls

Why:
- clarify threat model and operator expectations
- avoid implying prompt text is an enforcement layer

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)
2026-01-31 15:50:15 +01:00
acp chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
agents Agents: add system prompt safety guardrails (#5445) 2026-01-31 15:50:15 +01:00
auto-reply fix: lint cleanups 2026-01-31 07:59:01 +00:00
browser chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
canvas-host chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
channels chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
cli revert: Switch back to tsc for compiling. 2026-01-31 18:31:49 +09:00
commands feat: add MiniMax OAuth plugin (#4521) (thanks @Maosghoul) 2026-01-31 12:42:45 +01:00
compat refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
config feat: add MiniMax OAuth plugin (#4521) (thanks @Maosghoul) 2026-01-31 12:42:45 +01:00
cron chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
daemon revert: Switch back to tsc for compiling. 2026-01-31 18:31:49 +09:00
discord chore: Emit TypeScript declaration files so that we can type-check the extensions folder soon. 2026-01-31 21:57:21 +09:00
docs bugfix:The Mintlify navbar (logo + search bar with ⌘K) scrolls away w… (#2445) 2026-01-26 17:39:10 -08:00
gateway chore: Emit TypeScript declaration files so that we can type-check the extensions folder soon. 2026-01-31 21:57:21 +09:00
hooks chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
imessage chore: Oops, "long" is actually used + fix TypeScript error. 2026-01-31 17:12:28 +09:00
infra revert: Switch back to tsc for compiling. 2026-01-31 18:31:49 +09:00
line chore: Emit TypeScript declaration files so that we can type-check the extensions folder soon. 2026-01-31 21:57:21 +09:00
link-understanding chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
logging chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
macos chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
markdown chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
media chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
media-understanding chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
memory chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
node-host chore: Fix TypeScript errors 1/n. 2026-01-31 16:38:03 +09:00
pairing chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
plugin-sdk refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
plugins revert: Switch back to tsc for compiling. 2026-01-31 18:31:49 +09:00
process chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
providers chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
routing chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
scripts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
security chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
sessions chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
shared/text chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
signal chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
slack chore: Emit TypeScript declaration files so that we can type-check the extensions folder soon. 2026-01-31 21:57:21 +09:00
telegram chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
terminal chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
test-helpers refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
test-utils chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
tts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
tui chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
types chore: Enable more lint rules, disable some that trigger a lot. Will clean up later. 2026-01-31 16:04:04 +09:00
utils chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
web chore: Emit TypeScript declaration files so that we can type-check the extensions folder soon. 2026-01-31 21:57:21 +09:00
whatsapp chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
wizard fix: restore tsc build and plugin install tests 2026-01-31 07:54:15 +00:00
channel-web.barrel.test.ts refactor!: rename chat providers to channels 2026-01-13 08:40:39 +00:00
channel-web.ts refactor!: rename chat providers to channels 2026-01-13 08:40:39 +00:00
docker-setup.test.ts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
entry.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
git-hooks.test.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
globals.test.ts chore: format to 2-space and bump changelog 2025-11-26 00:53:53 +01:00
globals.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
index.test.ts chore: migrate to oxlint and oxfmt 2026-01-14 15:02:19 +00:00
index.ts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
logger.test.ts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
logger.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
logging.ts fix: unblock bundled plugin load 2026-01-18 19:34:21 +00:00
polls.test.ts chore: migrate to oxlint and oxfmt 2026-01-14 15:02:19 +00:00
polls.ts chore: migrate to oxlint and oxfmt 2026-01-14 15:02:19 +00:00
postinstall-patcher.test.ts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
runtime.ts perf: speed up memory batch polling 2026-01-18 03:55:14 +00:00
utils.test.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
utils.ts chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
version.ts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00