openclaw-vainplex/docs
Josh Palmer 7a6c40872d
Agents: add system prompt safety guardrails (#5445)
* 🤖 agents: add system prompt safety guardrails

What:
- add safety guardrails to system prompt
- update system prompt docs
- update prompt tests

Why:
- discourage power-seeking or self-modification behavior
- clarify safety/oversight priority when conflicts arise

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 agents: tighten safety wording for prompt guardrails

What:
- scope safety wording to system prompts/safety/tool policy changes
- document Safety inclusion in minimal prompt mode
- update safety prompt tests

Why:
- avoid blocking normal code changes or PR workflows
- keep prompt mode docs consistent with implementation

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 docs: note safety guardrails are soft

What:
- document system prompt safety guardrails as advisory
- add security note on prompt guardrails vs hard controls

Why:
- clarify threat model and operator expectations
- avoid implying prompt text is an enforcement layer

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)
2026-01-31 15:50:15 +01:00
_layouts refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
assets Add files via upload 2026-01-29 23:37:32 -05:00
automation Docs: add actionable cron quick start (#5446) 2026-01-31 15:21:31 +01:00
channels chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
cli chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
concepts Agents: add system prompt safety guardrails (#5445) 2026-01-31 15:50:15 +01:00
debug chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
diagnostics chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
experiments chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
gateway Agents: add system prompt safety guardrails (#5445) 2026-01-31 15:50:15 +01:00
help chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
hooks refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
images docs: add group flow diagram 2026-01-10 20:05:22 +01:00
install chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
nodes chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
platforms chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
plugins chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
providers chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
refactor chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
reference chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
security chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
start chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
tools chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
web chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
_config.yml refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
bedrock.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
brave-search.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
broadcast-groups.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
CNAME refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
date-time.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
debugging.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
docs.json chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
environment.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
hooks.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
index.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
logging.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
multi-agent-sandbox-tools.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
network.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
northflank.mdx chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
perplexity.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
pi-dev.md docs: add pi and pi-dev documentation 2026-01-31 04:20:12 +01:00
pi.md Agents: add system prompt safety guardrails (#5445) 2026-01-31 15:50:15 +01:00
plugin.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
prose.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
railway.mdx chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
render.mdx chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
scripts.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
testing.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
token-use.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
tts.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
tui.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
vps.md chore: Run pnpm format:fix. 2026-01-31 21:13:13 +09:00
whatsapp-openclaw.jpg refactor: rename to openclaw 2026-01-30 03:16:21 +01:00