Sentinel ships enabled with sensible defaults. Most teams can leave it running as-is and it will quietly block prompt injection attempts, redact PII before it leaves the platform, and throttle misbehaving agents. This guide covers how to move beyond defaults: tighten PII rules for regulated industries, adjust injection sensitivity, and set per-agent rate limits that match your usage patterns.
The 6 Layers
1. Prompt injection detection: Classifies inbound messages for jailbreak attempts, instruction override patterns, and adversarial prompts before they reach any agent. Runs on every message on every channel.
2. PII filtering: Scans inbound messages, tool call outputs, and outbound channel posts for emails, phone numbers, SSNs, API keys, and credit card numbers. Configurable per data type: redact, block, or log_only.
3. Toxicity guardrails: Content policy enforcement on agent outputs. The threshold is configurable, from permissive (catch only explicit content) to strict (flag borderline language for review).
4. Rate limiting: Per-agent request and token limits. Prevents runaway agents from burning budget or hammering external APIs. Configurable per agent with burst allowance.
5. Seccomp sandboxing: Blocks dangerous syscalls at the OS level in the agent runtime container. Non-configurable by design: this layer is always active.
6. Audit logging: Immutable, append-only log of all configuration changes. Tamper detection via hash chaining. Retention and export are configurable.
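To make "tamper detection via hash chaining" concrete, here is a minimal Python sketch of the general technique. It is not Sentinel's implementation: the entry layout, the `GENESIS` value, and the function names are all assumptions made for illustration. Each entry stores the hash of the previous one, so changing any earlier entry invalidates every hash after it.

```python
import hashlib
import json

# Hypothetical starting value for the chain; Sentinel's real format is not documented here.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify_chain(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Verification only needs the log itself, which is why this style of chain suits an append-only file.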
Where Sentinel Lives
Sentinel is configured in Dashboard → Security. Settings are written to ~/.openclaw/sentinel.json and take effect immediately — no restart required. You can also edit the file directly and reload with:
docker compose exec openclaw openclaw sentinel reload
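If you script config changes rather than editing by hand, the edit-then-reload flow might look like the sketch below. It assumes the file on disk is plain JSON (the `//` comments in this guide's snippets are annotations that Python's `json` module would reject), and the helper names `update_config` and `reload_sentinel` are hypothetical. The `-T` flag is standard `docker compose exec` and disables TTY allocation so the command can run from a script.

```python
import json
import os
import subprocess
import tempfile
from pathlib import Path

CONFIG_PATH = Path.home() / ".openclaw" / "sentinel.json"
# Same reload command as above, with -T added for non-interactive use.
RELOAD_CMD = ["docker", "compose", "exec", "-T", "openclaw", "openclaw", "sentinel", "reload"]


def update_config(path, mutate):
    """Load the config, apply mutate(config) in place, and write it back atomically."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    mutate(config)
    # Write to a temp file in the same directory, then swap it in with os.replace,
    # so a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(config, tmp, indent=2)
        tmp.write("\n")
    os.replace(tmp_name, path)
    return config


def reload_sentinel():
    """Ask the running container to re-read sentinel.json; raises if the command fails."""
    subprocess.run(RELOAD_CMD, check=True)
```

A typical call would be `update_config(CONFIG_PATH, lambda c: c["injection"].update(sensitivity="high"))` followed by `reload_sentinel()`.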
Layer 1: Prompt Injection Detection
The injection detector runs a classifier on every inbound message. The classifier looks for patterns that indicate an attempt to override the agent's instructions: phrases like "ignore previous instructions", role-play escalation patterns, delimiter attacks, and indirect injection via tool outputs.
Besides the enabled toggle, three settings control this layer:
// sentinel.json
{
  "injection": {
    "enabled": true,
    "sensitivity": "medium",     // "low" | "medium" | "high"
    "action": "block",           // "block" | "flag" | "log_only"
    "scan_tool_outputs": true    // scan tool call results too
  }
}
Sensitivity levels: low catches only obvious jailbreak attempts. medium (default) catches most known patterns with a low false-positive rate. high is appropriate for agents that process untrusted external content — it will occasionally flag legitimate messages that happen to use override-like phrasing.
scan_tool_outputs: Indirect prompt injection — where a malicious payload is embedded in a web page, email, or document that an agent retrieves — is often harder to catch than direct injection. Set this to true whenever agents read from external sources.
If agents browse the web or read customer emails, set sensitivity: "high" and scan_tool_outputs: true. The most damaging injection attacks come through content the agent is instructed to read, not direct user messages.
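The guidance above can be turned into a small pre-deploy check. This is a sketch only: `check_injection_config` is a hypothetical helper, not part of Sentinel, and it simply encodes the allowed values from the config comments plus the recommendation for agents that read external content.

```python
ALLOWED_SENSITIVITY = ("low", "medium", "high")
ALLOWED_ACTIONS = ("block", "flag", "log_only")


def check_injection_config(injection: dict, reads_external_content: bool) -> list:
    """Return a list of human-readable problems; an empty list means the block looks fine."""
    problems = []
    if injection.get("sensitivity") not in ALLOWED_SENSITIVITY:
        problems.append("sensitivity must be one of: " + ", ".join(ALLOWED_SENSITIVITY))
    if injection.get("action") not in ALLOWED_ACTIONS:
        problems.append("action must be one of: " + ", ".join(ALLOWED_ACTIONS))
    if reads_external_content:
        # Recommendation from this guide for agents that browse the web or read email.
        if injection.get("sensitivity") != "high":
            problems.append('agents reading external content should use sensitivity "high"')
        if not injection.get("scan_tool_outputs", False):
            problems.append("agents reading external content should set scan_tool_outputs to true")
    return problems
```

Run against the default block shown earlier, this reports nothing for an internal-only agent and one sensitivity warning for an agent that browses the web.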
Layer 2: PII Filtering
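PII filtering is the most commonly customized layer. The defaults are safe for most deployments: contact details are redacted, high-risk identifiers are blocked outright, and IP and physical addresses are only logged. The table below shows each PII type, its default behavior, and when you'd change it.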
| PII Type | Default Mode | Notes |
|---|---|---|
| Email addresses | redact | Replaced with [EMAIL] in logs and outputs |
| Phone numbers | redact | Replaced with [PHONE] |
| Social Security Numbers | force-block | Cannot be changed — always blocks the message |
| Credit card numbers | force-block | Cannot be changed — always blocks the message |
| API keys / tokens | force-block | Regex-matched against common key formats (sk-*, ghp_*, etc.) |
| IP addresses | log_only | Logged but not redacted — change to redact for GDPR |
| Physical addresses | log_only | Lower precision — expect some false positives |
PII Filter Modes
redact: Replaces the detected value with a typed placeholder. The message is delivered; the sensitive value is not. The audit log retains the encrypted original for forensics.
block: Rejects the entire message. The agent never sees it. The sender receives an error. Use for types where any presence of the data is a policy violation.
log_only: Passes the message through unchanged and logs the detection event. Use for monitoring and tuning before you enforce stricter rules.
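The three modes can be sketched in a few lines of Python. The regexes and the `apply_pii_policy` function below are illustrative assumptions, not Sentinel's detectors; the point is only to show how block, redact, and log_only differ in what gets delivered.

```python
import re

# Deliberately simple, illustrative patterns. Real PII detection is more involved.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_pii_policy(message: str, modes: dict):
    """Return (text_to_deliver_or_None, detected_types) for a message.

    None means the whole message was rejected by a "block" rule.
    """
    detections = [name for name, pattern in PATTERNS.items() if pattern.search(message)]
    # block: any match of a blocked type rejects the entire message.
    if any(modes.get(name) == "block" for name in detections):
        return None, detections
    text = message
    for name in detections:
        # redact: swap the value for a typed placeholder such as [EMAIL].
        if modes.get(name) == "redact":
            text = PATTERNS[name].sub(f"[{name.upper()}]", text)
        # log_only: leave the text untouched; the detection is still reported.
    return text, detections
```

In every mode the detection list is returned, which mirrors the idea that log_only still records the event even though nothing is changed.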
Example: HIPAA-Aligned Configuration
For healthcare deployments, you want to catch as many HIPAA Protected Health Information (PHI) identifiers as Sentinel can detect. Here's a configuration tuned for that:
{
  "pii": {
    "enabled": true,
    "scan_inbound": true,
    "scan_tool_outputs": true,
    "scan_outbound": true,
    "types": {
      "email": "redact",
      "phone": "redact",
      "ssn": "block",            // force-blocked regardless
      "credit_card": "block",    // force-blocked regardless
      "api_key": "block",        // force-blocked regardless
      "ip_address": "redact",
      "address": "redact",
      "dob": "redact",           // date of birth (HIPAA identifier)
      "mrn": "block"             // medical record numbers
    }
  }
}
Set scan_outbound: true to run PII detection on everything agents post to Slack, Discord, or email. This catches cases where an agent pulls PII from a database and inadvertently includes it in a channel message.
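Because some types are force-blocked regardless of what the file says, it can help to compute the modes that will actually apply before you deploy. The sketch below is an assumption about how that resolution could be modeled; `effective_pii_modes` is a hypothetical helper, and the type keys are the ones used in the example config above.

```python
# Types this guide describes as force-blocked regardless of configuration.
FORCE_BLOCKED = ("ssn", "credit_card", "api_key")
VALID_MODES = ("redact", "block", "log_only")


def effective_pii_modes(types: dict) -> dict:
    """Validate configured modes and overlay the force-blocked types."""
    for name, mode in types.items():
        if mode not in VALID_MODES:
            raise ValueError(f"{name}: unknown mode {mode!r}")
    effective = dict(types)
    for name in FORCE_BLOCKED:
        # Any configured value for these types is overridden.
        effective[name] = "block"
    return effective
```

Comparing the result with the `types` block you wrote is a quick way to spot settings that will be silently overridden.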
Layer 3: Toxicity Guardrails
Toxicity filtering runs on agent outputs before they're posted to any channel. The threshold controls how aggressively borderline content is flagged:
{
  "toxicity": {
    "enabled": true,
    "threshold": 0.7,     // 0.0 (strict) to 1.0 (permissive)
    "action": "block",    // "block" | "flag" | "log_only"
    "categories": [
      "hate", "harassment", "violence", "sexual"
    ]
  }
}
For most business deployments, the default threshold of 0.7 with block action is appropriate. If you're building a support agent that processes customer complaints (which may contain strong language), set the threshold to 0.85 or use flag instead of block so a human can review borderline messages.
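One way to read the threshold, sketched in Python: assume each enabled category gets a score between 0.0 and 1.0, where higher means more toxic, and the configured action fires when any score reaches the threshold. That scoring model is an assumption for illustration (it matches "0.0 strict, 1.0 permissive" but is not confirmed by Sentinel's documentation), and `toxicity_decision` is a hypothetical function.

```python
DEFAULT_CATEGORIES = ("hate", "harassment", "violence", "sexual")


def toxicity_decision(scores: dict, threshold: float = 0.7, action: str = "block",
                      categories=DEFAULT_CATEGORIES) -> str:
    """Return the configured action if any enabled category meets the threshold, else "allow"."""
    # Categories missing from scores are treated as 0.0; categories not enabled are ignored.
    worst = max((scores.get(category, 0.0) for category in categories), default=0.0)
    return action if worst >= threshold else "allow"
```

Under this reading, raising the threshold from 0.7 to 0.85 lets an output scoring 0.8 for harassment through, which is the effect described for support agents handling heated complaints.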
Layer 4: Rate Limiting
Rate limits apply per agent, per time window. Two dimensions: requests per minute and tokens per hour. The token limit is the more important control for cost management.
{
  "rate_limits": {
    "default": {
      "requests_per_min": 20,
      "tokens_per_hour": 50000,
      "burst": 5                    // extra requests allowed in burst
    },
    "per_agent": {
      "support-agent": {
        "requests_per_min": 60,     // high-volume support
        "tokens_per_hour": 200000
      },
      "research-agent": {
        "requests_per_min": 5,      // slow background research
        "tokens_per_hour": 30000
      }
    }
  }
}
The burst value lets an agent temporarily exceed its per-minute limit by up to that many requests. This prevents false throttling for agents that work in short bursts (like a webhook receiver that processes a batch of incoming events).
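The sketch below models the request limit and burst allowance in Python. Two things in it are assumptions, since this guide does not spell them out: that a per-agent entry inherits any key it omits (such as burst) from default, and that burst acts as extra headroom on top of the per-minute limit within a sliding 60-second window. `effective_limits` and `RequestLimiter` are hypothetical names.

```python
import time
from collections import deque


def effective_limits(rate_limits: dict, agent: str) -> dict:
    """Merge an agent's overrides onto the defaults (assumed inheritance behavior)."""
    overrides = rate_limits.get("per_agent", {}).get(agent, {})
    return {**rate_limits["default"], **overrides}


class RequestLimiter:
    """Sliding 60-second window that admits requests_per_min + burst requests."""

    def __init__(self, requests_per_min: int, burst: int = 0):
        self.capacity = requests_per_min + burst
        self.timestamps = deque()

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.capacity:
            return False
        self.timestamps.append(now)
        return True
```

With the config above, `effective_limits(cfg["rate_limits"], "research-agent")` would give 5 requests per minute, 30000 tokens per hour, and the default burst of 5, so a batch of up to 10 requests in one minute passes before throttling starts.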
Sensitive Route Limits
The API has a separate, lower limit for sensitive routes — credential management, pack installation, and API key operations. These are fixed at 20 req/min and cannot be raised via config. This is intentional: if a credential endpoint is being hit 20 times per minute, something is wrong.
Layer 6: Audit Log Configuration
The audit log records all configuration changes, member actions, and security events. By default, logs are written to ~/.openclaw/audit.log in append-only mode.
{
  "audit": {
    "enabled": true,
    "retention_days": 365,          // 12 months, a common SOC 2 audit expectation
    "export_webhook": "https://your-siem.internal/ingest",
    "include_agent_runs": true      // log every agent invocation
  }
}
Set export_webhook to push audit events to your SIEM in real time. The webhook receives newline-delimited JSON, one event per line, compatible with Splunk, Datadog, and Elastic.
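If you are writing your own receiver instead of pointing the webhook at an existing SIEM, parsing the body is straightforward. The sketch below handles newline-delimited JSON as described above; the event field names in the usage note are made up for illustration, since the event schema is not documented in this guide.

```python
import json


def parse_ndjson(body: bytes) -> list:
    """Parse a newline-delimited JSON payload into a list of event dicts."""
    events = []
    # Split on b"\n" only, so Unicode line separators inside JSON strings
    # are not mistaken for record boundaries.
    for raw in body.split(b"\n"):
        raw = raw.strip()
        if raw:  # tolerate a trailing newline or blank lines
            events.append(json.loads(raw))
    return events
```

For example, a body of `b'{"type": "config_change"}\n{"type": "pii_detection"}\n'` (hypothetical fields) yields two dictionaries, one per line.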
Viewing the Security Dashboard
Dashboard → Security shows the live state of all Sentinel layers: injection block rate, PII detections by type, toxicity flags, and rate limit hits per agent over the last 24 hours. Use this to tune thresholds — if injection block rate is above 5% and most blocks are legitimate traffic, lower the sensitivity. If it's near 0%, your agents probably aren't exposed to adversarial input and the overhead is minimal.