@guarded_action executes. Each guardrail is configured per-policy, raises a structured PolicyViolationError (or replays a cached value, in the case of idempotency), and emits a typed intervention so the console can render a dedicated detail panel.
| Guardrail | Module | Reason code | Intervention type |
|---|---|---|---|
| PII detection | agent_sentinel.guardrails.pii | PII_DETECTED | PII_BLOCKED |
| Content moderation | agent_sentinel.guardrails.moderation | CONTENT_MODERATED | CONTENT_BLOCKED |
| Loop protection | agent_sentinel.guardrails.loop_detector | LOOP_DETECTED | LOOP_DETECTED |
| Idempotency | agent_sentinel.guardrails.idempotency | n/a (transparent replay) | IDEMPOTENT_REPLAY |
How guardrails are configured
Guardrail rules live on PolicyConfig (not on PolicyEngine.configure(), which is the convenience wrapper for budgets, rate limits, and evidence). You attach rules per action via the pii_rules, moderation_rules, and loop_rules dicts, or flip the matching *_default_enabled flag to apply a default rule to every action.
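The per-action-versus-default lookup described above can be pictured with a small stand-alone sketch. It does not use the SDK: `PolicyConfigSketch`, `RuleSketch`, and `pii_rule_for` are hypothetical names, and the precedence shown (an explicit per-action rule wins over the default) is an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RuleSketch:
    """Hypothetical stand-in for a guardrail rule object."""
    enabled: bool = True


@dataclass
class PolicyConfigSketch:
    """Hypothetical stand-in that mirrors the pii_rules / *_default_enabled idea."""
    pii_rules: Dict[str, RuleSketch] = field(default_factory=dict)
    pii_default_enabled: bool = False

    def pii_rule_for(self, action: str) -> Optional[RuleSketch]:
        # Assumption: an explicit per-action rule takes precedence.
        if action in self.pii_rules:
            return self.pii_rules[action]
        # Otherwise fall back to a default rule only when the flag is set.
        return RuleSketch() if self.pii_default_enabled else None


per_action = PolicyConfigSketch(pii_rules={"send_email": RuleSketch()})
everywhere = PolicyConfigSketch(pii_default_enabled=True)
```

Here `per_action` guards only `send_email`, while `everywhere` yields a default rule for any action name.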
PII detection
Walks @guarded_action kwargs recursively and blocks the call if any string field matches a PII pattern. Detected categories: email, us_ssn, credit_card (Luhn-validated), phone_us, api_key_like, aws_access_key, private_key_block.
A match raises PolicyViolationError with details.reason_code == "PII_DETECTED" and a matches array containing the field path, category, and a redacted preview (first and last 2 characters). None of the matched text is logged in the clear.
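A minimal sketch of the recursive walk, Luhn check, and redacted preview follows. It is not the SDK's implementation: the regular expressions are simplified assumptions covering only three of the listed categories, and `find_pii`, `walk`, `luhn_ok`, and `redact` are hypothetical helper names.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple

# Simplified, illustrative patterns; the SDK's real patterns are not shown here.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def redact(text: str) -> str:
    """Keep the first and last 2 characters, mask everything in between."""
    if len(text) <= 4:
        return "*" * len(text)
    return text[:2] + "*" * (len(text) - 4) + text[-2:]


def walk(value: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (field_path, string) for every string nested inside value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from walk(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from walk(item, f"{path}[{index}]")


def find_pii(kwargs: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect matches as field path, category, and redacted preview."""
    matches = []
    for path, text in walk(kwargs):
        for category, pattern in PATTERNS.items():
            for m in pattern.finditer(text):
                if category == "credit_card" and not luhn_ok(m.group()):
                    continue  # digit run that fails the checksum is ignored
                matches.append(
                    {"field": path, "category": category, "preview": redact(m.group())}
                )
    return matches


found = find_pii(
    {
        "to": "alice@example.com",
        "body": {"notes": ["card 4111 1111 1111 1111", "card 4111 1111 1111 1112"]},
    }
)
```

In this example `found` holds two entries: the email in `to` and the Luhn-valid card number in `body.notes[0]`. The second card-like string fails the checksum and is skipped.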
Content moderation
Pluggable moderator that scans kwarg strings. The default KeywordModerator is fully offline; GeminiModerator uses gemini-2.5-flash to catch intent-level evasions (prompt injection, policy evasion, data exfiltration) that keyword matching misses.
GeminiModerator reads GEMINI_API_KEY (or GOOGLE_API_KEY) from the environment, retries up to three times on transient 429/5xx, caps input at 2,000 chars, and degrades gracefully (returns “not flagged” with a logged warning) if the key or google-genai package is missing.
Strictness modes:
- strict: every flagged result blocks
- balanced: only flagged categories that appear in block_categories block
- permissive: explicit category matches only
The platform also runs Gemini-powered intervention enrichment: even if the SDK uses KeywordModerator locally, blocks reaching the platform get a plain-English reason, agent_intent, and suggested_rewrite rewritten by Gemini. See LLM integrations.
Loop protection
Detects tight loops where the agent calls the same action with semantically identical arguments N times within a sliding window. Distinct from rate limits, which count calls regardless of arguments.
The detector hashes the call's arguments (excluding any keys listed in arg_exclude) with SHA-256 and keeps a sliding deque per (action, arg_hash). When the deque exceeds threshold entries inside window_seconds, the SDK raises PolicyViolationError carrying a LoopDetection payload.
break_out_hint is fed back into the LLM via self-repair feedback so the model can correct its plan rather than hammering a failed action.
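The hashing and sliding-window bookkeeping can be sketched as below. This is a stand-alone illustration, not the SDK's detector: `LoopDetectorSketch` and `LoopDetected` are hypothetical names, and canonicalizing arguments with sorted-key JSON is an assumed way to treat equivalent arguments as identical.

```python
import hashlib
import json
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Iterable, Tuple


class LoopDetected(Exception):
    """Hypothetical stand-in for PolicyViolationError with a loop payload."""


class LoopDetectorSketch:
    def __init__(
        self,
        threshold: int = 3,
        window_seconds: float = 60.0,
        arg_exclude: Iterable[str] = (),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.arg_exclude = set(arg_exclude)
        self.clock = clock
        # One deque of call timestamps per (action, arg_hash).
        self._calls: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def _arg_hash(self, kwargs: Dict[str, Any]) -> str:
        # Drop excluded keys, then hash a canonical JSON form with SHA-256.
        kept = {k: v for k, v in kwargs.items() if k not in self.arg_exclude}
        canonical = json.dumps(kept, sort_keys=True, default=repr)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def check(self, action: str, kwargs: Dict[str, Any]) -> None:
        now = self.clock()
        calls = self._calls[(action, self._arg_hash(kwargs))]
        # Evict timestamps that have slid out of the window.
        while calls and now - calls[0] > self.window_seconds:
            calls.popleft()
        calls.append(now)
        if len(calls) > self.threshold:
            raise LoopDetected(
                f"{action} called {len(calls)} times with identical arguments "
                f"within {self.window_seconds}s"
            )


# Usage with a fake clock so the window can be advanced deterministically.
fake_now = [0.0]
detector = LoopDetectorSketch(
    threshold=3, window_seconds=10.0, arg_exclude=["request_id"], clock=lambda: fake_now[0]
)
for attempt in range(3):
    detector.check("search", {"q": "x", "request_id": attempt})
```

Because `request_id` is excluded from the hash, the three calls above count as identical; a fourth identical call inside the window would raise, while a call with a different `q` starts its own deque.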
Idempotency
Caches the result of an action keyed by (run_id, idempotency_key). A second call with the same key inside the TTL transparently returns the cached value and is recorded as an IDEMPOTENT_REPLAY intervention, so the function body never runs twice.
idempotency_key accepts either a static string or a callable invoked with (*args, **kwargs). If the callable returns None or an empty value, the call falls through and executes normally.
The cache is in-process and thread-safe. For distributed deduplication you currently still need an upstream store (Redis, the platform’s commit-repeat detector); a shared backing store is on the roadmap.
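A minimal sketch of an in-process, thread-safe replay cache with these semantics follows. It is an illustration under assumptions rather than the SDK's code: `IdempotencyCacheSketch` and its `call` method are hypothetical, and returning a `(value, replayed)` tuple is only a convenience for the example.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple, Union

KeySpec = Union[str, Callable[..., Any]]


class IdempotencyCacheSketch:
    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # Re-entrant so a cached action may itself call another cached action.
        self._lock = threading.RLock()
        # (run_id, key) -> (expires_at, value)
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def call(self, run_id: str, idempotency_key: KeySpec,
             fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, bool]:
        """Return (value, replayed); replayed is True when fn was not executed."""
        key = idempotency_key(*args, **kwargs) if callable(idempotency_key) else idempotency_key
        if not key:
            # None or empty key: fall through and execute normally, uncached.
            return fn(*args, **kwargs), False
        cache_key = (run_id, key)
        # Holding the lock while fn runs keeps the body from running twice
        # for the same key, at the cost of serializing cached actions.
        with self._lock:
            hit = self._entries.get(cache_key)
            if hit is not None and hit[0] > self.clock():
                return hit[1], True
            value = fn(*args, **kwargs)
            self._entries[cache_key] = (self.clock() + self.ttl_seconds, value)
            return value, False


# Usage with a fake clock and a side-effecting action.
executed = []

def charge(order_id: str, amount: int) -> str:
    executed.append(order_id)
    return f"receipt-{order_id}-{len(executed)}"

fake_time = [0.0]
cache = IdempotencyCacheSketch(ttl_seconds=10.0, clock=lambda: fake_time[0])
order_key = lambda order_id, amount: f"charge:{order_id}"

first = cache.call("run-1", order_key, charge, "A", 5)
second = cache.call("run-1", order_key, charge, "A", 5)
```

Here `second` replays the value produced by `first` without running `charge` again. A per-key lock would allow more concurrency; the single lock is chosen only to keep the sketch short.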
Putting it together
See also
- Instrumentation: the @guarded_action decorator
- Policies: budgets, rate limits, allow/deny lists
- LLM integrations: Gemini-powered enrichment
- Console → Interventions: guardrail detail panels
