Guardrails - Agent Sentinel

Phase 7 added four built-in runtime guardrails that run inside the policy engine before a @guarded_action executes. Each guardrail is configured per-policy, raises a structured PolicyViolationError (or replays a cached value, in the case of idempotency), and emits a typed intervention so the console can render a dedicated detail panel.

| Guardrail | Module | Reason code | Intervention type |
| --- | --- | --- | --- |
| PII detection | `agent_sentinel.guardrails.pii` | `PII_DETECTED` | `PII_BLOCKED` |
| Content moderation | `agent_sentinel.guardrails.moderation` | `CONTENT_MODERATED` | `CONTENT_BLOCKED` |
| Loop protection | `agent_sentinel.guardrails.loop_detector` | `LOOP_DETECTED` | `LOOP_DETECTED` |
| Idempotency | `agent_sentinel.guardrails.idempotency` | n/a (transparent replay) | `IDEMPOTENT_REPLAY` |

All four are stdlib-only by default. Moderation can opt into a Gemini-backed semantic moderator; everything else has zero new dependencies.

How guardrails are configured

Guardrail rules live on PolicyConfig (not on PolicyEngine.configure(), which is the convenience wrapper for budgets / rate-limits / evidence). You attach rules per-action via the pii_rules / moderation_rules / loop_rules dicts, or flip the matching *_default_enabled flag to apply a default rule to every action.

from agent_sentinel import PolicyConfig, PolicyEngine
from agent_sentinel.guardrails import PIIRule, ModerationRule, LoopRule

PolicyEngine._config = PolicyConfig(
    pii_rules={"send_email": PIIRule()},      # per-action
    pii_default_enabled=True,                 # OR scan every action with default rule
    moderation_rules={"post_message": ModerationRule()},
    loop_rules={"retry_payment": LoopRule(threshold=3, window_seconds=15.0)},
)
PolicyEngine._initialized = True

When the platform’s policy sync is enabled, the same per-action rules can be authored in the platform UI and pushed down to the SDK — no local configuration needed.
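
To make the interplay between per-action rules and the *_default_enabled flags concrete, here is a minimal stdlib-only sketch. SketchRule, SketchConfig, and resolve_rule are hypothetical names, not SDK classes, and the precedence shown (an explicit per-action rule wins over the default) is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchRule:
    """Hypothetical stand-in for PIIRule / ModerationRule / LoopRule."""
    label: str = "default"


@dataclass
class SketchConfig:
    """Hypothetical stand-in for one rules dict plus its default flag."""
    rules: Dict[str, SketchRule] = field(default_factory=dict)
    default_enabled: bool = False


def resolve_rule(config: SketchConfig, action: str) -> Optional[SketchRule]:
    # Assumed precedence: an explicit per-action rule wins over the default.
    if action in config.rules:
        return config.rules[action]
    # Otherwise fall back to a default rule only when the flag is on.
    if config.default_enabled:
        return SketchRule()
    # No rule and no default: the guardrail is skipped for this action.
    return None


config = SketchConfig(
    rules={"send_email": SketchRule("custom")},
    default_enabled=True,
)
print(resolve_rule(config, "send_email").label)    # custom
print(resolve_rule(config, "post_message").label)  # default
```

With default_enabled left at False, an action that has no entry in the dict resolves to None, which corresponds to the guardrail being off for that action.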

PII detection

Walks @guarded_action kwargs recursively and blocks the call if any string field matches a PII pattern. Detected categories: email, us_ssn, credit_card (Luhn-validated), phone_us, api_key_like, aws_access_key, private_key_block.

from agent_sentinel import PolicyEngine
from agent_sentinel.guardrails import PIIRule

rule = PIIRule(
    categories=("email", "us_ssn", "credit_card"),
    extra_patterns={"customer_id": r"^CUST-\d{8}$"},
    allow_categories=("phone_us",),
    redact_preview=True,
)
# Attach via PolicyConfig (see "How guardrails are configured" above)
PolicyEngine._config.pii_rules["send_email"] = rule

When a match is found, the SDK raises PolicyViolationError with details.reason_code == "PII_DETECTED" and a matches array containing the field path, category, and a redacted preview (first and last 2 characters) for each match. The full matched text is never logged in the clear.

Pattern checks are conservative: credit-card candidates are Luhn-validated, SSN ranges exclude reserved blocks, and api_key_like requires either a known prefix or a base64 string of 40 or more characters. Add extra_patterns for site-specific identifiers.
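
The recursive walk, Luhn check, and redacted preview described above can be sketched with the standard library alone. This is an illustration, not the SDK's detector: the two regexes are simplified assumptions, and the helper names (walk, luhn_ok, redact, scan) are hypothetical.

```python
import re
from typing import Any, Iterator, List, Tuple

# Simplified patterns for illustration only; the SDK's patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Keep the first and last two characters and mask the rest."""
    if len(text) <= 4:
        return "*" * len(text)
    return text[:2] + "*" * (len(text) - 4) + text[-2:]


def walk(value: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (field_path, string) pairs from nested dicts, lists, and tuples."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from walk(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from walk(item, f"{path}[{index}]")


def scan(kwargs: dict) -> List[dict]:
    """Collect email and Luhn-valid card matches with redacted previews."""
    matches = []
    for path, text in walk(kwargs):
        for m in EMAIL_RE.finditer(text):
            matches.append(
                {"field": path, "category": "email", "preview": redact(m.group())}
            )
        for m in CARD_RE.finditer(text):
            digits = re.sub(r"\D", "", m.group())
            if luhn_ok(digits):  # skip digit runs that fail the checksum
                matches.append(
                    {"field": path, "category": "credit_card", "preview": redact(digits)}
                )
    return matches


found = scan({"to": "alice@example.com", "meta": {"notes": ["card 4111 1111 1111 1111"]}})
print([(m["field"], m["category"]) for m in found])
# [('to', 'email'), ('meta.notes[0]', 'credit_card')]
```

The Luhn step is what keeps arbitrary 16-digit order numbers from being flagged as cards: a digit run that fails the checksum produces no match.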

Content moderation

Pluggable moderator that scans kwarg strings. The default KeywordModerator is fully offline; GeminiModerator uses gemini-2.5-flash to catch intent-level evasions (prompt injection, policy evasion, data exfiltration) that keyword matching misses.

from agent_sentinel import PolicyEngine
from agent_sentinel.guardrails import ModerationRule, GeminiModerator

rule = ModerationRule(
    strictness="balanced",
    block_categories=(
        "violence", "self_harm", "sexual_minors", "hate_severe",
        "prompt_injection", "data_exfiltration",
    ),
    moderator=GeminiModerator(model="gemini-2.5-flash"),
)
PolicyEngine._config.moderation_rules["post_message"] = rule
# OR enable a default ModerationRule for every action:
PolicyEngine._config.moderation_default_enabled = True

GeminiModerator reads GEMINI_API_KEY (or GOOGLE_API_KEY) from the environment, retries up to three times on transient 429/5xx, caps input at 2,000 chars, and degrades gracefully (returns “not flagged” with a logged warning) if the key or google-genai package is missing. Strictness modes:

strict — every flagged result blocks
balanced — only flagged categories that appear in block_categories block
permissive — explicit category matches only
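
The operational behaviour described for GeminiModerator (input cap, retries on transient errors, graceful degradation when the backend is unavailable) follows a common pattern that can be sketched without any network dependency. Everything here is hypothetical: moderate, TransientError, and the backend callable are illustration names, the backoff delays are invented, and re-raising once retries are exhausted is an assumption, since the text does not say what the SDK does in that case.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("moderation_sketch")

MAX_INPUT_CHARS = 2000  # input cap stated in the text
MAX_RETRIES = 3         # read here as one initial attempt plus up to three retries


class TransientError(Exception):
    """Hypothetical marker for a 429 or 5xx response from the backend."""


def moderate(
    text: str,
    backend: Optional[Callable[[str], bool]],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the text is flagged by the backend."""
    if backend is None:
        # Degrade gracefully: missing key or package means "not flagged".
        logger.warning("moderation backend unavailable; treating input as not flagged")
        return False
    clipped = text[:MAX_INPUT_CHARS]
    for attempt in range(MAX_RETRIES + 1):
        try:
            return backend(clipped)
        except TransientError:
            if attempt == MAX_RETRIES:
                raise  # assumption: surface the error once retries run out
            sleep(0.5 * 2 ** attempt)  # assumed exponential backoff


print(moderate("hello", None))  # False
```

Passing sleep as a parameter keeps the sketch testable without real delays.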

The platform also runs Gemini-powered intervention enrichment: even if the SDK uses KeywordModerator locally, blocks that reach the platform get a plain-English reason, agent_intent, and suggested_rewrite generated by Gemini. See LLM integrations.

Loop protection

Detects tight loops where the agent calls the same action with semantically identical arguments N times within a sliding window. Distinct from rate limits, which count calls regardless of arguments.

from agent_sentinel import PolicyEngine
from agent_sentinel.guardrails import LoopRule

rule = LoopRule(
    threshold=5,
    window_seconds=10.0,
    arg_exclude=("request_id", "timestamp"),
)
PolicyEngine._config.loop_rules["retry_payment"] = rule
# OR enable a default LoopRule for every action:
PolicyEngine._config.loop_default_enabled = True

The detector hashes each kwargs dict (excluding the keys in arg_exclude) with SHA-256 and keeps a sliding deque per (action, arg_hash). When the deque reaches threshold entries inside window_seconds, the SDK raises PolicyViolationError carrying a LoopDetection payload:

{
    "action": "retry_payment",
    "arg_hash": "9f3c…",
    "count": 5,
    "window_seconds": 10.0,
    "first_seen_at": 1733212980.123,
    "last_seen_at": 1733212983.876,
    "recent_args": [{...}, {...}, ...],
    "break_out_hint": "Action 'retry_payment' was called 5 times with identical arguments in 10.0s. Break the loop: vary the arguments, escalate to a human, or stop retrying.",
}

The break_out_hint is fed back into the LLM via self-repair feedback so the model can correct its plan rather than hammering a failed action.

Loop detection is in-process only. Distributed loops across worker processes are out of scope for this primitive — use rate limits backed by a shared store for that case.
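
The hashing and sliding-window mechanism described in this section can be sketched as follows. This is a stdlib-only illustration under stated assumptions, not the SDK's loop_detector: LoopSketch and arg_hash are hypothetical names, canonical JSON is an assumed serialisation for the hash input, and the detector trips when the count reaches the threshold, which matches the count of 5 in the example payload.

```python
import hashlib
import json
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple


def arg_hash(kwargs: dict, exclude: Tuple[str, ...] = ()) -> str:
    """SHA-256 over a canonical JSON dump of kwargs minus the excluded keys."""
    kept = {k: v for k, v in kwargs.items() if k not in exclude}
    canonical = json.dumps(kept, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class LoopSketch:
    """Hypothetical in-process detector: one timestamp deque per (action, hash)."""

    def __init__(
        self,
        threshold: int,
        window_seconds: float,
        exclude: Tuple[str, ...] = (),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window = window_seconds
        self.exclude = exclude
        self.clock = clock
        self.calls: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def record(self, action: str, kwargs: dict) -> Optional[str]:
        """Record one call; return a break-out hint if the threshold is reached."""
        now = self.clock()
        stamps = self.calls[(action, arg_hash(kwargs, self.exclude))]
        stamps.append(now)
        # Drop timestamps that have slid out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) >= self.threshold:
            return (
                f"Action '{action}' was called {len(stamps)} times "
                f"with identical arguments in {self.window}s."
            )
        return None


detector = LoopSketch(threshold=3, window_seconds=15.0, exclude=("request_id",))
for request_id in ("a", "b", "c"):
    hint = detector.record("retry_payment", {"amount": 500, "request_id": request_id})
print(hint is not None)  # True
```

Because request_id is excluded before hashing, the three calls collapse to the same hash even though that field differs on every call. A per-call nonce left out of arg_exclude would otherwise hide the loop.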

Idempotency

Caches the result of an action keyed by (run_id, idempotency_key). A second call with the same key inside the TTL transparently returns the cached value and is recorded as an IDEMPOTENT_REPLAY intervention — the function body never runs twice.

from agent_sentinel import guarded_action

@guarded_action(
    name="charge_card",
    idempotency_key=lambda *_, **kw: kw["payment_id"],
    idempotency_ttl_seconds=3600,
)
def charge_card(*, payment_id: str, amount_cents: int) -> dict:
    # `billing` is a placeholder for your own payment client.
    return billing.charge(payment_id, amount_cents)

idempotency_key accepts either a static string or a callable invoked with (*args, **kwargs). If it returns None or an empty string, the call falls through and the action executes normally.
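
The key-resolution rule in the previous paragraph is small enough to state as code. resolve_key is a hypothetical helper written for illustration, not an SDK function.

```python
from typing import Callable, Optional, Union

KeySpec = Union[str, Callable[..., Optional[str]]]


def resolve_key(spec: KeySpec, args: tuple, kwargs: dict) -> Optional[str]:
    """Return the idempotency key for this call, or None to skip the cache."""
    key = spec(*args, **kwargs) if callable(spec) else spec
    # None and "" both mean "no key": the action should run normally.
    return key or None


by_payment = lambda *_, **kw: kw.get("payment_id")
print(resolve_key(by_payment, (), {"payment_id": "pay_123"}))  # pay_123
print(resolve_key(by_payment, (), {}))                         # None
```

Note that the lambda in the decorator example uses kw["payment_id"], which raises KeyError when the argument is missing; using kw.get(...) as above is one way to fall through instead.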

Idempotency is scoped per run by default. If the SDK isn’t inside an ExecutionContext, the cache scope falls back to a sentinel __no_run__ bucket. Pair with ExecutionContext to keep cached results from leaking across logical runs.

The cache is in-process and thread-safe. For distributed deduplication you currently still need an upstream store (Redis, the platform’s commit-repeat detector); a shared backing store is on the roadmap.
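
A minimal sketch of a run-scoped, TTL-bound, thread-safe replay cache shows how the pieces above fit together. IdempotencySketch is a hypothetical class, not the SDK's implementation. Holding a single lock while the wrapped function runs is a simplifying assumption that guarantees the body never runs twice for a key but serialises all guarded calls.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple

NO_RUN = "__no_run__"  # fallback scope named in the text


class IdempotencySketch:
    """Hypothetical in-process cache keyed by (run_id, idempotency_key)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._clock = clock
        # (run_id, key) -> (expires_at, cached_result)
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def run(
        self,
        run_id: Optional[str],
        key: str,
        ttl_seconds: float,
        fn: Callable[[], Any],
    ) -> Tuple[Any, bool]:
        """Return (result, replayed). replayed is True on a cache hit."""
        scope = (run_id or NO_RUN, key)
        with self._lock:
            hit = self._entries.get(scope)
            if hit is not None and hit[0] > self._clock():
                return hit[1], True  # transparent replay, fn is not called
            result = fn()
            self._entries[scope] = (self._clock() + ttl_seconds, result)
            return result, False


cache = IdempotencySketch()
charge = lambda: {"status": "charged"}
print(cache.run("run-1", "pay_123", 3600, charge))  # ({'status': 'charged'}, False)
print(cache.run("run-1", "pay_123", 3600, charge))  # ({'status': 'charged'}, True)
```

Calling the same key under a different run_id misses the cache, which is the per-run scoping described above. Calls made with no run_id all share the __no_run__ bucket, so they can replay each other's results.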

Putting it together

from agent_sentinel import PolicyConfig, PolicyEngine, guarded_action
from agent_sentinel.guardrails import (
    PIIRule,
    ModerationRule,
    GeminiModerator,
    LoopRule,
)

# Configure budgets / deny lists with the convenience wrapper…
PolicyEngine.configure(
    denied_actions=["delete_production_database"],
    run_budget=2.50,
)
# …then layer guardrails on top by mutating the PolicyConfig directly.
PolicyEngine._config.pii_default_enabled = True
PolicyEngine._config.moderation_rules["post_message"] = ModerationRule(
    moderator=GeminiModerator(),
)
PolicyEngine._config.loop_rules["retry_payment"] = LoopRule(
    threshold=3, window_seconds=15.0,
)

@guarded_action(
    name="send_user_email",
    idempotency_key=lambda *_, **kw: kw["message_id"],
    idempotency_ttl_seconds=900,
)
def send_user_email(*, message_id: str, body: str) -> None:
    # `mailer` is a placeholder for your own mail client.
    mailer.send(body)

Every guardrail is independently toggleable: omit the rule field to disable it. The platform UI surfaces blocks under Console → Interventions with dedicated detail panels per guardrail type.
