Overview
Agent Sentinel provides transparent instrumentation for major LLM providers. Simply call instrument_<provider>() once, and all LLM API calls are automatically tracked with:
- Token usage (input and output tokens)
- Costs (calculated from latest pricing data)
- Duration and latency
- Model information
- Request/response metadata
No other code changes are required: instrument once and keep using your clients as usual.
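Conceptually, this kind of transparent instrumentation usually works by wrapping an SDK method so that each call is timed and its usage recorded. The sketch below is a minimal, self-contained illustration using a fake client. FakeCompletions, instrument() and LEDGER are hypothetical names, not Agent Sentinel's internals.

```python
import functools
import time

# Hypothetical stand-ins: a fake SDK client and an in-memory ledger.
# The real instrument_<provider>() internals are not documented here.
LEDGER = []

class FakeCompletions:
    def create(self, model, messages):
        # Pretend the provider returned usage counts
        return {"model": model, "usage": {"input_tokens": 10, "output_tokens": 5}}

def instrument(cls, method_name, action_name):
    """Wrap cls.method_name so every call is timed and logged to LEDGER."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter_ns()
        response = original(self, *args, **kwargs)
        LEDGER.append({
            "action": action_name,
            "model": response["model"],
            "input_tokens": response["usage"]["input_tokens"],
            "output_tokens": response["usage"]["output_tokens"],
            "duration_ns": time.perf_counter_ns() - start,
        })
        return response

    setattr(cls, method_name, wrapper)

instrument(FakeCompletions, "create", "fake_chat_completion")
FakeCompletions().create(model="gpt-4o", messages=[{"role": "user", "content": "Hi"}])
print(LEDGER[0]["action"], LEDGER[0]["input_tokens"], LEDGER[0]["output_tokens"])
```

Because the wrapper calls the original method unchanged and only records metadata around it, application code keeps calling the client exactly as before.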
Supported providers
- OpenAI (GPT-3.5, GPT-4, GPT-4o, o1)
- Anthropic (Claude 3 Opus, Sonnet, Haiku, Claude 3.5)
- xAI/Grok (Grok models)
- Google Gemini (Gemini models)
OpenAI instrumentation
```python
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

# Instrument OpenAI (one-time setup)
instrument_openai()

# Use OpenAI normally; calls are tracked automatically
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Automatically logged with:
# - action_name: "openai_chat_completion"
# - cost: ~$0.000125 (10 × $5.00/1M input + 5 × $15.00/1M output)
# - input_tokens: 10
# - output_tokens: 5
# - model: "gpt-4o"
# - duration: ~500ms
```
Async OpenAI
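```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="sk-...")  # assumes instrument_openai() was called first

async def main():
    # Automatically tracked, same as the sync client
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```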
Anthropic instrumentation
```python
from agent_sentinel.integrations import instrument_anthropic
from anthropic import Anthropic

# Instrument Anthropic (one-time setup)
instrument_anthropic()

# Use Anthropic normally; calls are tracked automatically
client = Anthropic(api_key="sk-ant-...")
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Automatically logged with:
# - action_name: "anthropic_messages_create"
# - cost: ~$0.000105 (10 × $3.00/1M input + 5 × $15.00/1M output)
# - input_tokens: 10
# - output_tokens: 5
# - model: "claude-3-5-sonnet-20241022"
```
Async Anthropic
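```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic(api_key="sk-ant-...")  # assumes instrument_anthropic() was called first

async def main():
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.content[0].text)

asyncio.run(main())
```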
Grok instrumentation
```python
from agent_sentinel.integrations import instrument_grok
from openai import OpenAI  # Grok uses the OpenAI SDK

# Instrument Grok (one-time setup)
instrument_grok()

# Use Grok via the OpenAI SDK
client = OpenAI(
    api_key="xai-...",
    base_url="https://api.x.ai/v1",
)
response = client.chat.completions.create(
    model="grok-beta",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
```
Gemini instrumentation
```python
from agent_sentinel.integrations import instrument_gemini
import google.generativeai as genai

# Instrument Gemini (one-time setup)
instrument_gemini()

# Use Gemini normally
genai.configure(api_key="AIza...")
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("What is 2+2?")
```
Gemini-powered features
Beyond raw call instrumentation, Agent Sentinel uses Gemini (gemini-2.5-flash by default) to power the features below. Each reads GEMINI_API_KEY (or GOOGLE_API_KEY) from its environment and degrades gracefully when the key or the google-genai package is missing.
Semantic moderation
GeminiModerator (in agent_sentinel.guardrails.moderation) classifies action arguments against intent-level categories that keyword matching can’t catch: prompt_injection, data_exfiltration, policy_evasion, plus the explicit violence / self_harm / hate_severe / sexual_minors set.
```python
from agent_sentinel import PolicyEngine
from agent_sentinel.guardrails import ModerationRule, GeminiModerator

PolicyEngine.configure(
    moderation_rule=ModerationRule(moderator=GeminiModerator()),
)
```
See Guardrails → Content moderation for full configuration.
Intervention enrichment
After every intervention is written to the platform DB, a FastAPI background task calls Gemini to replace terse rule-text fields with human-readable explanations:
| Field | Enriched content |
|---|---|
| reason | One or two sentences explaining the policy concern in plain English |
| agent_intent | One sentence describing what the agent was likely trying to accomplish |
| risk_level | Reassessed severity (minimal / low / medium / high / critical) |
| remediation_payload.suggested_rewrite | Short description of how the agent could safely retry, or null |
Enrichment runs out-of-band: the SDK’s POST /api/v1/ingest/ call returns immediately, and the enriched fields stream into the console asynchronously. See the Console → Interventions guardrail panels for how the enriched fields are surfaced.
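The out-of-band pattern can be sketched with plain asyncio: the ingest handler schedules enrichment and returns before it runs. This is an illustration under assumptions, not the platform's code. The platform uses a FastAPI background task whereas this sketch uses asyncio.create_task, and Intervention, fake_enrich() and ingest() are hypothetical names whose example text is invented.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

RISK_LEVELS = ("minimal", "low", "medium", "high", "critical")

@dataclass
class Intervention:
    # Hypothetical record; field names mirror the enrichment table
    # (suggested_rewrite is flattened out of remediation_payload for brevity)
    reason: str
    agent_intent: Optional[str] = None
    risk_level: Optional[str] = None
    suggested_rewrite: Optional[str] = None
    enriched: bool = False

async def fake_enrich(rec: Intervention) -> None:
    """Stand-in for the Gemini call; the example text is invented."""
    await asyncio.sleep(0.01)  # simulate model latency
    rec.reason = "The agent tried to email customer records to an outside address."
    rec.agent_intent = "Share a report with a third party."
    rec.risk_level = "high"
    rec.suggested_rewrite = "Send an aggregated summary without personal data."
    rec.enriched = True

async def ingest(rec: Intervention, pending: set) -> dict:
    """Persisting is omitted; schedule enrichment and return without waiting."""
    task = asyncio.create_task(fake_enrich(rec))
    pending.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(pending.discard)
    return {"status": "accepted", "enriched": rec.enriched}

async def main():
    pending = set()
    rec = Intervention(reason="rule: block_external_email")
    resp = await ingest(rec, pending)  # returns before enrichment has run
    await asyncio.gather(*pending)     # a real server would not wait here
    return resp, rec

resp, rec = asyncio.run(main())
print(resp)
print(rec.risk_level, rec.suggested_rewrite)
```

The response reports `enriched: False` because the handler never waits for the model; the record is filled in afterwards, which mirrors how the console receives enriched fields later.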
Self-repair feedback
When a policy block is returned to the LLM through a SentinelToolNode, the tool message includes the enriched suggested_rewrite along with the original retry_guidance. In practice, many frontier models read the rewrite and self-correct without any extra prompt engineering.
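To illustrate the idea, the sketch below shows one way a blocked tool call's result could carry both hints back to the model. The JSON shape and build_block_message() are assumptions for illustration only; SentinelToolNode's actual message format is not documented here.

```python
import json
from typing import Optional

def build_block_message(retry_guidance: str, suggested_rewrite: Optional[str]) -> str:
    """Hypothetical: compose the text a blocked tool call returns to the LLM."""
    payload = {"status": "blocked", "retry_guidance": retry_guidance}
    if suggested_rewrite:  # enrichment may return null, so only add it when present
        payload["suggested_rewrite"] = suggested_rewrite
    return json.dumps(payload)

msg = build_block_message(
    "Do not send files to external domains.",
    "Upload the file to the internal share and send the link instead.",
)
print(msg)
```

Keeping retry_guidance unconditional means the model still gets a hint when enrichment has not finished or produced no rewrite.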
Adversarial scenario generation
The platform exposes POST /api/v1/evals/generate-scenarios, a Gemini-driven generator that produces adversarial test scenarios across five categories (prompt_injection, policy_evasion, social_engineering, cost_abuse, data_exfiltration). The console’s scenario-generator dialog uses it to expand benchmark coverage. See Console → Evals.
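If you want to call the endpoint outside the console, a request could be assembled as below. This is a sketch under loud assumptions: the JSON body fields (categories, count), the Bearer auth header and the base URL are hypothetical, since the request schema is not documented here. Only the path and the five category names come from the text above.

```python
import json
import urllib.request

CATEGORIES = {"prompt_injection", "policy_evasion", "social_engineering",
              "cost_abuse", "data_exfiltration"}

def build_generate_request(base_url, categories, count, api_key):
    """Hypothetical request builder; body fields and auth header are assumptions."""
    unknown = set(categories) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    body = json.dumps({"categories": list(categories), "count": count}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/evals/generate-scenarios",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )

req = build_generate_request("https://sentinel.example.com", ["prompt_injection"], 5, "YOUR_KEY")
# urllib.request.urlopen(req) would send it; not executed here
print(req.get_method(), req.full_url)
```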
Prose policy compilation
Compile English-language policy descriptions directly to the policy IR via POST /api/v1/policies/compile. See SDK → Prose policies.
Getting token costs
Retrieve cumulative costs across all instrumented LLM calls:
```python
from agent_sentinel.integrations import get_token_costs

# Get total costs by provider
costs = get_token_costs()
print(f"OpenAI: ${costs['openai']:.4f}")
print(f"Anthropic: ${costs['anthropic']:.4f}")
print(f"Grok: ${costs['grok']:.4f}")
print(f"Gemini: ${costs['gemini']:.4f}")
print(f"Total: ${costs['total']:.4f}")
```
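To see which provider dominates spend, you can rank the entries. The sketch below assumes get_token_costs() returns a flat dict of provider name to USD plus a "total" key, as the example above suggests; cost_breakdown() is a hypothetical helper and the sample numbers are made up.

```python
def cost_breakdown(costs):
    """Return (provider, cost, share) tuples sorted by cost, highest first."""
    providers = {k: v for k, v in costs.items() if k != "total"}
    total = sum(providers.values())
    rows = [(name, cost, cost / total if total else 0.0)
            for name, cost in providers.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Sample numbers for illustration; in practice pass get_token_costs()
sample = {"openai": 1.20, "anthropic": 0.60, "grok": 0.0, "gemini": 0.20, "total": 2.00}
rows = cost_breakdown(sample)
for name, cost, share in rows:
    print(f"{name:<10} ${cost:.4f} ({share:.0%})")
```

Shares are computed from the per-provider values rather than the "total" key, so the helper still works if that key is absent.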
Pricing data
Agent Sentinel ships with the following default pricing (as of December 2025):
OpenAI pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-4-32k | $60.00 | $120.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1-preview | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| text-embedding-3-small | $0.02 | - |
| text-embedding-3-large | $0.13 | - |
Anthropic pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 |
| claude-3-5-sonnet-20240620 | $3.00 | $15.00 |
| claude-3-opus-20240229 | $15.00 | $75.00 |
| claude-3-sonnet-20240229 | $3.00 | $15.00 |
| claude-3-haiku-20240307 | $0.25 | $1.25 |
| claude-2.1 | $8.00 | $24.00 |
| claude-instant-1.2 | $0.80 | $2.40 |
Grok pricing (xAI)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| grok-1 | $5.00 | $15.00 |
| grok-beta | $5.00 | $15.00 |
Gemini pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-1.5-pro | $3.50 | $10.50 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-1.5-flash-8b | $0.0375 | $0.15 |
| gemini-1.0-pro | $0.50 | $1.50 |
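The cost of a single call follows directly from these per-1M-token rates: cost = input_tokens × input_price / 1,000,000 + output_tokens × output_price / 1,000,000. For example, 10 input and 5 output tokens on gpt-4o cost 10 × $5.00/1M + 5 × $15.00/1M = $0.000125. The sketch below reproduces that arithmetic; PRICES and call_cost() are hypothetical helpers, not the agent_sentinel.integrations.pricing API, and Decimal is used to avoid float rounding.

```python
from decimal import Decimal

# Per-1M-token prices copied from the tables above (USD): (input, output)
PRICES = {
    "gpt-4o": (Decimal("5.00"), Decimal("15.00")),
    "claude-3-5-sonnet-20241022": (Decimal("3.00"), Decimal("15.00")),
    "gemini-1.5-flash": (Decimal("0.075"), Decimal("0.30")),
}

def call_cost(model, input_tokens, output_tokens):
    """cost = in_tokens * in_price / 1M + out_tokens * out_price / 1M"""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / Decimal(1_000_000)

print(call_cost("gpt-4o", 10, 5))
print(call_cost("claude-3-5-sonnet-20241022", 10, 5))
```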
Advanced: Custom pricing
Override default pricing for custom models:
```python
from agent_sentinel.integrations.pricing import ModelPricing, PRICING_DATA

# Add custom model pricing
PRICING_DATA["openai"]["my-custom-model"] = ModelPricing(
    input_price_per_1m=5.0,
    output_price_per_1m=20.0,
)
```
Combining with @guarded_action
LLM instrumentation works alongside manual instrumentation:
```python
from agent_sentinel import guarded_action
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

instrument_openai()
client = OpenAI()

# Manual action wrapping via the decorator
@guarded_action(name="agent_reasoning", cost_usd=0.0)
def run_agent(task):
    # This LLM call is tracked automatically, nested WITHIN "agent_reasoning"
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}],
    )
    return result.choices[0].message.content

run_agent("What is 2+2?")

# Ledger will show:
# 1. "agent_reasoning" action (manual)
# 2. "openai_chat_completion" action (automatic, nested)
```
Policy enforcement on LLM calls
LLM calls respect policy budgets:
```python
from agent_sentinel import PolicyEngine
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

# Set a $1.00 session budget
PolicyEngine.configure(session_budget=1.0)

instrument_openai()
client = OpenAI()

# Calls succeed until cumulative spend reaches the budget
for i in range(20):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Task {i}"}],
    )
    # Illustrative: at ~$0.10 per call, the $1.00 budget is used up after ~10 calls,
    # after which BudgetExceededError is raised instead of completing further calls
```
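The budget behaviour amounts to a running total compared against a limit. The sketch below is a hypothetical model of that check: SessionBudget is not the PolicyEngine API, BudgetExceededError is defined locally as a stand-in, and this version rejects a charge before it would push spend past the limit (a real implementation may instead check after each call using actual usage).

```python
from decimal import Decimal

class BudgetExceededError(RuntimeError):
    """Local stand-in for the SDK's exception."""

class SessionBudget:
    """Hypothetical cumulative budget; rejects a charge that would exceed the limit."""

    def __init__(self, limit_usd):
        self.limit = Decimal(str(limit_usd))
        self.spent = Decimal("0")

    def charge(self, cost_usd):
        cost = Decimal(str(cost_usd))
        if self.spent + cost > self.limit:
            raise BudgetExceededError(
                f"spent {self.spent} + {cost} would exceed {self.limit}")
        self.spent += cost

budget = SessionBudget(1.0)
calls = 0
try:
    for i in range(20):
        budget.charge(0.10)  # assume ~$0.10 per call
        calls += 1
except BudgetExceededError as exc:
    print(f"blocked after {calls} calls: {exc}")
```

Decimal keeps ten charges of $0.10 summing to exactly $1.00; with floats the total drifts slightly and the cut-off can land one call off.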
Ledger output example
With LLM instrumentation enabled, your ledger includes detailed LLM call records:
```json
{
  "uuid": "a1b2c3d4...",
  "timestamp": "2024-12-28T10:30:00.123456Z",
  "action": "openai_chat_completion",
  "cost_usd": 0.000125,
  "duration_ns": 523000000,
  "outcome": "success",
  "tags": ["llm", "openai"],
  "inputs": {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 1024
  },
  "outputs": {
    "content": "2 + 2 equals 4.",
    "input_tokens": 10,
    "output_tokens": 5,
    "total_tokens": 15,
    "finish_reason": "stop"
  }
}
```
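Records in this shape are easy to aggregate. The sketch below assumes, hypothetically, that the ledger is stored as JSON Lines (one record per line); the storage format is not specified here. The first sample record is trimmed from the example above and the second is made up for illustration.

```python
import io
import json
from collections import defaultdict

# Hypothetical JSON Lines ledger; the second record is invented sample data
ledger_file = io.StringIO(
    '{"action": "openai_chat_completion", "cost_usd": 0.000125, "duration_ns": 523000000, "tags": ["llm", "openai"]}\n'
    '{"action": "agent_reasoning", "cost_usd": 0.0, "duration_ns": 610000000, "tags": []}\n'
)

cost_by_action = defaultdict(float)
llm_ms = []
for line in ledger_file:
    record = json.loads(line)
    cost_by_action[record["action"]] += record["cost_usd"]
    if "llm" in record.get("tags", []):
        llm_ms.append(record["duration_ns"] / 1_000_000)  # ns -> ms

print(dict(cost_by_action))
print(llm_ms)
```

Filtering on the "llm" tag separates automatic LLM records from manual actions such as those created with @guarded_action.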
Best practices
Instrument at startup: Call instrument_<provider>() once at application startup, before creating any LLM clients.
Use with budgets: Combine LLM instrumentation with session/run budgets to prevent runaway costs.
Pricing data may lag: Pricing is updated regularly but may not reflect same-day price changes. Override with custom pricing if needed.
Track costs by provider: Use get_token_costs() to see which providers are driving your costs.
Troubleshooting
“LLM calls not tracked”
Make sure to call instrument_<provider>() before creating the client:
```python
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

# ✅ Correct order
instrument_openai()
client = OpenAI()

# ❌ Wrong order
client = OpenAI()
instrument_openai()  # Too late!
```
“Incorrect costs”
Verify the model name matches our pricing data. Use normalize_model_name() to check:
```python
from agent_sentinel.integrations.pricing import normalize_model_name

print(normalize_model_name("gpt-4o-2024-05-13"))  # "gpt-4o"
```
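To show why a dated name can map to a base pricing entry, the sketch below implements one plausible rule. normalize() is a hypothetical stand-in; the library's normalize_model_name() may handle more or different cases. Note that Anthropic keys such as claude-3-5-sonnet-20241022 keep their date in the pricing tables above, so this pattern only strips the hyphenated "-YYYY-MM-DD" suffix used by OpenAI snapshots.

```python
import re

# Hypothetical rule: strip an OpenAI-style "-YYYY-MM-DD" snapshot suffix only.
# Anthropic-style "-YYYYMMDD" suffixes are left alone because the pricing
# tables key those models by their full dated name.
_SNAPSHOT = re.compile(r"-\d{4}-\d{2}-\d{2}$")

def normalize(model):
    return _SNAPSHOT.sub("", model.strip().lower())

for name in ["gpt-4o-2024-05-13", "GPT-4o-mini", "claude-3-5-sonnet-20241022"]:
    print(name, "->", normalize(name))
```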
See also