Overview

Agent Sentinel provides transparent instrumentation for major LLM providers. Simply call instrument_<provider>() once, and all LLM API calls are automatically tracked with:
  • Token usage (input and output tokens)
  • Costs (calculated from latest pricing data)
  • Duration and latency
  • Model information
  • Request/response metadata
No code changes required - just instrument and go.
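Transparent instrumentation of this kind is typically implemented by patching the provider SDK's request method once, so every subsequent call is timed and recorded. The sketch below is illustrative only — the `Client` class, `records` list, and `instrument` helper are hypothetical stand-ins, not Agent Sentinel's actual internals:

```python
import functools
import time

# Hypothetical stand-in for a provider SDK client.
class Client:
    def create(self, model, messages):
        return {"model": model, "usage": {"input_tokens": 10, "output_tokens": 5}}

records = []

def instrument(cls):
    """Patch cls.create once so every later call is timed and recorded."""
    original = cls.create

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        response = original(self, *args, **kwargs)
        records.append({
            "action": "chat_completion",
            "model": kwargs.get("model"),
            "usage": response["usage"],
            "duration_s": time.perf_counter() - start,
        })
        return response

    cls.create = wrapper

instrument(Client)  # one-time setup
Client().create(model="gpt-4o", messages=[])  # automatically recorded
```

Because the patch replaces the method on the class, every client instance created afterwards is covered — which is why the real `instrument_<provider>()` calls should run before any client is constructed.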

Supported providers

  • OpenAI (GPT-3.5, GPT-4, GPT-4o, o1)
  • Anthropic (Claude 3 Opus, Sonnet, Haiku, Claude 3.5)
  • xAI/Grok (Grok models)
  • Google Gemini (Gemini models)

OpenAI instrumentation

from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

# Instrument OpenAI (one-time setup)
instrument_openai()

# Use OpenAI normally - automatically tracked
client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Automatically logged with:
# - action_name: "openai_chat_completion"
# - cost: $0.000125 (calculated from token usage)
# - input_tokens: 10
# - output_tokens: 5
# - model: "gpt-4o"
# - duration: ~500ms

Async OpenAI

import asyncio

from openai import AsyncOpenAI

# instrument_openai() (called once at startup) also patches AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key="sk-...")

    # Automatically tracked
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    return response

asyncio.run(main())

Anthropic instrumentation

from agent_sentinel.integrations import instrument_anthropic
from anthropic import Anthropic

# Instrument Anthropic (one-time setup)
instrument_anthropic()

# Use Anthropic normally - automatically tracked
client = Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Automatically logged with:
# - action_name: "anthropic_messages_create"
# - cost: $0.000105 (calculated from token usage)
# - input_tokens: 10
# - output_tokens: 5
# - model: "claude-3-5-sonnet-20241022"

Async Anthropic

import asyncio

from anthropic import AsyncAnthropic

# instrument_anthropic() (called once at startup) also patches AsyncAnthropic

async def main():
    client = AsyncAnthropic(api_key="sk-ant-...")

    # Automatically tracked
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}]
    )
    return response

asyncio.run(main())

Grok instrumentation

from agent_sentinel.integrations import instrument_grok
from openai import OpenAI  # Grok uses OpenAI SDK

# Instrument Grok (one-time setup)
instrument_grok()

# Use Grok via OpenAI SDK
client = OpenAI(
    api_key="xai-...",
    base_url="https://api.x.ai/v1"
)

response = client.chat.completions.create(
    model="grok-beta",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

Gemini instrumentation

from agent_sentinel.integrations import instrument_gemini
import google.generativeai as genai

# Instrument Gemini (one-time setup)
instrument_gemini()

# Use Gemini normally
genai.configure(api_key="AIza...")
model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("What is 2+2?")

Getting token costs

Retrieve cumulative costs across all instrumented LLM calls:
from agent_sentinel.integrations import get_token_costs

# Get total costs by provider
costs = get_token_costs()

print(f"OpenAI: ${costs['openai']:.4f}")
print(f"Anthropic: ${costs['anthropic']:.4f}")
print(f"Grok: ${costs['grok']:.4f}")
print(f"Gemini: ${costs['gemini']:.4f}")
print(f"Total: ${costs['total']:.4f}")

Pricing data

Agent Sentinel ships with built-in pricing data (current as of December 2025):

OpenAI pricing

Model                     Input (per 1M tokens)    Output (per 1M tokens)
gpt-4o                    $5.00                    $15.00
gpt-4o-mini               $0.15                    $0.60
gpt-4-turbo               $10.00                   $30.00
gpt-4                     $30.00                   $60.00
gpt-4-32k                 $60.00                   $120.00
gpt-3.5-turbo             $0.50                    $1.50
o1-preview                $15.00                   $60.00
o1-mini                   $3.00                    $12.00
text-embedding-3-small    $0.02                    -
text-embedding-3-large    $0.13                    -
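As a sanity check, per-call cost follows directly from this table: token count divided by 1,000,000, times the listed rate. For the earlier gpt-4o example (10 input tokens, 5 output tokens):

```python
# gpt-4o rates from the table above, per 1M tokens
INPUT_RATE = 5.00
OUTPUT_RATE = 15.00

input_tokens, output_tokens = 10, 5

cost = (input_tokens / 1_000_000) * INPUT_RATE \
     + (output_tokens / 1_000_000) * OUTPUT_RATE
print(f"${cost:.6f}")  # $0.000125
```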

Anthropic pricing

Model                         Input (per 1M tokens)    Output (per 1M tokens)
claude-3-5-sonnet-20241022    $3.00                    $15.00
claude-3-5-sonnet-20240620    $3.00                    $15.00
claude-3-opus-20240229        $15.00                   $75.00
claude-3-sonnet-20240229      $3.00                    $15.00
claude-3-haiku-20240307       $0.25                    $1.25
claude-2.1                    $8.00                    $24.00
claude-instant-1.2            $0.80                    $2.40

Grok pricing (xAI)

Model        Input (per 1M tokens)    Output (per 1M tokens)
grok-1       $5.00                    $15.00
grok-beta    $5.00                    $15.00

Gemini pricing

Model                  Input (per 1M tokens)    Output (per 1M tokens)
gemini-1.5-pro         $3.50                    $10.50
gemini-1.5-flash       $0.075                   $0.30
gemini-1.5-flash-8b    $0.0375                  $0.15
gemini-1.0-pro         $0.50                    $1.50

Advanced: Custom pricing

Override default pricing for custom models:
from agent_sentinel.integrations.pricing import ModelPricing, PRICING_DATA

# Add custom model pricing
PRICING_DATA["openai"]["my-custom-model"] = ModelPricing(
    input_price_per_1m=5.0,
    output_price_per_1m=20.0
)

Combining with @guarded_action

LLM instrumentation works alongside manual instrumentation:
from agent_sentinel import guarded_action
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

instrument_openai()
client = OpenAI()

@guarded_action(name="agent_reasoning", cost_usd=0.0)
def run_agent(task):
    # Manual action wrapping
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}]
    )
    # LLM call automatically tracked WITHIN this action
    return result.choices[0].message.content

# Ledger will show:
# 1. "agent_reasoning" action (manual)
# 2. "openai_chat_completion" action (automatic, nested)

Policy enforcement on LLM calls

LLM calls respect policy budgets:
from agent_sentinel import PolicyEngine
from agent_sentinel.integrations import instrument_openai
from openai import OpenAI

# Set budget
PolicyEngine.configure(session_budget=1.0)

instrument_openai()
client = OpenAI()

# First few calls succeed
for i in range(10):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Task {i}"}]
    )
    # Each call costs ~$0.10 (illustrative)
    # Once cumulative spend would exceed the $1.00 budget,
    # BudgetExceededError is raised
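Budget enforcement of this sort reduces to simple bookkeeping. The sketch below is a simplified illustration — the `Budget` class and this `BudgetExceededError` are hypothetical, not Agent Sentinel's actual PolicyEngine: each tracked call adds its cost to a running total, and a charge that would push the total past the limit raises before being recorded:

```python
class BudgetExceededError(RuntimeError):
    pass

class Budget:
    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Reject the charge that would push spend past the limit
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceededError(
                f"charge of ${cost_usd:.2f} would exceed ${self.limit_usd:.2f} budget"
            )
        self.spent_usd += cost_usd

budget = Budget(limit_usd=1.0)
completed = 0
try:
    for _ in range(20):
        budget.charge(0.10)  # ~cost of one call in the example above
        completed += 1
except BudgetExceededError:
    pass

print(completed)  # 10 calls fit within a $1.00 budget
```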

Ledger output example

With LLM instrumentation enabled, your ledger includes detailed LLM call records:
{
  "uuid": "a1b2c3d4...",
  "timestamp": "2024-12-28T10:30:00.123456Z",
  "action": "openai_chat_completion",
  "cost_usd": 0.0155,
  "duration_ns": 523000000,
  "outcome": "success",
  "tags": ["llm", "openai"],
  "inputs": {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 1024
  },
  "outputs": {
    "content": "2 + 2 equals 4.",
    "input_tokens": 10,
    "output_tokens": 5,
    "total_tokens": 15,
    "finish_reason": "stop"
  }
}
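Records with this shape are straightforward to post-process. Assuming the ledger is stored as JSON Lines (one record per line — an assumption about the storage format, not confirmed by this page), total LLM spend and token usage can be tallied like this:

```python
import json

# Two sample records in the shape shown above (abridged)
ledger_lines = [
    '{"action": "openai_chat_completion", "cost_usd": 0.0155, "outputs": {"total_tokens": 15}}',
    '{"action": "anthropic_messages_create", "cost_usd": 0.008, "outputs": {"total_tokens": 42}}',
]

total_cost = 0.0
total_tokens = 0
for line in ledger_lines:
    record = json.loads(line)
    total_cost += record["cost_usd"]
    total_tokens += record["outputs"]["total_tokens"]

print(f"${total_cost:.4f} across {total_tokens} tokens")  # $0.0235 across 57 tokens
```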

Best practices

  • Instrument at startup: Call instrument_<provider>() once at application startup, before creating any LLM clients.
  • Use with budgets: Combine LLM instrumentation with session/run budgets to prevent runaway costs.
  • Pricing data may lag: Pricing is updated regularly but may not reflect same-day price changes. Override with custom pricing if needed.
  • Track costs by provider: Use get_token_costs() to see which providers are driving your costs.

Troubleshooting

“LLM calls not tracked”

Make sure to call instrument_<provider>() before creating the client:
# ✅ Correct order
instrument_openai()
client = OpenAI()

# ❌ Wrong order
client = OpenAI()
instrument_openai()  # Too late!

“Incorrect costs”

Verify the model name matches our pricing data. Use normalize_model_name() to check:
from agent_sentinel.integrations.pricing import normalize_model_name

print(normalize_model_name("gpt-4o-2024-05-13"))  # "gpt-4o"
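When a model name misses the pricing table, the usual culprit is a dated snapshot suffix on the model id. A minimal sketch of this kind of normalization — illustrative only, not necessarily how normalize_model_name is implemented:

```python
import re

def strip_date_suffix(model: str) -> str:
    """Drop a trailing -YYYY-MM-DD snapshot suffix, if present."""
    return re.sub(r"-\d{4}-\d{2}-\d{2}$", "", model)

print(strip_date_suffix("gpt-4o-2024-05-13"))  # gpt-4o
print(strip_date_suffix("gpt-4o"))             # gpt-4o (unchanged)
```

Note that Anthropic model ids carry undashed dates (e.g. claude-3-5-sonnet-20241022) and appear in the pricing table with the date included, so normalization rules differ per provider.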

See also