The platform includes endpoints that simulate replay and analyze runs for non-determinism.
In v0.1 the platform replay endpoint does not execute your code. For actual replay execution, use the SDK’s Replay Mode.

Replay a run (simulation)

POST /api/v1/replay/{run_id}
This inspects recorded inputs and flags common sources of non-determinism (timestamps, random values, UUIDs).

Determinism analysis

GET /api/v1/replay/analysis/{run_id}
Returns a determinism score (0-100) and actionable, prioritized recommendations.

What it analyzes

The determinism analyzer performs static analysis on your run’s actions to identify patterns that indicate non-deterministic behavior:

Timestamp dependencies
  • datetime.now(), time.time(), current_time
  • Actions that depend on current timestamps
  • Date/time-based logic that varies per execution
Random value generation
  • random.random(), random.randint(), random.choice()
  • Unseeded random number generators
  • Stochastic sampling
UUID generation
  • uuid.uuid4() and other UUID variants
  • Auto-generated IDs that change per run
  • Non-deterministic identifiers
Process/environment dependencies
  • os.getpid(), process IDs
  • socket.gethostname(), hostnames
  • Platform-specific values
API variability
  • Request IDs, trace IDs, session IDs
  • Dynamic response fields that change per call
  • Non-deterministic external service responses
Order-dependent operations
  • Dictionary iteration (Python < 3.7)
  • Set operations with undefined order
  • Concurrent operations without synchronization
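To illustrate the kind of static pattern matching described above, a minimal detector might scan action inputs for known non-deterministic keywords. This is a sketch only; the keyword lists and category names here are assumptions, not the platform's actual rules:

```python
import re

# Illustrative keyword patterns per category; the real analyzer's
# rules are internal and likely more sophisticated.
PATTERNS = {
    "timestamp": re.compile(r"datetime\.now|time\.time|current_time"),
    "random": re.compile(r"random\.(random|randint|choice)"),
    "uuid": re.compile(r"uuid\.uuid\d"),
    "process_id": re.compile(r"os\.getpid|gethostname"),
}

def detect_patterns(action_input: str) -> list[str]:
    """Return the categories whose patterns match the given input text."""
    return [name for name, rx in PATTERNS.items() if rx.search(action_input)]
```

For example, `detect_patterns("timestamp = datetime.now()")` flags the `timestamp` category.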

Response format

{
  "run_id": "a1b2c3d4-...",
  "total_actions": 42,
  "determinism_score": 85,
  "score_severity": "moderate",
  "issues_found": 6,
  "patterns_detected": {
    "timestamp": 2,
    "random": 1,
    "uuid": 2,
    "process_id": 0,
    "environmental": 0,
    "api_variability": 1,
    "order_dependent": 0
  },
  "issues": [
    {
      "action_index": 3,
      "action_name": "generate_report",
      "action_id": "abc123...",
      "issues": [
        {
          "type": "timestamp_dependency",
          "severity": "high",
          "description": "Action uses timestamp or current time in inputs",
          "recommendation": "Use fixed timestamps for testing or mock time functions"
        }
      ]
    }
  ],
  "warnings": [
    "Action 5 (call_external_api) resulted in error"
  ],
  "recommendations": [
    {
      "priority": "high",
      "category": "determinism",
      "title": "Remove timestamp dependencies",
      "description": "2 actions use current timestamps which will vary on each execution",
      "solution": "Use fixed timestamps for testing or inject time via parameters",
      "code_example": "# Instead of:\nreport_time = datetime.now()\n\n# Use:\nreport_time = datetime(2025, 1, 3, 12, 0, 0)"
    },
    {
      "priority": "medium",
      "category": "random",
      "title": "Seed random number generators",
      "description": "Random values detected without explicit seeding",
      "solution": "Set a seed for reproducible random values",
      "code_example": "import random\nrandom.seed(42)  # Fixed seed for deterministic testing"
    }
  ],
  "summary": "Run has moderate determinism issues. Found 6 issues across 42 actions."
}
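When consuming this response programmatically, a client will often want just the high-severity findings. A sketch, using the field names from the example above:

```python
def high_severity_issues(analysis: dict) -> list[dict]:
    """Flatten the per-action issue lists, keeping only high-severity entries."""
    found = []
    for action in analysis.get("issues", []):
        for issue in action.get("issues", []):
            if issue.get("severity") == "high":
                found.append({
                    "action": action.get("action_name"),
                    "type": issue.get("type"),
                    "recommendation": issue.get("recommendation"),
                })
    return found
```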

Determinism score

The determinism score (0-100) indicates how reliably the run could be replayed:
  • 90-100 🟢 - High reliability, minimal issues
  • 70-89 🟡 - Moderate risk, some non-deterministic patterns
  • 0-69 🔴 - High risk, significant non-determinism
Score calculation:
base_score = 100
penalty_per_issue = 100 / (total_actions + 1)
final_score = max(0, base_score - (issues_found × penalty_per_issue))
Actions with more issues get penalized more heavily.
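In code, the formula above comes out to something like the sketch below. Note the platform may round or weight issues differently, so exact scores can differ slightly from this plain formula (the example response earlier reports 85 where this yields roughly 86):

```python
def determinism_score(total_actions: int, issues_found: int) -> float:
    """Apply the documented scoring formula: a flat per-issue penalty
    scaled by run size, clamped to the 0-100 range."""
    penalty_per_issue = 100 / (total_actions + 1)
    return max(0.0, 100 - issues_found * penalty_per_issue)
```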

Issue severity levels

High severity - Definitely causes non-determinism:
  • Timestamp dependencies
  • Random value generation
  • Unseeded randomness
Medium severity - Likely causes issues:
  • UUID generation
  • Environment dependencies
  • Process-specific values
Low severity - May cause issues:
  • API variability (trace IDs, etc.)
  • Order-dependent operations

Recommendations format

Each recommendation includes:
  • Priority - high/medium/low
  • Category - determinism/performance/reliability
  • Title - Brief summary
  • Description - What the issue is
  • Solution - How to fix it
  • Code example - Concrete fix (optional)
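To act on recommendations in order, a client can sort by priority. A sketch assuming only the three priority strings listed above:

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def sort_recommendations(recommendations: list[dict]) -> list[dict]:
    """Order recommendations high -> medium -> low; unknown priorities sort last."""
    return sorted(recommendations,
                  key=lambda r: PRIORITY_ORDER.get(r.get("priority"), 99))
```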

Using in the console

The determinism analysis is available in the Console UI:
  1. Navigate to Runs page
  2. Click on a specific run
  3. Click Analyze Determinism button
  4. View the analysis with:
    • Determinism score and severity badge
    • Patterns detected summary
    • Detailed issues by action
    • Prioritized recommendations with code examples

Example issues and fixes

Issue: Timestamp dependency
# ❌ Non-deterministic
from datetime import datetime

def generate_report():
    timestamp = datetime.now()
    return f"Report generated at {timestamp}"

# ✅ Deterministic
def generate_report(report_time: datetime):
    return f"Report generated at {report_time}"
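A middle ground, matching the "inject time via parameters" advice from the recommendations above, is to inject the clock itself so production code can still default to real time. A sketch:

```python
from datetime import datetime
from typing import Callable

def generate_report(now: Callable[[], datetime] = datetime.now):
    # Production callers use the real clock; tests pass a fixed clock.
    return f"Report generated at {now()}"
```

In a test, `generate_report(lambda: datetime(2025, 1, 3, 12, 0, 0))` always produces the same output.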
Issue: Random generation
# ❌ Non-deterministic
import random

def select_sample(items):
    return random.choice(items)

# ✅ Deterministic
import random

random.seed(42)  # fixed seed for reproducible selection

def select_sample(items):
    return random.choice(items)
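Seeding the global generator works, but it affects every other caller of the random module. A dedicated random.Random instance keeps the seed local; this is a common alternative, not something the analyzer mandates:

```python
import random

def select_sample(items, seed: int = 42):
    """Use a private, seeded generator so other users of random are unaffected."""
    rng = random.Random(seed)
    return rng.choice(items)
```

With the same seed and items, the selection is identical on every run.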
Issue: UUID generation
# ❌ Non-deterministic
import uuid

def create_id():
    return str(uuid.uuid4())

# ✅ Deterministic
import uuid

def create_id(name: str):
    # uuid5 derives the same UUID from the same namespace + name on every run
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))

Limitations

What it CAN detect:
  • Static patterns in inputs/outputs
  • Known non-deterministic keywords
  • Common anti-patterns
What it CANNOT detect:
  • Non-deterministic behavior in external services
  • Race conditions in concurrent code
  • Hardware-dependent behavior
  • Implicit state dependencies
For actual replay validation, use the SDK’s Replay Mode.

Best practices

Run analysis regularly: Check determinism after significant changes to catch issues early.
Fix high-priority issues first: Start with timestamp and random dependencies - they have the biggest impact.
Use with SDK replay: Combine analysis (static) with SDK replay (dynamic) for comprehensive testing.
False positives are possible: The analyzer uses heuristics. Review recommendations and decide what makes sense for your use case.

API usage

# Get determinism analysis for a run
curl -X GET "https://platform.example.com/api/v1/replay/analysis/{run_id}" \
  -H "Authorization: Bearer YOUR_TOKEN"
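The same call can be made from Python's standard library. The sketch below only builds the request object; the base URL and token are placeholders, and actually sending it requires network access:

```python
import urllib.request

def build_analysis_request(base_url: str, run_id: str,
                           token: str) -> urllib.request.Request:
    """Construct the GET request for the determinism-analysis endpoint."""
    url = f"{base_url}/api/v1/replay/analysis/{run_id}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"})

# Sending it (requires network access and `import json`):
# with urllib.request.urlopen(build_analysis_request(base, rid, tok)) as resp:
#     analysis = json.load(resp)
```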

Integration with CI/CD

Check determinism in your pipeline:
#!/bin/bash
set -euo pipefail
# Determinism gate script; invoke it as a step in your CI workflow
# (for example, from .github/workflows/test.yml)

# Run agent tests
python -m pytest tests/

# Get latest run ID from tests
RUN_ID=$(cat latest_run_id.txt)

# Analyze determinism
SCORE=$(curl -s "https://platform.example.com/api/v1/replay/analysis/$RUN_ID" \
  -H "Authorization: Bearer $TOKEN" | jq -r '.determinism_score')

# Fail if score too low
if [ "$SCORE" -lt 80 ]; then
  echo "Determinism score too low: $SCORE"
  exit 1
fi

echo "Determinism check passed: $SCORE"
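If your pipeline is Python-based, the same gate can be expressed as a small helper. The threshold of 80 mirrors the shell example above; adjust it to your tolerance:

```python
def passes_determinism_gate(analysis: dict, threshold: int = 80) -> bool:
    """Return True when the run's determinism_score meets the threshold."""
    return analysis.get("determinism_score", 0) >= threshold

# Typical CI usage:
# if not passes_determinism_gate(analysis):
#     raise SystemExit("Determinism score too low")
```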