Overview
The Runs Page provides complete visibility into agent execution history. View run details, analyze action sequences, simulate replays at zero cost, and detect non-deterministic behavior.
Run registry
The main view lists all runs and can be narrowed with these filters:
Filters
By status:
- Running - Currently executing (live updates)
- Completed - Finished successfully
- Failed - Encountered errors
By cost:
- Min cost - Show only runs above a cost threshold
- Useful for finding expensive runs
By time window:
- 24 hours - Recent runs
- 7 days - Last week
- 30 days - Last month
- All time - Complete history
By agent:
- Filter by agent_id
- Type to search and select
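The same filtering can be expressed over run data outside the UI, for example on an export. This is a minimal sketch, not the platform's API: the `Run` fields and window keys are hypothetical, and it assumes that filters combine with AND and that the minimum-cost threshold is inclusive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class Run:
    # Hypothetical record shape; field names are assumptions.
    run_id: str
    agent_id: str
    status: str        # "running", "completed", or "failed"
    cost_usd: float
    started_at: datetime

# Time-window choices from the filter panel; None means "All time".
TIME_WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "all": None,
}

def filter_runs(runs, status=None, min_cost=None, window="all",
                agent_id=None, now: Optional[datetime] = None):
    """Keep runs that pass every filter that is set (assumed AND semantics)."""
    if now is None:
        now = datetime.now(timezone.utc)
    span = TIME_WINDOWS[window]
    kept = []
    for run in runs:
        if status is not None and run.status != status:
            continue
        if min_cost is not None and run.cost_usd < min_cost:
            continue  # assumed inclusive threshold
        if span is not None and run.started_at < now - span:
            continue
        if agent_id is not None and run.agent_id != agent_id:
            continue
        kept.append(run)
    return kept
```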
Run cards
Each run displays:
- Run ID - Unique identifier
- Agent - Agent name (link to agent page)
- Status Badge - Running (blue), Completed (green), Failed (red)
- Duration - Execution time
- Actions - Count of actions in run
- Cost - Total USD spent
- Success Rate - % of successful actions
- Started - Timestamp
- Ended - Timestamp (if completed)
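Most card fields are aggregates over the run's actions. A minimal sketch of how they could be derived, assuming a hypothetical `Action` record and assuming that only `success` outcomes count toward the success rate (how `replayed` actions are counted is not stated here):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Action:
    # Hypothetical action shape for illustration.
    name: str
    cost_usd: float
    outcome: str  # "success", "error", "blocked", or "replayed"

def run_card_stats(actions, started_at: datetime,
                   ended_at: Optional[datetime] = None) -> dict:
    total = len(actions)
    successes = sum(1 for a in actions if a.outcome == "success")
    return {
        "actions": total,
        "cost_usd": round(sum(a.cost_usd for a in actions), 6),
        # Percentage of successful actions; None for a run with no actions yet.
        "success_rate_pct": 100.0 * successes / total if total else None,
        # Duration is only final once the run has ended.
        "duration_s": (ended_at - started_at).total_seconds() if ended_at else None,
    }
```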
Visual indicators:
- 🔵 Running - Animated pulse on status badge
- 🟢 Completed - Static green badge
- 🔴 Failed - Red badge with error icon
Critical intercepts section
At the top of the page, this section lists runs that need attention:
- Runs with errors
- Runs exceeding budgets
- Runs with high intervention rates
- Sorted by recency
Click a run to view its details and debug it.
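The selection rules above can be sketched as a predicate plus a recency sort. This is illustrative only: `RunSummary`, its `budget_usd` and `interventions` fields, and the 20% intervention-rate threshold are hypothetical, since the actual criteria are not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class RunSummary:
    # Hypothetical fields for illustration.
    run_id: str
    status: str
    cost_usd: float
    budget_usd: Optional[float]
    interventions: int
    actions: int
    started_at: datetime

def needs_attention(run: RunSummary, max_intervention_rate: float = 0.2) -> bool:
    if run.status == "failed":
        return True  # run with errors
    if run.budget_usd is not None and run.cost_usd > run.budget_usd:
        return True  # run exceeding its budget
    if run.actions and run.interventions / run.actions > max_intervention_rate:
        return True  # high intervention rate (threshold is an assumption)
    return False

def critical_intercepts(runs, max_intervention_rate: float = 0.2):
    flagged = [r for r in runs if needs_attention(r, max_intervention_rate)]
    # Most recent first.
    return sorted(flagged, key=lambda r: r.started_at, reverse=True)
```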
Run details view
Click any run to open the details view with 3 tabs:
Telemetry tab
Shows complete action sequence:
Action list:
- Sequential order of execution
- Action name
- Inputs (expandable JSON)
- Outputs (expandable JSON)
- Cost (USD)
- Duration (ms)
- Outcome (success/error/blocked/replayed)
- Timestamp
Features:
- Search actions - Find by name or content
- Filter by outcome - Show only errors, successes, or blocked actions
- Expand all - View all inputs/outputs at once
- Copy JSON - Copy any input/output to clipboard
Operation inspector sidebar:
Click any action to open the inspector, which shows:
- Full JSON inputs with syntax highlighting
- Full JSON outputs with syntax highlighting
- Metadata (UUID, timestamp, tags)
- Error details (if failed)
- Compliance metadata (if applicable)
Example action:
```json
{
  "action": "call_llm",
  "inputs": {
    "prompt": "What is 2+2?",
    "model": "gpt-4o",
    "max_tokens": 100
  },
  "outputs": {
    "response": "2 + 2 equals 4.",
    "tokens": 15,
    "cost_usd": 0.0008
  },
  "duration_ns": 523000000,
  "outcome": "success"
}
```
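The record stores duration in nanoseconds, while the action list shows milliseconds. A small sketch for reading such a record, assuming exactly the field layout of the example above; `summarize` is a hypothetical helper, not part of the platform.

```python
import json

# The example action record, as it might appear in raw telemetry.
RECORD = """
{
  "action": "call_llm",
  "inputs": {"prompt": "What is 2+2?", "model": "gpt-4o", "max_tokens": 100},
  "outputs": {"response": "2 + 2 equals 4.", "tokens": 15, "cost_usd": 0.0008},
  "duration_ns": 523000000,
  "outcome": "success"
}
"""

def summarize(record: dict) -> dict:
    """Convert a raw record into the units shown in the action list."""
    return {
        "action": record["action"],
        "duration_ms": record["duration_ns"] / 1_000_000,  # ns -> ms
        "cost_usd": record["outputs"].get("cost_usd", 0.0),
        "outcome": record["outcome"],
    }

print(summarize(json.loads(RECORD)))
```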
Link to Activity Ledger:
Click “View in Activity Ledger” to see all actions in a filterable table.
Simulation tab
Zero-cost replay - Re-execute the run using recorded outputs without calling external APIs.
How it works:
- Click “Simulate Replay”
- Platform replays run action-by-action
- Uses cached outputs from original run
- Detects divergences (different inputs/outputs)
- Calculates potential cost savings
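The replay steps can be modeled as a session that answers each action from the recording instead of calling the external service. This is an illustrative sketch under stated assumptions, not the platform's implementation: actions are matched by their position in the sequence, and `ReplaySession`, `RecordedAction`, and the divergence labels are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RecordedAction:
    # One step of the original run, as stored in telemetry (hypothetical shape).
    action: str
    inputs: dict
    outputs: dict
    cost_usd: float

@dataclass
class ReplaySession:
    recorded: list[RecordedAction]
    position: int = 0
    divergences: list[tuple[int, str]] = field(default_factory=list)
    avoided_cost_usd: float = 0.0

    def call(self, action: str, inputs: dict) -> Optional[dict]:
        """Answer the next action from the recording instead of calling an API."""
        step = self.position
        self.position += 1
        if step >= len(self.recorded):
            # The replayed agent did more work than the original run.
            self.divergences.append((step, "extra action"))
            return None
        rec = self.recorded[step]
        if rec.action != action:
            self.divergences.append((step, "action name mismatch"))
        elif rec.inputs != inputs:
            self.divergences.append((step, "input mismatch"))
        # No external call is made, so the original cost of this step is avoided.
        self.avoided_cost_usd += rec.cost_usd
        return rec.outputs
```

Positional matching keeps the sketch short, but one early divergence shifts every later step; a real tool might align the two sequences more carefully.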
Replay results:
| Metric | Description |
|---|---|
| Original cost | What the run cost originally |
| Replay cost | Cost to replay (usually $0) |
| Savings | USD saved by using replay |
| Divergences | Count of mismatches |
| Match rate | % of actions that matched |
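The metrics in this table follow from simple arithmetic. A sketch assuming savings is original cost minus replay cost and match rate is matched actions divided by total actions; `replay_metrics` is a hypothetical helper.

```python
from typing import Optional

def replay_metrics(original_cost_usd: float, replay_cost_usd: float,
                   matches: list[bool]) -> dict:
    """Summarize a replay; matches[i] is True if action i matched the original."""
    total = len(matches)
    divergences = matches.count(False)
    # None when the run has no actions, to avoid dividing by zero.
    match_rate: Optional[float] = (
        100.0 * (total - divergences) / total if total else None
    )
    return {
        "original_cost_usd": original_cost_usd,
        "replay_cost_usd": replay_cost_usd,
        "savings_usd": original_cost_usd - replay_cost_usd,
        "divergences": divergences,
        "match_rate_pct": match_rate,
    }
```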
Divergence details:
When replay diverges from original:
- Action name mismatch - Different action was called
- Input mismatch - Same action, different inputs
- Output mismatch - Same inputs, different outputs (non-deterministic!)
Comparison table:
| Action | Original | Replay | Status |
|---|---|---|---|
| call_llm | input: “Hello” | input: “Hello” | ✅ Match |
| call_llm | output: “Hi there!” | output: “Hi there!” | ✅ Match |
| generate_id | output: “uuid-123” | output: “uuid-456” | ❌ Diverged |
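The three divergence types can be told apart by comparing fields in order, so that an output difference is only reported when the action and inputs agree. A minimal sketch; `classify_step` and the dictionary shape are hypothetical.

```python
def classify_step(original: dict, replay: dict) -> str:
    """Classify one step; checks run in order, so the first difference wins."""
    if original["action"] != replay["action"]:
        return "action name mismatch"
    if original["inputs"] != replay["inputs"]:
        return "input mismatch"
    if original["outputs"] != replay["outputs"]:
        # Same action and inputs but a different result: non-deterministic.
        return "output mismatch"
    return "match"

original = {"action": "generate_id", "inputs": {}, "outputs": {"id": "uuid-123"}}
replayed = {"action": "generate_id", "inputs": {}, "outputs": {"id": "uuid-456"}}
print(classify_step(original, replayed))  # output mismatch
```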
Use cases:
- Debugging - Replay failed runs to understand what happened
- Testing - Replay with different logic to test fixes
- Cost savings - Use replay for development/testing without API costs
- Compliance - Demonstrate reproducibility
Determinism tab
Automated analysis of non-deterministic behavior.
Determinism score:
- 0-100 scale (higher is better)
- Color-coded progress bar:
  - 🟢 90-100 - Highly deterministic
  - 🟡 70-89 - Moderately deterministic
  - 🟠 50-69 - Low determinism
  - 🔴 0-49 - Highly non-deterministic
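The color bands are a piecewise mapping over the score. A sketch assuming fractional scores fall into the band of their lower bound (so 89.5 counts as yellow); `determinism_band` is a hypothetical helper.

```python
def determinism_band(score: float) -> str:
    """Map a 0-100 determinism score to the color bands shown in the tab."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "green: highly deterministic"
    if score >= 70:
        return "yellow: moderately deterministic"
    if score >= 50:
        return "orange: low determinism"
    return "red: highly non-deterministic"
```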
What’s analyzed:
Platform scans for common non-deterministic patterns:
- Timestamps - Current time in outputs
- Random values - Random numbers, UUIDs
- External state - Database queries, API calls without idempotency
- Floating point - Precision differences
- Ordering - Unordered collections (sets, dicts in Python < 3.7)
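Two of these patterns, timestamps and random UUIDs, can be approximated with regular expressions over serialized outputs. This heuristic sketch covers only those two cases and is not the platform's analyzer; `scan_outputs` is hypothetical, and the "Medium" severity for UUIDs is an assumed value.

```python
import json
import re

# Heuristic patterns; the real analysis may use different rules.
ISO_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
UUID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)

def scan_outputs(action: str, outputs: dict) -> list[dict]:
    """Flag output values that are likely to change on replay."""
    text = json.dumps(outputs)
    issues = []
    if ISO_TIMESTAMP.search(text):
        issues.append({"action": action, "type": "timestamp", "severity": "High"})
    if UUID.search(text):
        # Severity for UUIDs is an assumed value for this sketch.
        issues.append({"action": action, "type": "random value", "severity": "Medium"})
    return issues
```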
Issues found:
List of detected issues with:
- Action name - Where issue occurred
- Issue type - What kind of non-determinism
- Severity - Critical, High, Medium, Low
- Description - What was detected
- Line number - If applicable
Example:
```text
Issue: Timestamp in output
Action: generate_report
Severity: High
Description: Output contains current timestamp ("2024-12-28T14:30:00Z") which will differ on replay.
Recommendation: Pass timestamp as input parameter instead.
```
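The recommendation amounts to moving the clock read out of the action so the timestamp is recorded as an input. A before/after sketch; `generate_report` here is a hypothetical stand-in for the action named in the example.

```python
from datetime import datetime, timezone

# Before: reads the clock inside the action, so a replay produces a different output.
def generate_report_nondeterministic(total: int) -> dict:
    return {"total": total, "generated_at": datetime.now(timezone.utc).isoformat()}

# After: the caller supplies the timestamp, which is captured with the inputs,
# so the same inputs always produce the same output.
def generate_report(total: int, generated_at: datetime) -> dict:
    return {"total": total, "generated_at": generated_at.isoformat()}
```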
Recommendations:
Platform provides actionable advice:
- How to fix each issue
- Code snippets showing before/after
- Links to documentation
- Best practices for determinism
Benefits of high determinism:
- Easier debugging (reproducible errors)
- Lower testing costs (replay instead of re-execute)
- Compliance (demonstrate reproducibility)
- Reliability (predictable behavior)
Real-time updates
Live run monitoring:
When viewing a running run:
- Action list updates in real-time as actions complete
- Stats refresh (cost, duration, action count)
- Progress indicator shows completion %
- Notifications when run completes or fails
WebSocket events:
- run_created - New run appears in list
- action_created - New action appears in telemetry
- run_completed - Status changes to completed
- run_failed - Status changes to failed
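A client consuming these events typically folds each message into local state. The sketch below handles only the message-processing side; the JSON payload shape (`type`, `run`, `run_id`, `action`) is a hypothetical assumption, and the WebSocket connection itself is not shown.

```python
import json

def _run(state: dict, run_id: str) -> dict:
    # Create a placeholder if the page was opened after the run started.
    return state.setdefault(run_id, {"status": "running", "actions": []})

def apply_event(state: dict, message: str) -> None:
    """Apply one event message (JSON text) to an in-memory view of the runs list."""
    event = json.loads(message)
    kind = event.get("type")
    if kind == "run_created":
        _run(state, event["run"]["run_id"])
    elif kind == "action_created":
        _run(state, event["run_id"])["actions"].append(event["action"])
    elif kind == "run_completed":
        _run(state, event["run_id"])["status"] = "completed"
    elif kind == "run_failed":
        _run(state, event["run_id"])["status"] = "failed"
    # Unknown event types are ignored.
```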
Common workflows
Debug a failed run
- Go to Critical Intercepts section
- Click failed run
- Open Telemetry tab
- Find first failed action (red badge)
- Open operation inspector
- Review error details and inputs
- Identify root cause
- Fix agent logic
- Use Simulation tab to test fix with replay
Analyze high-cost runs
- Filter by min cost > $10
- Sort by cost (descending)
- For each expensive run:
  - Open the Telemetry tab
  - Find the most expensive actions
  - Identify optimization opportunities
- Implement caching, cheaper models, or rate limiting
- Compare costs before/after
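Finding the most expensive actions is a group-by-name sum followed by a sort. A sketch assuming flat action records with hypothetical `action` and `cost_usd` fields, for example taken from an export:

```python
from collections import defaultdict

def most_expensive_actions(actions: list[dict], top: int = 3) -> list[tuple[str, float]]:
    """Total cost per action name, highest first."""
    totals: dict[str, float] = defaultdict(float)
    for a in actions:
        totals[a["action"]] += a.get("cost_usd", 0.0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```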
Test determinism
- Select a completed run
- Go to Simulation tab
- Click “Simulate Replay”
- Review divergences
- Go to Determinism tab
- Read recommendations
- Fix non-deterministic code
- Re-run the agent and verify that the score improves
Review agent behavior
- Filter by agent_id
- Sort by recency
- Review recent runs for patterns:
  - Success rates trending down?
  - Costs increasing?
  - New error types appearing?
- Investigate anomalies
- Adjust agent logic or policies
Export runs
Click Export to download run data:
- Respects current filters
- Choose format: CSV, JSON, JSONL
- Includes runs and all actions
- Use for analysis, reporting, compliance
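A JSONL export holds one JSON object per line, which makes it easy to load for analysis. A sketch assuming each line is a run with hypothetical `agent_id` and `cost_usd` fields; check the actual export schema before relying on these names.

```python
import json
from collections import defaultdict

def load_runs_jsonl(text: str) -> list[dict]:
    """Parse a JSONL export: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def cost_by_agent(runs: list[dict]) -> dict:
    totals = defaultdict(float)
    for run in runs:
        totals[run["agent_id"]] += run["cost_usd"]
    return dict(totals)

sample = (
    '{"run_id": "r1", "agent_id": "support-bot", "cost_usd": 1.25, "actions": []}\n'
    '{"run_id": "r2", "agent_id": "support-bot", "cost_usd": 0.75, "actions": []}\n'
    '{"run_id": "r3", "agent_id": "research-bot", "cost_usd": 4.0, "actions": []}\n'
)
print(cost_by_agent(load_runs_jsonl(sample)))
```

For a downloaded file, read it with `open(path, encoding="utf-8")` and pass the contents to `load_runs_jsonl`.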
Best practices
Monitor failed runs daily: Check the Critical Intercepts section every day to catch issues early.
Use replay for debugging: Replay serves the recorded outputs instead of calling external APIs, so it is free and reproduces what the original run saw - ideal for debugging without re-running expensive LLM calls.
Non-determinism hurts debugging: Runs with determinism scores < 70 are hard to debug. Prioritize fixing non-deterministic code.
Archive old runs: Runs older than 90 days may be archived - export important runs for long-term storage.
Keyboard shortcuts
- r - Simulate replay (when viewing a run)
- t - Switch to Telemetry tab
- s - Switch to Simulation tab
- d - Switch to Determinism tab
- / - Focus search
Troubleshooting
“Replay not available”
- Replay requires the original run to be complete
- Outputs must still be stored (check your retention policy)
- Some actions may not support replay (for example, actions with side effects)
“Can’t find my run”
- Check time window filter (may be outside range)
- Verify agent_id is correct
- Check if run was in different organization
- Try “All time” filter
“Determinism score seems wrong”
- The score comes from automated analysis and may miss issues
- Use the Simulation tab to check divergences manually
- Some non-determinism is acceptable (e.g., timestamps in logs)
“Action inputs or outputs not displaying”
- The data may be too large (> 1MB) - check the raw data
- The data may contain binary content, which cannot be displayed
- Check whether PII redaction is enabled
See also