Overview
The Determinism Analysis feature helps you identify and fix non-deterministic behavior in your agents. Available directly in the console, it statically analyzes the recorded actions of any run to detect patterns that would cause different results on replay.
Accessing the analyzer
- Navigate to the Runs page
- Click on any run to open its details modal
- Switch to the Determinism tab (microscope icon) alongside Telemetry and Simulation
- The analysis runs automatically against the run’s recorded actions
Determinism score
The analysis provides a score from 0-100 indicating replay reliability:
Score ranges:
- 🟢 90-100 - High Reliability: Minimal issues, safe to replay
- 🟡 70-89 - Moderate Risk: Some non-deterministic patterns detected
- 🔴 0-69 - High Risk: Significant non-determinism, replay likely to fail
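If you want to apply the same bands in your own scripts, they reduce to a small lookup. This is only a sketch: the function name `classify_score` is hypothetical and not part of the product.

```python
def classify_score(score: int) -> str:
    """Map a 0-100 determinism score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "High Reliability"
    if score >= 70:
        return "Moderate Risk"
    return "High Risk"


print(classify_score(94))  # High Reliability
print(classify_score(70))  # Moderate Risk
print(classify_score(12))  # High Risk
```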
The score appears prominently with:
- Large numeric display
- Color-coded badge
- Progress bar visualization
- Severity label
Analysis summary
Key metrics at a glance:
- Total actions analyzed
- Issues found count
- Summary description of findings
- Days elapsed since run
Patterns detected
The analyzer categorizes issues by type:
Timestamp dependencies
Actions that use current time:
- datetime.now()
- time.time()
- Date/time-based logic
Impact: Different timestamps on each run cause different behavior.
Random generation
Unseeded random values:
- random.random()
- random.randint()
- random.choice()
Impact: Stochastic results vary per execution.
UUID generation
Auto-generated identifiers:
- uuid.uuid4()
- Dynamic ID creation
- Non-deterministic keys
Impact: Different IDs on each run break replay.
Process/environment
System-specific values:
- os.getpid()
- socket.gethostname()
- Platform details
Impact: Behavior varies across systems.
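One way to keep system-specific values out of action logic is to read them once, at the edge of the program, and pass them in as ordinary inputs. The sketch below assumes that approach; `RuntimeInfo`, `capture_runtime_info`, and `describe_worker` are hypothetical names, not SDK APIs.

```python
import os
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeInfo:
    """System-specific values, captured once and passed in explicitly."""
    pid: int
    hostname: str


def capture_runtime_info() -> RuntimeInfo:
    # Call this at the edge of the program, outside any recorded action.
    return RuntimeInfo(pid=os.getpid(), hostname=socket.gethostname())


def describe_worker(info: RuntimeInfo) -> dict:
    # The function only reads its inputs, so a replay can supply recorded values.
    return {"worker": f"{info.hostname}:{info.pid}"}


recorded = RuntimeInfo(pid=4242, hostname="worker-1")
print(describe_worker(recorded))  # {'worker': 'worker-1:4242'}
```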
API variability
Dynamic service responses:
- Request IDs
- Trace IDs
- Session tokens
Impact: External services return different metadata.
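When you compare or store service responses, it can help to drop the volatile metadata first so that only the meaningful payload remains. A minimal sketch, assuming the responses are JSON-like dicts and that the volatile fields are named `request_id`, `trace_id`, and `session_token` (adjust these to match your service):

```python
VOLATILE_KEYS = {"request_id", "trace_id", "session_token"}  # assumed field names


def strip_volatile(value):
    """Return a copy of a JSON-like value with volatile keys removed at every level."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


first = {"status": "ok", "request_id": "a1", "data": {"total": 3, "trace_id": "t-9"}}
second = {"status": "ok", "request_id": "b2", "data": {"total": 3, "trace_id": "t-7"}}

print(strip_volatile(first) == strip_volatile(second))  # True
```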
Order-dependent
Undefined ordering:
- Dict iteration (Python versions before 3.7, which do not guarantee insertion order)
- Set operations
- Concurrent operations
Impact: Execution order varies per run.
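For sets, the usual fix is to impose an explicit order before the values leave the function. The sketch below uses a hypothetical `unique_tags` helper to show the idea; it does not address ordering between concurrent operations.

```python
def unique_tags(records: list[dict]) -> list[str]:
    """Collect distinct tags and return them in a stable, sorted order."""
    tags = {tag for record in records for tag in record["tags"]}
    # Iterating a set of strings directly can differ between interpreter runs,
    # because string hashing is randomized by default. Sorting fixes the order.
    return sorted(tags)


records = [{"tags": ["beta", "alpha"]}, {"tags": ["gamma", "alpha"]}]
print(unique_tags(records))  # ['alpha', 'beta', 'gamma']
```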
Detailed issues view
Each issue shows:
Action context
- Action name
- Action index (position in run)
- Action ID (for reference)
Issue details
- Type - Category of issue
- Severity - High/Medium/Low
- Description - What the problem is
- Recommendation - How to fix it
Example:
Action: generate_report (#3)
├─ Type: timestamp_dependency
├─ Severity: HIGH
├─ Description: Action uses timestamp or current time in inputs
└─ Recommendation: Use fixed timestamps for testing or mock time functions
Smart recommendations
AI-powered suggestions with:
Priority levels
- 🔴 High - Fix immediately (blocks reliable replay)
- 🟡 Medium - Fix soon (likely causes issues)
- 🟢 Low - Fix eventually (may cause issues)
Categories
- Determinism - Non-deterministic patterns
- Performance - Optimization opportunities
- Reliability - Error handling improvements
Detailed guidance
Each recommendation includes:
- Title - Brief summary
- Description - Full explanation
- Solution - Step-by-step fix
- Code example - Concrete implementation
Example recommendation:
🔴 HIGH PRIORITY | Determinism
Remove timestamp dependencies
Description:
2 actions use current timestamps which will vary on each execution
Solution:
Use fixed timestamps for testing or inject time via parameters
Code Example:
# Instead of:
report_time = datetime.now()
# Use:
report_time = datetime(2025, 1, 3, 12, 0, 0)
Warnings section
Additional context about the run:
- Failed actions
- Error outcomes
- Incomplete executions
- Data quality issues
Visual features
The analysis UI includes:
Progress bar
Visual representation of determinism score:
- Green for high scores
- Yellow for moderate
- Red for low scores
Badge system
Quick visual indicators:
- Severity badges (info/warning/critical)
- Status badges (passed/failed)
- Category tags
Tabbed interface
Toggle between:
- Recommendations tab - Prioritized fixes
- Issues tab - Detailed problems by action
Collapsible sections
Expand/collapse:
- Individual issues
- Code examples
- Action groups
Common patterns and fixes
Pattern: Using current time
❌ Non-deterministic:
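from datetime import datetime

@guarded_action(name="log_event")
def log_event(message: str):
    # Reads the clock inside the action, so every run records a different value.
    timestamp = datetime.now()
    return {"message": message, "time": timestamp}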
✅ Deterministic:
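from datetime import datetime

@guarded_action(name="log_event")
def log_event(message: str, event_time: datetime):
    # The caller supplies the time, so a replay can pass the recorded value.
    return {"message": message, "time": event_time}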
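With the time passed in, the caller decides where the clock is read. The following sketch omits the `guarded_action` decorator so that it runs on its own; the call sites are illustrative.

```python
from datetime import datetime, timezone


def log_event(message: str, event_time: datetime) -> dict:
    # Same body as the deterministic version above, without the SDK decorator.
    return {"message": message, "time": event_time}


# Production call site: read the clock once, at the edge.
live = log_event("started", datetime.now(timezone.utc))

# Test or replay: supply a fixed value and get the same result every time.
fixed = datetime(2025, 1, 3, 12, 0, 0, tzinfo=timezone.utc)
assert log_event("started", fixed) == log_event("started", fixed)
print(log_event("started", fixed)["time"].isoformat())  # 2025-01-03T12:00:00+00:00
```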
Pattern: Random sampling
❌ Non-deterministic:
import random

@guarded_action(name="select_sample")
def select_sample(items: list):
    # Unseeded: a different sample on every run.
    return random.sample(items, 5)
✅ Deterministic:
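import random

@guarded_action(name="select_sample")
def select_sample(items: list, seed: int = 42):
    # Seeding makes the sample repeatable. Note that random.seed() reseeds
    # the shared module-level generator for the whole process.
    random.seed(seed)
    return random.sample(items, 5)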
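If reseeding the shared module-level generator is a concern, a private `random.Random` instance gives the same repeatability without touching global state. This is an alternative sketch, not the SDK's prescribed approach, and it omits the `guarded_action` decorator so that it runs on its own.

```python
import random


def select_sample(items: list, seed: int = 42) -> list:
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(items, 5)


items = list(range(100))
assert select_sample(items) == select_sample(items)  # same seed, same sample
assert select_sample(items, seed=1) == select_sample(items, seed=1)
print(select_sample(items))
```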
Pattern: UUID generation
❌ Non-deterministic:
import uuid

@guarded_action(name="create_task")
def create_task(name: str):
    # uuid4() is random, so the ID differs on every run.
    task_id = str(uuid.uuid4())
    return {"id": task_id, "name": name}
✅ Deterministic:
@guarded_action(name="create_task")
def create_task(name: str, task_id: str):
    return {"id": task_id, "name": name}
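When the caller has no natural ID to pass in, another option is to derive the ID from the inputs with `uuid.uuid5`, which is name-based and returns the same UUID for the same namespace and name. The sketch below assumes task names are unique; if they are not, include another distinguishing input in the name. `TASK_NAMESPACE` and `task_id_for` are hypothetical, and the decorator is omitted so the snippet runs on its own.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works.
TASK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "tasks.example.com")


def task_id_for(name: str) -> str:
    """Derive a stable ID from the task name: same name, same UUID."""
    return str(uuid.uuid5(TASK_NAMESPACE, name))


def create_task(name: str) -> dict:
    return {"id": task_id_for(name), "name": name}


assert create_task("reindex") == create_task("reindex")
assert create_task("reindex")["id"] != create_task("cleanup")["id"]
```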
Integration with replay
Determinism analysis complements the SDK’s Replay Mode:
Workflow:
- Run agent normally → generates ledger
- Analyze determinism in console → identifies issues
- Fix issues following recommendations
- Replay with SDK → validates fixes
- Compare original vs replay → confirms determinism
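The final comparison step can be as simple as finding the first action whose recorded result differs. The sketch below assumes each run has already been loaded as a list of dicts; that shape and the `first_divergence` helper are illustrative assumptions, not the SDK's ledger format or API.

```python
from itertools import zip_longest


def first_divergence(original: list[dict], replay: list[dict]):
    """Return (index, original_entry, replay_entry) for the first mismatch, or None."""
    # zip_longest pads the shorter run with None, so a missing action also counts.
    for index, (a, b) in enumerate(zip_longest(original, replay)):
        if a != b:
            return index, a, b
    return None


original = [
    {"action": "fetch", "output": 3},
    {"action": "create_task", "output": "id-1"},
]
replay = [
    {"action": "fetch", "output": 3},
    {"action": "create_task", "output": "id-2"},
]

print(first_divergence(original, replay)[0])  # 1
print(first_divergence(original, original))   # None
```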
Exporting results
Download analysis for:
- Code review - Share with team
- Documentation - Track improvements
- CI/CD integration - Automated checks
- Reporting - Management visibility
Export formats:
- JSON (for automation)
- PDF (for reports)
- CSV (for tracking)
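For CI/CD integration, a JSON export can feed a simple gate that fails the build when the score drops or high-severity issues appear. The export schema is not documented here, so the `score`, `issues`, and `severity` fields below are assumptions; check them against a real exported file before relying on this sketch.

```python
import json


def check_export(raw_json: str, min_score: int = 90) -> list[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    report = json.loads(raw_json)
    failures = []
    if report["score"] < min_score:
        failures.append(f"score {report['score']} is below {min_score}")
    high = [i for i in report.get("issues", []) if i.get("severity") == "HIGH"]
    if high:
        failures.append(f"{len(high)} high-severity issue(s)")
    return failures


# Assumed export shape, for illustration only.
sample = json.dumps({
    "score": 72,
    "issues": [
        {"action": "generate_report", "type": "timestamp_dependency", "severity": "HIGH"}
    ],
})
print(check_export(sample))
# ['score 72 is below 90', '1 high-severity issue(s)']
```

In a CI job, exit with a non-zero status when the returned list is not empty.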
Best practices
Analyze before deploying: Check determinism of new agents before production deployment.
Fix high-priority first: Timestamp and random dependencies have the biggest impact, so start there.
Regular checks: Run analysis weekly or after significant changes.
Static analysis has limits: The analyzer can’t detect all issues (e.g., race conditions). Use with SDK replay for full validation.
Share results: Export and share analyses with your team for collective improvement.
Keyboard shortcuts
a - Toggle analysis view
r - Show recommendations
i - Show issues
e - Export results
Common questions
“Score seems low but my agent works fine”
A low score means the agent isn’t reproducible, not that it doesn’t work. For testing, debugging, and compliance, you need deterministic behavior.
“Can I ignore low-severity issues?”
Low-severity issues may cause problems depending on your use case:
- For testing/debugging: Fix them
- For production monitoring: Can usually ignore
- For compliance: May need to fix (check requirements)
“Analysis shows false positives”
The analyzer uses heuristics and may flag legitimate patterns. Review each recommendation and decide if it applies to your use case.
“How often should I run analysis?”
Recommended frequency:
- Development: Before each commit
- Staging: Before each deployment
- Production: Weekly or after incidents
Troubleshooting
Analysis fails to load
- Verify run exists and you have access
- Check run has completed (not in progress)
- Try refreshing the page
- Check browser console for errors
Issues seem incorrect
The analyzer looks for patterns that typically indicate non-determinism. Review the specific issue and decide if it applies to your code.
No recommendations shown
If the score is 90 or higher, there may be no recommendations, because the analyzer only suggests fixes for issues it detects.
Analysis takes too long
Large runs (1,000+ actions) can take 10-30 seconds to analyze. Wait for the analysis to finish, or start with a smaller run.
See also