Automatic failure analysis
and fixes for your AI agents

Turn your traces, codebase, and judgment into a system that continuously improves your agent.

Get started

One-shot integration using the MCP

Add the Simforge MCP, paste our prompt — your coding agent instruments everything using the setup_simforge tool call.

Read the docs →

setup_simforge · MCP Tool

Train graders by talking to your traces

Search traces, guide analysis, and train graders through conversation.

Eval Assistant

Analyze traces for the capabilities grader

Filter & Analyze Traces · Done

Issues in 13/30 traces:

Data Access: 5 (17%)

Shallow Analysis: 8 (27%)
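
Each percentage in a breakdown like this is the category's share of all analyzed traces: 5 of 30 is about 17%, and 8 of 30 is about 27%. A minimal Python sketch of that tally follows. It assumes each trace carries at most one issue label; `issue_breakdown` and the label list are hypothetical illustrations, not part of Simforge.

```python
from collections import Counter
from typing import Optional


def issue_breakdown(labels: list[Optional[str]], total_traces: int) -> dict[str, tuple[int, int]]:
    """Map each issue name to (count, percent of all traces), ignoring clean traces."""
    counts = Counter(label for label in labels if label is not None)
    return {name: (n, round(100 * n / total_traces)) for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for 30 traces: 13 with an issue, 17 clean (None).
    labels = ["Data Access"] * 5 + ["Shallow Analysis"] * 8 + [None] * 17
    for name, (count, pct) in issue_breakdown(labels, len(labels)).items():
        print(f"{name}: {count} ({pct}%)")
```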

Train

Review & Train Graders

Trained graders go live automatically.

Proactive Tool Usage · 0 · 2 fail

User Guidance · 3 · 1 fail

Data Access · NEW


Get actionable analysis and fixes

Graders run on production traces and efficiently consolidate failures into actionable summaries. Coding agents can access these via MCP for fast iteration loops.

Response Conciseness · 16d ago

71% pass rate · 14 eval · 30d

Root Causes

Repetitive Structure · Excessive Filler · Over-explanation

MCP

get_failure_context() · MCP Tool

{
  "grader": "Response Conciseness",
  "pass_rate": 0.71,
  "failures": 4,
  "root_causes": [
    "Repetitive Structure",
    "Excessive Filler"
  ],
  "suggestions": [
    "Enforce length constraints",
    "Consolidate follow-ups"
  ]
}
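
A script or coding agent that receives a payload like this could condense it into a short fix brief. The Python sketch below assumes the response contains exactly the fields shown on this page. `summarize_failure_context` is a hypothetical helper written for illustration, not a Simforge function, and the real response may carry more fields.

```python
import json

# Sample payload mirroring the fields shown on this page (assumed shape).
SAMPLE = """
{
  "grader": "Response Conciseness",
  "pass_rate": 0.71,
  "failures": 4,
  "root_causes": ["Repetitive Structure", "Excessive Filler"],
  "suggestions": ["Enforce length constraints", "Consolidate follow-ups"]
}
"""


def summarize_failure_context(raw: str) -> str:
    """Hypothetical helper: format a failure-context payload as a three-line brief."""
    ctx = json.loads(raw)
    lines = [
        f"{ctx['grader']}: {ctx['pass_rate']:.0%} pass rate, {ctx['failures']} failures",
        "Root causes: " + ", ".join(ctx["root_causes"]),
        "Suggested fixes: " + ", ".join(ctx["suggestions"]),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize_failure_context(SAMPLE))
```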

Keep improving, automatically

A/B test agent changes with replay

Replay recorded traces against a new version of your agent and compare outcomes side by side. Ship changes confidently with grader-backed evidence.

v1 · current

72% pass

v2 · candidate

91% pass
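
The idea behind a replay comparison can be sketched in a few lines: run the same recorded inputs through both agent versions, grade every response with the same grader, and compare pass rates. The Python below is a toy illustration under stated assumptions. The `Agent` and `Grader` types, the function names, and the stand-in agents are all hypothetical and do not reflect how Simforge implements replay.

```python
from typing import Callable, Iterable

Agent = Callable[[str], str]          # maps a recorded input to a response
Grader = Callable[[str, str], bool]   # (input, response) -> True if it passes


def pass_rate(agent: Agent, grader: Grader, inputs: Iterable[str]) -> float:
    """Replay each recorded input through the agent and grade the response."""
    results = [grader(x, agent(x)) for x in inputs]
    return sum(results) / len(results) if results else 0.0


def compare(current: Agent, candidate: Agent, grader: Grader, inputs: list[str]) -> dict[str, float]:
    """Grade both versions on identical inputs so the comparison is like for like."""
    return {
        "current": pass_rate(current, grader, inputs),
        "candidate": pass_rate(candidate, grader, inputs),
    }


if __name__ == "__main__":
    # Toy stand-ins: a verbose agent, a concise one, and a length-based grader.
    recorded_inputs = ["reset password", "cancel order", "update billing address"]
    verbose = lambda q: f"Thanks for asking about {q}. " * 5
    concise = lambda q: f"Here is how to {q}."
    is_concise = lambda _q, r: len(r) <= 80
    print(compare(verbose, concise, is_concise, recorded_inputs))
```

Holding the inputs and the grader fixed means any difference in pass rate comes from the agent change alone.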

Tune graders quickly as your agent evolves

When a grader mislabels a trace, correct it by chatting or with one click. The grader retrains immediately, getting smarter with every correction.

FAIL · trace_8f2a · grader disagrees

PASS · trace_8f2a · retrained & correct · Updated

Codify your judgment. Improve your agents continuously.

Tell Simforge what matters. It turns your standards into graders, fixes, and feedback loops — with minimal effort from you.