The Problem
Your team has judgment about what the agent should do. Turning that judgment into evals and code fixes is hard. It's manual, inconsistent, and changes quickly.
The Solution
We turn failures into evals and fixes from one interface. Talk to your traces to capture your team's judgment and ship changes fast.
30-second setup
Into your repo with a few lines of code
$ poetry add simforge-py

import os

from simforge import Simforge

# Read the API key from the environment rather than hard-coding it.
client = Simforge(api_key=os.getenv("SIMFORGE_API_KEY"))
result = client.call("title-chat", user_query="What is the capital of France?")

Works where you work
Integrated into your IDE