CLOVE · Simulate

Agent Simulation · synthetic users · virtualised tools

Stress test before you ship.

Run agents against thousands of synthetic interactions in a sealed environment. Tools are virtualised, personas are realistic, scoring is multi-turn. Catch the edge case here, not in your customer's inbox.

1 · Agent under test
2 · Scenario pack
3 · Run settings
Concurrency
32 parallel
Max budget
$8.40
Autoraters
2 graders
Seed
0xC10VE1
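The run settings above can be sketched as a small config object. This is illustrative only: the field names and the `RunSettings` type are assumptions, not CLOVE's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunSettings:
    # Hypothetical config mirroring the panel above; names are illustrative.
    concurrency: int = 32          # parallel simulated conversations
    max_budget_usd: float = 8.40   # hard spend cap for the whole run
    autoraters: int = 2            # independent graders per transcript
    seed: str = "0xC10VE1"         # fixed seed for reproducible runs

settings = RunSettings()
```

A fixed seed means two runs with the same scenario pack replay the same synthetic conversations, so regressions are diffable run-to-run.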
Virtualised tools
gorgias.virtual · shopify.virtual · shipbob.virtual
Open protocols
MCP · A2A · OTel
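"Virtualised" here means the agent calls the same tool interface as the real integration, but answers come from canned fixtures instead of live APIs. A minimal sketch, with hypothetical fixture data and a `call_tool` helper that is purely illustrative:

```python
# Canned responses stand in for live Shopify/ShipBob/Gorgias calls.
FIXTURES = {
    ("shopify.virtual", "get_order"): {"id": "1001", "status": "shipped"},
    ("shipbob.virtual", "track"): {"carrier": "UPS", "eta_days": 2},
}

def call_tool(tool: str, action: str, **kwargs):
    """Resolve a tool call from fixtures; the agent never hits the network."""
    try:
        return FIXTURES[(tool, action)]
    except KeyError:
        # An unscripted call is itself a useful signal: the agent went
        # somewhere the scenario didn't anticipate.
        raise LookupError(f"unscripted call: {tool}.{action}")

order = call_tool("shopify.virtual", "get_order", order_id="1001")
```

Because every call is intercepted, a run can never mutate real orders, and every tool invocation is observable for scoring and budget accounting.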

What gets scored

Task success
Did the agent achieve the user goal across the full multi-turn conversation?
Safety
Did it stay within scope, avoid leakage, decline out-of-band asks?
Tone match
Did the response register match the persona — frustrated vs polite?
Budget adherence
Did the agent finish under the per-run budget, without inflating tool calls?
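The four dimensions above could be combined roughly as follows. The weighting (equal parts) and the rule that a safety failure zeroes the run are assumptions for illustration, not CLOVE's published scoring formula:

```python
def score_run(task_success: bool, safety: bool,
              tone_match: float, under_budget: bool) -> float:
    """Return a 0..1 score for one multi-turn run.

    tone_match is assumed to arrive as a 0..1 value from an autorater.
    A safety failure zeroes the run outright (illustrative policy).
    """
    if not safety:
        return 0.0
    parts = [
        1.0 if task_success else 0.0,
        tone_match,
        1.0 if under_budget else 0.0,
    ]
    return sum(parts) / len(parts)
```

Gating on safety first (rather than averaging it in) reflects the idea that a leaked record or an out-of-scope action is not offset by a polite tone.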