CLOVE · Simulate

Agent Simulation · synthetic users · virtualised tools

Stress test before you ship.

Run agents against thousands of synthetic interactions in a sealed environment. Tools are virtualised, personas are realistic, scoring is multi-turn. Catch the edge case here, not in your customer's inbox.

1 · Agent under test
2 · Scenario pack
3 · Run settings
Concurrency
32 parallel
Max budget
$8.40
Autoraters
2 graders
Seed
0xC10VE1
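The run settings above can be sketched as a small config object. This is illustrative only: the field names and the `RunSettings` type are assumptions, not CLOVE's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunSettings:
    # Hypothetical config mirroring the panel above; names are illustrative.
    concurrency: int = 32          # parallel simulated conversations
    max_budget_usd: float = 8.40   # hard spend cap for the whole run
    autoraters: int = 2            # independent graders per transcript
    seed: str = "0xC10VE1"         # fixed seed for reproducible runs

settings = RunSettings()
```

A fixed seed means two runs with the same scenario pack replay the same synthetic conversations, so regressions are diffable run-to-run.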
Virtualised tools
gorgias.virtual · shopify.virtual · shipbob.virtual
Open protocols
MCP · A2A · OTel
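"Virtualised" here means the agent calls the same tool interface as the real integration, but answers come from canned fixtures instead of live APIs. A minimal sketch, with hypothetical fixture data and a `call_tool` helper that is purely illustrative:

```python
# Canned responses stand in for live Shopify/ShipBob/Gorgias calls.
FIXTURES = {
    ("shopify.virtual", "get_order"): {"id": "1001", "status": "shipped"},
    ("shipbob.virtual", "track"): {"carrier": "UPS", "eta_days": 2},
}

def call_tool(tool: str, action: str, **kwargs):
    """Resolve a tool call from fixtures; the agent never hits the network."""
    try:
        return FIXTURES[(tool, action)]
    except KeyError:
        # An unscripted call is itself a useful signal: the agent went
        # somewhere the scenario didn't anticipate.
        raise LookupError(f"unscripted call: {tool}.{action}")

order = call_tool("shopify.virtual", "get_order", order_id="1001")
```

Because every call is intercepted, a run can never mutate real orders, and every tool invocation is observable for scoring and budget accounting.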

What gets scored

Task success
Did the agent achieve the user goal across the full multi-turn conversation?
Safety
Did it stay within scope, avoid leakage, decline out-of-band asks?
Tone match
Did the response register match the persona — frustrated vs polite?
Budget adherence
Did the agent finish under the per-run budget, without inflating tool calls?
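The four dimensions above could be combined roughly as follows. The weighting (equal parts) and the rule that a safety failure zeroes the run are assumptions for illustration, not CLOVE's published scoring formula:

```python
def score_run(task_success: bool, safety: bool,
              tone_match: float, under_budget: bool) -> float:
    """Return a 0..1 score for one multi-turn run.

    tone_match is assumed to arrive as a 0..1 value from an autorater.
    A safety failure zeroes the run outright (illustrative policy).
    """
    if not safety:
        return 0.0
    parts = [
        1.0 if task_success else 0.0,
        tone_match,
        1.0 if under_budget else 0.0,
    ]
    return sum(parts) / len(parts)
```

Gating on safety first (rather than averaging it in) reflects the idea that a leaked record or an out-of-scope action is not offset by a polite tone.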