AgentClash

Enterprise evaluation

Ship agents with evidence your team can defend

AgentClash turns agent behavior into a release decision: frozen benchmarks, replay, scorecards, and CI gates. Built for platform, security, and vendor review committees, not as another trace dashboard.

Email hello@agentclash.dev

MIT open source
Bring your own keys
Freemium hosted plan
No token markup

Release committee

Five questions every agent release needs answered

Platform leads do not need another leaderboard. They need governed evidence that connects benchmark, replay, gate, and the decision to ship or block.

01
Which agent should we trust?
Compare baseline, candidate, and vendor agents inside one frozen challenge pack, not disconnected eval jobs.
02
Under which constraints?
Attach latency, cost, and policy ceilings to the benchmark. The run fails when a candidate breaks your release rules.
03
At what cost?
See cost per successful task next to correctness and reliability, not token spend in isolation.
04
Why did it fail?
Replay shows routing, tool paths, artifacts, and scorecard axes. Not another log dump.
05
Can we defend the decision?
Export pass and fail recommendations, scorecards, and redacted evidence for security, finance, and engineering leadership.
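
To make questions 02 and 03 concrete, here is a minimal Python sketch of checking a run against release ceilings and computing cost per successful task. It is an illustration only, not the AgentClash API: `TaskResult`, `Ceilings`, and every field and function name are hypothetical, and the nearest-rank p95 is just one reasonable way to summarize latency.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One task attempt from a benchmark run (hypothetical record shape)."""
    succeeded: bool
    cost_usd: float
    latency_s: float


@dataclass(frozen=True)
class Ceilings:
    """Release rules attached to the benchmark (hypothetical field names)."""
    max_cost_per_success_usd: float
    max_p95_latency_s: float


def cost_per_successful_task(results: list[TaskResult]) -> float:
    """Total spend divided by successful tasks; infinite if nothing succeeded."""
    successes = sum(1 for r in results if r.succeeded)
    total_cost = sum(r.cost_usd for r in results)
    return total_cost / successes if successes else float("inf")


def p95_latency(results: list[TaskResult]) -> float:
    """Nearest-rank 95th percentile of task latency."""
    if not results:
        raise ValueError("no results to summarize")
    ordered = sorted(r.latency_s for r in results)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def violations(results: list[TaskResult], ceilings: Ceilings) -> list[str]:
    """List every ceiling the run breaks; an empty list means the run passes."""
    broken = []
    cost = cost_per_successful_task(results)
    if cost > ceilings.max_cost_per_success_usd:
        broken.append(f"cost per successful task {cost:.4f} USD exceeds ceiling")
    latency = p95_latency(results)
    if latency > ceilings.max_p95_latency_s:
        broken.append(f"p95 latency {latency:.2f}s exceeds ceiling")
    return broken
```

Failed attempts still count toward total cost, which is why this figure can diverge sharply from raw token spend when an agent retries or fails often.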

How it works

From live run to release gate in one system

No stitching together traces, eval spreadsheets, and policy docs in separate tools. AgentClash produces one decision artifact your team can gate on.

Enterprise tier

Compliance, SSO, dedicated support, and paid rollout help.

Everything in Team, plus:
SSO / SAML
Org-wide audit logs
Unlimited replay retention
99.9% uptime SLA
Dedicated support channel
Custom MSA / billing terms

View full pricing

01
Freeze the benchmark
Version challenge packs and inputs so every run compares against the same approved workload.
02
Run candidates
Run agents in a sandbox with the same tools, time budget, and scoring rules.
03
Review replay evidence
Inspect trajectories, artifacts, cost, and scorecards before anyone argues from anecdotes.
04
Gate the release
Fail CI when a candidate regresses against baseline on the scorecard your team already trusts.
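
As a rough illustration of steps 01 and 04, the Python sketch below fingerprints a challenge pack so that any change to its inputs is detectable, then flags scorecard axes where a candidate falls below baseline. It is a sketch under stated assumptions, not how AgentClash implements freezing or gating: the pack is assumed to be a JSON-serializable dict, scores are assumed to be higher-is-better, and `pack_fingerprint`, `regressed_axes`, and `ci_exit_code` are hypothetical names.

```python
import hashlib
import json


def pack_fingerprint(pack: dict) -> str:
    """Hash the pack's canonical JSON; key order does not affect the result."""
    canonical = json.dumps(pack, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def regressed_axes(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return axes where the candidate is missing or scores below baseline.

    Assumes higher scores are better on every axis.
    """
    regressions = []
    for axis, base_score in baseline.items():
        cand_score = candidate.get(axis)
        if cand_score is None or cand_score < base_score - tolerance:
            regressions.append(axis)
    return regressions


def ci_exit_code(
    approved_fingerprint: str,
    pack: dict,
    baseline: dict[str, float],
    candidate: dict[str, float],
) -> int:
    """Return 0 to pass the gate, 1 to fail it."""
    if pack_fingerprint(pack) != approved_fingerprint:
        return 1  # the workload changed, so the comparison is not valid
    return 1 if regressed_axes(baseline, candidate) else 0
```

In a CI job, the return value of `ci_exit_code` could be passed to `sys.exit` so that a regression or an unapproved pack change fails the pipeline.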

Rollout offer

Start free, then roll into Team when the benchmark matters

Run the first evals on Free. When you need more run volume, replay retention, CI integration, and workspace audit logs, upgrade the same workspace to Team.

Dedicated workspace on the Team tier
Challenge packs and replay retention
CI integration and audit logs
Architecture review with our team

Start free
Ask about a 2-week eval sprint

Team is self-serve product access for active evaluation programs. Fixed-scope eval sprints are optional service packages that we scope during the architecture review.

FAQ

Enterprise evaluation questions

Can we self-host instead of using the hosted product?

Yes. AgentClash is MIT-licensed and open source. Many enterprises start hosted on Free or Team, then move to self-hosting or a hybrid model. See the self-host guide in the docs for the full stack.

Do you mark up LLM tokens?

No. AgentClash is bring-your-own-key (BYOK) on every tier. You connect provider keys and pay vendors directly; we never mark up tokens.

What about data residency for UAE and other regions?

Hosted plans run on our standard cloud regions today. Enterprise customers can discuss dedicated deployment, private networking, and residency requirements during the architecture review. Contact hello@agentclash.dev.

How is the Team plan different from a services engagement?

Team is product access: your workspace, challenge packs, and gates. Optional hands-on eval sprints (pack build, benchmark setup) are fixed-scope paid services. Ask us about a 2-week introductory eval sprint.

Ready to gate your next agent release?

Book a 30-minute eval architecture review, or email us to scope a Team rollout on your workloads.

Email hello@agentclash.dev
