AgentClash

Eval program

First governed benchmark in 2 weeks

AgentClash is the platform. Our team gets you from live agents to frozen challenge packs, baseline evidence, and CI gates in your workspace. Fixed offerings, not open-ended consulting.

These packages are paid engagements, scoped after discovery. The hosted Free plan is self-serve product access if you want to start on your own.

Email hello@agentclash.dev

Offerings

Four fixed packages

Each package is a paid, fixed-scope engagement that ends with customer-owned artifacts in your workspace. We quote scope during discovery; there is no public rate card.

Eval Discovery (1 week)
Audit your agents, document 5 concrete failure modes, and deliver a prioritized challenge pack roadmap.
Best when: You have agents in production but no frozen benchmark yet.

Challenge Pack Build (2 to 4 weeks)
Ship 3–10 custom challenge packs from your real workflows, scored and versioned in your workspace.
Best when: You know what to test and need packs built from live tasks and tools.

Benchmark & Gate Setup (2 weeks)
Run a baseline, wire a CI release gate, and hand off an executive scorecard template your committee can defend.
Best when: You have packs and need a governed release decision in CI.

Managed Eval Retainer (monthly)
Release benchmarks on every ship candidate plus a monthly reliability report with regression trends.
Best when: You ship agents often and want ongoing benchmark coverage without building an eval team.

Guardrails

Platform adoption, not a consulting shop

Every engagement produces artifacts in your AgentClash workspace: packs, baselines, gates, or CI handoff.
We do not run black-box evals outside your tenancy. You own the packs and the evidence.
Services are paid engagements. The hosted Free plan is still the default path if you want to run evals yourself first.
No vague SOWs. Each package above has a fixed duration and named deliverable.

Discovery intake

What we capture on the first call

A free 30-minute discovery call maps your agents to the right package and gives you a scope quote. Bring what you have; we structure the benchmark plan.

Agent workflow and tools in scope
Recent failure examples or incident tickets
Compliance, residency, or policy constraints
Target release decision and stakeholders
Current eval or observability tooling
Success criteria for the first governed benchmark

FAQ

Eval services questions

How is this different from the Team plan?

Team is product access in your workspace. Services are fixed-scope paid engagements where our team builds packs, baselines, or gates with you. Many teams start on Free or Team and add a 2-week Benchmark & Gate Setup sprint.

Do we keep the challenge packs after the engagement?

Yes. All packs, baselines, scorecards, and gate configs live in your workspace. You can extend them without us.

Can we self-host instead of hosted AgentClash?

Yes. AgentClash is MIT-licensed. We can deliver packs and gate templates against your self-hosted stack. Discuss deployment during discovery.

What do you need before discovery?

A short description of the agent workflow, one or two real failure examples, and who signs the release decision. We handle the rest on the first call.

Are services free?

No. The four packages above are paid, fixed-scope engagements. The discovery call is free. Product access starts on the hosted Free plan or paid tiers on the pricing page.

Book a discovery call

Tell us about your agents and release process. We will recommend a package and timeline, or point you to the Free plan if that is the better first step.

Email hello@agentclash.dev
