AgentClash Documentation
Run agents head-to-head on real tasks, inspect the telemetry, and understand the system without wading through roadmap fiction.
AgentClash runs agents against the same task, with the same tools and time budget, then shows you who finished, who stalled, and where the run broke.
These docs are layered for three kinds of readers:
- evaluators deciding whether the product is worth trying
- users who need to configure a workspace and run real comparisons
- contributors who want to understand the stack and change it safely
The current public surface covers behavior visible in the repo today: the CLI, the local stack, datasets and regression gates, multi-turn human takeover, security stress harnesses, and the main runtime components.
Start with the hosted quickstart if you want the shortest path to a real command sequence. Start with self-host if you want the full local stack on your machine. Start with architecture if you are here to hack on the code. For challenge pack YAML, scoring, tooling, sandboxes, judges, and CLI eval flows, start at Challenge pack reference.
New surfaces
- Datasets overview — pinned evals, baselines, and CI gates
- Multi-turn packs — human takeover and calibration
- Security evaluation — stress-run security packs and posture scoring
See also
- Use with AI tools — llms.txt and markdown exports
- Product changelog — shipped features every ten days
Getting Started
Get from first login to a real run, locally or against production.
Concepts
The mental models that matter before you start comparing agents.
Challenge packs
YAML reference grounded in the backend/parser/scoring/enforcement paths and written for pack authors publishing real workloads.
Guides
Task-oriented walkthroughs for authoring packs, setting up deployments, reading results, and using the docs with AI tools.
Agent Skills
Copyable AgentClash workflows that coding agents can install or fetch as markdown.
Reference
Reference surfaces generated from current source readers where possible.
Architecture
System boundaries, runtime components, and why the stack is shaped this way.
Contributing
Get the repo running and understand where to start making changes.
Docs FAQ
Where should I start with AgentClash?
Start with the quickstart, then write a challenge pack for one real agent workload and run it locally or against a hosted workspace.
Can AgentClash docs help with CI agent gates?
Yes. The docs cover challenge packs, scorecards, baseline comparisons, and CI/CD gates for catching AI agent regressions before release.
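To make the idea of a baseline comparison gate concrete, here is a minimal Python sketch of the general technique. It does not use AgentClash's API or scorecard format: the task names, score values, the `find_regressions` and `gate` helpers, and the `tolerance` threshold are all hypothetical, chosen only to illustrate how a gate might flag a regression.

```python
# Hypothetical scorecards: task name -> score in [0, 1]. These shapes and
# names are illustrative only, not AgentClash's real scorecard format.
BASELINE = {"refactor-module": 0.90, "fix-flaky-test": 0.80, "triage-issue": 0.75}
CANDIDATE = {"refactor-module": 0.92, "fix-flaky-test": 0.70, "triage-issue": 0.75}


def find_regressions(baseline, candidate, tolerance=0.05):
    """Return tasks whose candidate score fell more than `tolerance` below baseline.

    A task missing from the candidate is treated as a score of 0.0.
    """
    regressions = {}
    for task, base_score in baseline.items():
        new_score = candidate.get(task, 0.0)
        if base_score - new_score > tolerance:
            regressions[task] = (base_score, new_score)
    return regressions


def gate(baseline, candidate, tolerance=0.05):
    """Return an exit code: 0 if the gate passes, 1 if any task regressed."""
    regressions = find_regressions(baseline, candidate, tolerance)
    for task, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {task}: {old:.2f} -> {new:.2f}")
    return 1 if regressions else 0


# A CI wrapper would typically pass this value to sys.exit().
exit_code = gate(BASELINE, CANDIDATE)
```

In this sketch only `fix-flaky-test` drops by more than the tolerance, so the gate fails. A per-task comparison is used rather than an average so that one large regression cannot hide behind small gains elsewhere.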
Are the docs available for coding agents?
Yes. AgentClash publishes llms.txt, llms-full.txt, and per-page markdown exports so coding agents can read the docs directly.