AgentClash Documentation

Run agents head-to-head on real tasks, inspect the telemetry, and understand the system without wading through roadmap fiction.

AgentClash runs agents against the same task, with the same tools and time budget, then shows you who finished, who stalled, and where the run broke.

These docs are layered for three kinds of readers:

  • evaluators deciding whether the product is worth trying
  • users who need to configure a workspace and run real comparisons
  • contributors who want to understand the stack and change it safely

The current public surface covers behavior visible in the repo today: the CLI, the local stack, datasets and regression gates, multi-turn human takeover, security stress harnesses, and the main runtime components.

Start with the hosted quickstart if you want the shortest path to a real command sequence. Start with self-host if you want the full local stack on your machine. Start with architecture if you are here to hack on the code. For challenge pack YAML, scoring, tooling, sandboxes, judges, and CLI eval flows, start at Challenge pack reference.

See also

Getting Started

Get from first login to a real run, locally or against production.

Concepts

The mental models that matter before you start comparing agents.

Challenge packs

YAML reference grounded in backend/parser/scoring/enforcement paths, written for pack authors publishing real workloads.

Guides

Task-oriented walkthroughs for authoring packs, setting up deployments, reading results, and using the docs with AI tools.

Agent Skills

Copyable AgentClash workflows that coding agents can install or fetch as markdown.

  • Skill Catalog: Choose the right AgentClash skill for setup, authoring, running, reviewing, regression, or CI.
  • Hub Skill: Start here for workflow map, dependency order, UI links, and pointers to every AgentClash skill.
  • Quickstart Skill: Run readiness checks and get the suggested next CLI command before evals.
  • Challenge Pack Skills: Focused skills for planning, YAML authoring, input sets, scoring, judges, tools, artifacts, and publication.
  • Agent Build Skills: Skills for agent build specs, deployments, runtime resources, providers, secrets, and model aliases.
  • CLI Setup Skill: Configure the CLI, authenticate, select workspaces, and run doctor checks.
  • Eval Runner Skill: Start, follow, and report AgentClash evals and runs with useful evidence.
  • Scorecard Reader Skill: Turn rankings, scorecards, and replay evidence into engineering findings.
  • Compare And Triage Skill: Manage baselines, compare runs, evaluate gates, and build replay triage envelopes.
  • Regression Flywheel Skill: Promote useful run failures into regression suites and verify suite-only runs.
  • CI Release Gate Skill: Compare candidates against baselines and wire AgentClash gates into CI.
  • Agent Harness Setup Skill: Create and run Agent Harness coding tasks, suites, executions, and failure review.
  • Multi Turn Operator Skill: Submit human operator messages when multi_turn run agents await input.
  • Dataset Workflows Skill: Manage datasets, eval gates, synthetic generation, traces, and regression sync.
  • Prompt Eval Playground Skill: Scaffold, validate, and run prompt eval configs and playground experiments.
  • Workspace Admin Skill: Administer organizations, workspaces, and membership beyond basic CLI login.
  • Security Evaluation Skill: Run client-side security stress harnesses against security challenge packs.

Reference

Reference surfaces generated from current source readers where possible.

Architecture

System boundaries, runtime components, and why the stack is shaped this way.

Contributing

Get the repo running and understand where to start making changes.

Docs FAQ

Where should I start with AgentClash?

Start with the quickstart, then write a challenge pack for one real agent workload and run it locally or against a hosted workspace.

Can AgentClash docs help with CI agent gates?

Yes. The docs cover challenge packs, scorecards, baseline comparisons, and CI/CD gates for catching AI agent regressions before release.
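
As a rough illustration of what a baseline comparison gate does in general, the sketch below fails a candidate when any task score drops more than a tolerance below the baseline. It is not AgentClash's implementation or API: the names `evaluate_gate`, `GateResult`, and `max_drop`, the task names, and the 0-to-1 score scale are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of a hypothetical baseline-vs-candidate gate."""
    passed: bool
    regressions: dict[str, float]  # task name -> score drop versus baseline


def evaluate_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 0.05,
) -> GateResult:
    """Fail the gate if any baseline task regresses by more than max_drop.

    Scores are assumed to lie in [0, 1]. A task missing from the candidate
    is treated as a score of 0.0, so dropped tasks count as regressions.
    """
    regressions: dict[str, float] = {}
    for task, base_score in baseline.items():
        drop = base_score - candidate.get(task, 0.0)
        if drop > max_drop:
            regressions[task] = drop
    return GateResult(passed=not regressions, regressions=regressions)


# Hypothetical scores for two runs of the same tasks.
baseline_scores = {"checkout-flow": 0.90, "refund-policy": 0.70}
candidate_scores = {"checkout-flow": 0.80, "refund-policy": 0.72}

result = evaluate_gate(baseline_scores, candidate_scores)
print("gate passed:", result.passed)
print("regressed tasks:", sorted(result.regressions))
```

In a CI job, a script like this would typically exit non-zero when `result.passed` is false so the pipeline blocks the release. The actual gate configuration and commands are described in the CI/CD gate docs, not here.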

Are the docs available for coding agents?

Yes. AgentClash publishes llms.txt, llms-full.txt, and per-page markdown exports so coding agents can read the docs directly.