Agent Skills

Copyable AgentClash skills for coding agents, exposed as docs pages and markdown exports.

AgentClash ships portable Agent Skills for coding agents that understand the SKILL.md folder format. The canonical source lives in web/content/agent-skills/.../SKILL.md; docs pages and markdown exports are generated from that source.

Install Targets

  • Codex: copy a skill folder into .agents/skills/<skill>/SKILL.md or point Codex at the markdown export.
  • Claude Code: copy a skill folder into .claude/skills/<skill>/SKILL.md; if the repo already uses AGENTS.md, add a CLAUDE.md import for @AGENTS.md.
  • Cursor: use these pages as agent-requested rule references, or add thin .cursor/rules/*.mdc stubs that link to the matching markdown export.
  • Generic agents: fetch /llms.txt, /llms-full.txt, or the individual /docs-md/agent-skills/<skill> pages.
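
The copy-based install targets above can be sketched as a pair of shell helpers. This is a minimal sketch, not a shipped AgentClash installer: the function names `install_skill` and `ensure_claude_import` are hypothetical, and it assumes you run it from a checkout that contains the skill folders. Cursor stubs and the generic fetch path are not covered.

```bash
# install_skill SKILL_DIR TARGET_ROOT
# Copy one skill folder (a directory containing SKILL.md) to TARGET_ROOT/<skill>/.
# Hypothetical helper for illustration; not part of the AgentClash CLI.
install_skill() {
  local src="${1%/}" target_root="$2" name
  name="$(basename "$src")"
  if [ ! -f "$src/SKILL.md" ]; then
    echo "install_skill: $src has no SKILL.md" >&2
    return 1
  fi
  mkdir -p "$target_root/$name"
  # "$src/." copies the folder contents, including any supporting files.
  cp -R "$src/." "$target_root/$name/"
}

# ensure_claude_import REPO_ROOT
# If the repo has AGENTS.md, make sure CLAUDE.md contains an @AGENTS.md import line.
# Safe to run repeatedly: the line is appended only when it is missing.
ensure_claude_import() {
  local repo="$1"
  [ -f "$repo/AGENTS.md" ] || return 0
  if ! grep -qxF '@AGENTS.md' "$repo/CLAUDE.md" 2>/dev/null; then
    printf '%s\n' '@AGENTS.md' >> "$repo/CLAUDE.md"
  fi
}

# Usage from the repository root (<skill> is a placeholder):
#   install_skill web/content/agent-skills/<skill> .agents/skills   # Codex
#   install_skill web/content/agent-skills/<skill> .claude/skills   # Claude Code
#   ensure_claude_import .
```

Copying the whole folder rather than only SKILL.md keeps any files the skill references next to it.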

Core Operating Skills

  • agentclash-agent-harness-setup - harness: Use when creating, running, or ranking Agent Harness coding-agent tasks via the CLI, including harness specs, E2B runner kinds, suite task banks, executions, failure review, and promote-to-task flows.
  • agentclash-ci-release-gate - ci: Use when wiring AgentClash manifest-based CI gates, deciding whether a PR should run AgentClash, resolving baselines, running agentclash ci run, interpreting gate exit codes, collecting CI artifacts, or configuring regression promotion policy in GitHub Actions.
  • agentclash-cli-setup - setup: Use when configuring the AgentClash CLI, authenticating with device login or tokens, selecting a workspace, saving default config with link, creating project config with init, resolving API URL precedence, or diagnosing CLI access against production, local, or self-hosted backends.
  • agentclash-compare-and-triage - comparison: Use when comparing baseline vs candidate AgentClash runs, evaluating release gates, managing workspace baseline bookmarks, or building a replay triage envelope after an eval completes.
  • agentclash-dataset-workflows - dataset: Use when managing AgentClash datasets via CLI — create versions, import/export examples, run evals, CI gates, synthetic generation, trace import, candidate review, and regression suite sync.
  • agentclash-eval-runner - running: Use when starting, following, inspecting, or reporting AgentClash eval runs with the CLI, especially eval start, run create, deployment selection, input set selection, suite-only scopes, repetitions, events, rankings, failures, and scorecards.
  • agentclash-hub - hub: Use when starting any AgentClash eval, CLI, or challenge-pack task. Load this skill first for the full workflow map, skill dependency order, product UI links, hosted defaults, and pointers to every other AgentClash skill.
  • agentclash-multi-turn-operator - multi-turn: Use when an agent in a multi_turn challenge pack run is awaiting human input and you need to check turn status or submit an operator message with agentclash run turn.
  • agentclash-prompt-eval-playground - prompt-eval: Use when scaffolding, validating, or running prompt eval YAML configs and managing playground experiments, test cases, and prompt variants via the AgentClash CLI.
  • agentclash-quickstart - onboarding: Use when checking whether a workspace is ready to run AgentClash evals, interpreting quickstart readiness checks, or choosing the next CLI command after auth and workspace selection.
  • agentclash-regression-flywheel - regression: Use when inspecting AgentClash run failure-review items, promoting useful failures into regression suites, editing regression suites or cases, and verifying suite-only reruns.
  • agentclash-scorecard-reader - reviewing: Use when interpreting AgentClash rankings, scorecards, replay timelines, artifacts, LLM judge results, or failure-review evidence into source-backed findings and next actions.
  • agentclash-security-evaluation - security: Use when running client-side security stress harnesses against security challenge packs, measuring leak posture, Agent Vault routing, or HashiCorp Vault runtime leaks with agentclash security commands.
  • agentclash-workspace-admin - admin: Use when creating or administering AgentClash organizations and workspaces, inviting members, updating roles, or binding default workspace context beyond basic CLI login.

Challenge Pack Skills

Focused skills for planning, authoring, scoring, judging, tooling, artifacts, validation, and publishing challenge packs.

  • agentclash-challenge-pack-artifacts - challenge-pack-artifacts: Use when specifying AgentClash challenge pack assets, artifact references, produced file captures, evidence references, artifact upload/download expectations, and review-only evidence.
  • agentclash-challenge-pack-input-sets - challenge-pack-inputs: Use when designing AgentClash challenge pack cases and input sets for smoke, full benchmark, regression, edge-case, or CI suite-only coverage.
  • agentclash-challenge-pack-llm-judges - challenge-pack-judging: Use when configuring AgentClash LLM-as-judge scoring, judge prompts, rubrics, assertion/reference/n-wise modes, evidence inputs, scorecard dimensions, abstention behavior, and judge result interpretation.
  • agentclash-challenge-pack-planner - challenge-pack-planning: Use when turning a vague AgentClash evaluation idea into a source-backed challenge pack plan with task boundaries, target agents, cases, input sets, scoring strategy, tools, artifacts, runtime policy, validation criteria, and handoff steps.
  • agentclash-challenge-pack-scoring-validators - challenge-pack-scoring: Use when defining deterministic AgentClash scoring validators, scorecard dimensions, evidence sources, pass/fail rules, numeric metrics, file checks, and validator result interpretation.
  • agentclash-challenge-pack-tools-sandbox - challenge-pack-tools: Use when defining AgentClash challenge pack tool access, sandbox runtime needs, filesystem expectations, network policy, command execution, and secret references.
  • agentclash-challenge-pack-validation-publish - challenge-pack-publication: Use when validating AgentClash challenge pack YAML, fixing schema/scoring/tool/asset errors, publishing runnable pack versions, recording returned IDs, and preparing next eval commands.
  • agentclash-challenge-pack-yaml-author - challenge-pack-authoring: Use when writing or editing AgentClash challenge pack YAML, including pack/version metadata, execution mode, challenges, cases, input sets, scoring blocks, tools, sandbox settings, assets, and validation handoff.

Agent Build Skills

Focused skills for agent build specs, deployments, runtime resources, provider accounts, model aliases, and secrets.

  • agentclash-agent-build-author - agent-builds: Use when creating, editing, validating, or readying AgentClash agent builds and build versions, including agent identity, spec JSON, prompts, model/runtime expectations, tool bindings, and version readiness.
  • agentclash-agent-deployment-setup - agent-deployments: Use when creating, selecting, or diagnosing AgentClash agent deployments for runs, including ready build versions, runtime profiles, provider/model wiring, deployment IDs, workspace context, and run compatibility.
  • agentclash-runtime-resources-setup - runtime-resources: Use when configuring AgentClash workspace secrets, provider accounts, model catalog entries, model aliases, runtime profiles, workspace tools, and readiness checks required before agent builds, deployments, evals, or runs.

Canonical Layout

```text
web/content/agent-skills/<category-or-skill>/.../SKILL.md
```

Each skill keeps the main instructions focused and uses trigger-oriented frontmatter so agents can discover the right workflow before loading the full body.
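
To illustrate discovery from frontmatter alone, the sketch below lists each skill's name and trigger description without printing any skill body. It is an assumption-laden illustration: the helper names are hypothetical, and it treats the frontmatter as simple `key: value` lines between the first two `---` markers rather than parsing YAML.

```bash
# Print the frontmatter of a SKILL.md: the lines between the first two '---' markers.
skill_frontmatter() {
  awk '$0 == "---" { n++; if (n == 2) exit; next } n == 1 { print }' "$1"
}

# skill_field FILE KEY
# Print the value of one top-level frontmatter key, such as name or description.
# Assumes unquoted single-line values; this is not a YAML parser.
skill_field() {
  skill_frontmatter "$1" | sed -n "s/^$2:[[:space:]]*//p"
}

# list_skill_triggers ROOT
# Print "name<TAB>description" for every SKILL.md under ROOT.
list_skill_triggers() {
  find "$1" -type f -name SKILL.md | sort | while IFS= read -r file; do
    printf '%s\t%s\n' "$(skill_field "$file" name)" "$(skill_field "$file" description)"
  done
}

# Usage from the repository root:
#   list_skill_triggers web/content/agent-skills
```

Because `awk` stops at the second `---`, a line in the body that happens to start with `description:` is never read.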

Catalog Contract

The root catalog skill is the authoring and review contract for all AgentClash skills.

````markdown
---
name: agentclash-skill-catalog
description: Use when creating, reviewing, or updating AgentClash agent-skill folders so the catalog taxonomy, frontmatter, generated docs, markdown exports, and llms.txt surfaces stay consistent.
metadata:
  agentclash.role: catalog
  agentclash.version: "1"
  agentclash.requires_cli: "false"
---

# AgentClash Skill Catalog

## Purpose
Define the folder taxonomy and publishing contract for AgentClash Agent Skills. Use this skill before adding or changing any `web/content/agent-skills/**/SKILL.md` file.

## Use When
- A user asks to add a new AgentClash skill.
- A user asks to update the skill catalog, taxonomy, metadata, generated docs, markdown exports, or `llms.txt` coverage.
- A reviewer needs to check whether a skill can be copied into Codex, Claude Code, Cursor, or another coding-agent workflow.

## Do Not Use When
- The task is only to run an eval, read a scorecard, configure the CLI, or author a challenge pack.
- The task changes product docs outside the Agent Skills catalog and does not affect skill discovery.

## Inputs Needed
- Feature area and intended user workflow.
- The upstream skill dependencies that must be read first.
- Source-backed command names, field names, examples, and failure modes.
- Whether the workflow targets hosted production, local development, or a self-hosted backend.

## Canonical Folder Taxonomy
The canonical source is always a `SKILL.md` file under `web/content/agent-skills`.

```text
web/content/agent-skills/SKILL.md
web/content/agent-skills/<top-level-skill>/SKILL.md
web/content/agent-skills/agent-build-skills/<skill>/SKILL.md
web/content/agent-skills/challenge-pack-skills/<skill>/SKILL.md
```

Use top-level folders for cross-cutting workflows such as CLI setup, eval running, scorecard reading, regression, and CI gates. Use `agent-build-skills/` for agent build specs, runtime resources, deployments, providers, secrets, and model aliases. Use `challenge-pack-skills/` for challenge pack planning, YAML authoring, inputs, tools, artifacts, scoring, judges, validation, and publish workflows.

## Required Frontmatter
Every skill must start with YAML frontmatter that the docs generator can parse with `gray-matter`.

```yaml
---
name: agentclash-example-skill
description: Use when the trigger is specific enough that an agent can choose this skill before reading the body.
metadata:
  agentclash.role: example
  agentclash.version: "1"
  agentclash.requires_cli: "true"
---
```

Field rules:
- `name`: stable kebab-case skill identifier, usually matching the folder name.
- `description`: trigger-oriented sentence that starts with "Use when" and names the workflow, not a generic summary.
- `metadata.agentclash.role`: short role label shown in the catalog list.
- `metadata.agentclash.version`: string version for the skill contract.
- `metadata.agentclash.requires_cli`: string `"true"` or `"false"` so installers and reviewers can spot CLI-dependent skills.

## Required Body Sections
Each skill should be useful without reading the AgentClash source code. Include these sections unless the section is not applicable and explicitly say why.

- `Purpose`: one paragraph describing the workflow outcome.
- `Use When`: concrete triggers that should load the skill.
- `Do Not Use When`: nearby workflows that should choose a different skill.
- `Inputs Needed`: information the agent should collect before acting.
- `Environment`: backend defaults, credentials, workspace, and local/self-hosted differences.
- `Procedure`: ordered operating steps.
- `Commands`: copyable commands with placeholders.
- `Expected Output`: what success looks like.
- `Failure Modes`: common errors and recovery steps.
- `Safety Notes`: secrets, destructive actions, cost, publish, or production cautions.
- `Report Back Format`: concise format the agent should use when done.
- `Related Docs`: `/docs-md/...` links that support the workflow.

## Generated Docs Contract
The web docs generator discovers skill files from `web/content/agent-skills/**/SKILL.md`.

- `/docs/agent-skills` and `/docs-md/agent-skills` render the catalog index and this catalog contract.
- `/docs/agent-skills/<skill>` and `/docs-md/agent-skills/<skill>` render individual top-level skills.
- `/docs/agent-skills/<category>/<skill>` and `/docs-md/agent-skills/<category>/<skill>` render nested category skills.
- `/llms.txt` includes the Agent Skills entry and every discovered skill page.
- `/llms-full.txt` includes the Agent Skills catalog, category pages, and full skill bodies.

When adding a new category, update the docs navigation and category map in `web/src/lib/docs.ts` so the category page, markdown path, and bundle order are explicit.

## Hosted Backend Examples
Use hosted production by default:

```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
agentclash auth login --device
agentclash workspace list
agentclash workspace use <workspace-id>
```

Only use local or self-hosted URLs when the skill is explicitly about local development or deployment:

```bash
agentclash --api-url http://localhost:8080 doctor
```

## Authoring Procedure
1. Pick the folder from the taxonomy.
2. Read the related upstream skills in dependency order.
3. Verify command names, config names, YAML fields, and API behavior from source-backed docs or code.
4. Write trigger-oriented frontmatter and the required body sections.
5. Prefer production hosted examples unless the workflow is local or self-hosted.
6. Add or update docs-generation tests when a new path, category, or export behavior is introduced.
7. Run the docs tests and lint before opening a PR.

## Validation Commands
```bash
cd web
npm test -- docs.test.ts
npm run lint
```

For CLI packaging-related skill changes, also validate from `cli/`:

```bash
go build ./...
go test -short -race -count=1 ./...
```

## Failure Modes
- Missing frontmatter: the generated page may have an empty name or description.
- Vague description: agents may not discover the skill at the right time.
- New category without generator updates: `/docs-md/agent-skills/<category>` may not exist.
- Examples that default to localhost: users may accidentally run against the wrong backend.
- Claims not tied to current docs or code: downstream skills will repeat incorrect fields or commands.

## Safety Notes
- Do not include tokens, workspace secrets, or customer data in examples.
- Do not tell agents to publish, delete, or mutate production resources without an explicit confirmation step.
- Prefer read-only discovery commands before write commands.
- Keep skill instructions portable; avoid relying on a single agent product unless the section is explicitly install-target guidance.

## Dependency Order
Read related skills in this order so downstream workflows do not redefine upstream concepts:

1. `agentclash-hub`
2. `agentclash-cli-setup`
3. `agentclash-quickstart`
4. `agentclash-runtime-resources-setup`
5. `agentclash-agent-build-author`
6. `agentclash-agent-deployment-setup`
7. `agentclash-challenge-pack-planner`
8. `agentclash-challenge-pack-yaml-author`
9. `agentclash-challenge-pack-input-sets`
10. `agentclash-challenge-pack-tools-sandbox`
11. `agentclash-challenge-pack-artifacts`
12. `agentclash-challenge-pack-scoring-validators`
13. `agentclash-challenge-pack-llm-judges`
14. `agentclash-challenge-pack-validation-publish`
15. `agentclash-eval-runner`
16. `agentclash-scorecard-reader`
17. `agentclash-compare-and-triage`
18. `agentclash-regression-flywheel`
19. `agentclash-ci-release-gate`
20. `agentclash-agent-harness-setup`
21. `agentclash-multi-turn-operator`
22. `agentclash-dataset-workflows`
23. `agentclash-prompt-eval-playground`
24. `agentclash-workspace-admin`
25. `agentclash-security-evaluation`

## Review Checklist
- The folder path matches the taxonomy.
- Frontmatter includes `name`, trigger-oriented `description`, and the three `metadata.agentclash.*` fields.
- Commands and examples use `https://api.agentclash.dev` unless local or self-hosted behavior is explicit.
- The body includes inputs, exact fields or commands, expected outputs, failure modes, safety notes, and report-back format.
- Related docs use `/docs-md/...` links.
- Generated docs, markdown exports, `llms.txt`, and `llms-full.txt` include the updated content.

## Report Back Format
```text
Skill: <name>
Path: web/content/agent-skills/<path>/SKILL.md
Category: <core | agent-build-skills | challenge-pack-skills>
Docs: /docs-md/agent-skills/<path>
Validation: <commands run and result>
Notes: <source-backed caveats or follow-ups>
```

## Related Docs
- `/docs-md/agent-skills`
- `/docs-md/guides/use-with-ai-tools`
- `/docs-md/reference/cli`
- `/docs-md/reference/config`
````
189## Related Docs
190- `/docs-md/agent-skills`
191- `/docs-md/guides/use-with-ai-tools`
192- `/docs-md/reference/cli`
193- `/docs-md/reference/config`