Agent Skills

Agent Harness Setup Skill

Use when creating, running, or ranking Agent Harness coding-agent tasks via the CLI, including harness specs, E2B runner kinds, suite task banks, executions, failure review, and promote-to-task flows.

Canonical source: web/content/agent-skills/agentclash-agent-harness-setup/SKILL.md

Markdown export: /docs-md/agent-skills/agentclash-agent-harness-setup

Full SKILL.md

markdown
---
name: agentclash-agent-harness-setup
description: Use when creating, running, or ranking Agent Harness coding-agent tasks via the CLI, including harness specs, E2B runner kinds, suite task banks, executions, failure review, and promote-to-task flows.
metadata:
  agentclash.role: harness
  agentclash.version: "1"
  agentclash.requires_cli: "true"
---

# AgentClash Agent Harness Setup

## Purpose
Configure and operate Agent Harnesses: workspace-scoped autonomous coding tasks with E2B runners, evaluation config, suite rankings, and failure curation. Agent Harnesses are not challenge packs.

## Use When
- A user wants long-running coding-agent checks against a repository with validators or LLM judges.
- You need to create harnesses for Codex, Claude, Hermes, or OpenClaw runners on E2B.
- The workflow involves suite task banks, multi-harness suite runs, or promoting failed executions into private tasks.

## Do Not Use When
- The workload is a standard challenge-pack eval: use `agentclash-eval-runner`.
- The user only needs deployments or runtime resources: use agent-build skills first.
- The task is prompt A/B testing without a repo harness: use `agentclash-prompt-eval-playground`.

## Inputs Needed
- Workspace with provider secrets configured (`openai_api_key_secret_name` or `--api-key-secret`).
- Task prompt, repository URL, and harness kind.
- Optional evaluation config JSON (validators, LLM judges).
- For suite runs: suite ID, harness IDs, optional task filters.

## Environment
```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
agentclash workspace use <WORKSPACE_ID>
agentclash secret list --json
```

## Procedure
1. List or inspect existing harnesses.
2. Create a harness with `--name`, `--task`, `--auth-mode`, and an API key secret (`--api-key-secret`).
3. Run a harness execution; use `--follow` to poll until it reaches a terminal status.
4. Inspect executions, failure summaries, and failure reviews.
5. Optionally create suites, run them across harnesses, read rankings, and promote executions to private tasks.
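
Steps 3 and 4 can be wrapped in a small shell function, sketched here using only subcommands listed in this skill. The function name `harness_run_and_review` is hypothetical. The sketch assumes the harness already exists and that `agentclash` exits non-zero on failure, which this document does not confirm.

```bash
#!/usr/bin/env bash
# Sketch: run one harness to a terminal status, then list its executions
# and print the failure summary.
# harness_run_and_review is a hypothetical helper name, not part of the CLI.
harness_run_and_review() {
  if [ -z "${1:-}" ]; then
    echo "usage: harness_run_and_review <harness-id>" >&2
    return 2
  fi
  local harness_id="$1"

  # Step 3: start an execution and poll until it reaches a terminal status.
  agentclash agent-harness run "$harness_id" --follow || return 1

  # Step 4: inspect executions and the failure summary.
  agentclash agent-harness executions "$harness_id" || return 1
  agentclash agent-harness failures summary
}
```

Call it as `harness_run_and_review <harness-id>` after `agentclash workspace use <WORKSPACE_ID>` has selected the workspace.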

## Commands

### Harness CRUD and runs
```bash
agentclash agent-harness list
agentclash agent-harness get <harness-id>
agentclash agent-harness create \
  --name "Refund fix" \
  --task "Fix the refund bug in services/refund.go" \
  --harness-kind codex_e2b \
  --auth-mode api_key_secret \
  --api-key-secret OPENAI_API_KEY \
  --repository-url https://github.com/org/repo \
  --base-branch main

agentclash agent-harness run <harness-id> --follow
agentclash agent-harness executions <harness-id>
```

`--harness-kind` values: `codex_e2b` (default), `claude_e2b`, `hermes_e2b`, `openclaw_e2b`.

Create from JSON:
```bash
agentclash agent-harness create --from-file harness.json
```

Optional flags: `--codex-template`, `--codex-model`, `--execution-config`, `--evaluation-config`, `--evaluation-config-file`.

### Executions
```bash
agentclash agent-harness execution get <execution-id>
agentclash agent-harness execution cancel <execution-id>
agentclash agent-harness execution retry <execution-id> --idempotency-key cli-retry-1
```

### Suites and rankings
```bash
agentclash agent-harness suite list
agentclash agent-harness suite create --name "Private bank" --task-json '{"title":"Task 1","public_prompt":"..."}'
agentclash agent-harness suite tasks <suite-id>
agentclash agent-harness suite run <suite-id> --harness <harness-id-1> --harness <harness-id-2>
agentclash agent-harness suite rankings <suite-id> --k 3
```

### Failures and promotion
```bash
agentclash agent-harness failures summary
agentclash agent-harness execution failure-review get <execution-id>
agentclash agent-harness execution failure-review update <execution-id> --human-class timeout --human-summary "Sandbox timed out"
agentclash agent-harness execution promote-task <execution-id> --suite <suite-id> --title "Promoted failure case"
```

Alias: `agentclash harness` = `agentclash agent-harness`.

## Expected Output
- Create returns the harness ID, kind, auth mode, and template.
- Run returns an execution ID; `--follow` polls until the status is `completed`, `failed`, or `cancelled`.
- The suite rankings table shows success@1, pass@k, cost, and latency per harness.

## Failure Modes
- Missing `--api-key-secret` on create → pass the name of the workspace secret that contains the provider key.
- Missing required suite flags → `--harness` is required on `suite run`; `--task-json` or `--from-file` is required on `suite create`.
- Non-terminal execution on retry → only terminal executions can be retried.
- Wrong harness kind for template → defaults: `codex`, `agentclash-claude-fullstack`, `agentclash-hermes-fullstack`, `agentclash-openclaw-fullstack`.
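
The last two failure modes can be checked before calling the CLI. The sketch below assumes the default templates are listed in the same order as the `--harness-kind` values (`codex_e2b`, `claude_e2b`, `hermes_e2b`, `openclaw_e2b`). That pairing is inferred from this document, not confirmed CLI behavior, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the kind-to-template pairing is assumed from list order.

# Print the assumed default template for a harness kind; return 1 if unknown.
default_template_for_kind() {
  case "$1" in
    codex_e2b)    echo "codex" ;;
    claude_e2b)   echo "agentclash-claude-fullstack" ;;
    hermes_e2b)   echo "agentclash-hermes-fullstack" ;;
    openclaw_e2b) echo "agentclash-openclaw-fullstack" ;;
    *)            echo "unknown harness kind: $1" >&2; return 1 ;;
  esac
}

# Succeed only for the terminal statuses named under Expected Output,
# since only terminal executions can be retried.
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}
```

A retry can then be guarded with `is_terminal_status "$status" && agentclash agent-harness execution retry <execution-id> --idempotency-key cli-retry-1`, where `$status` is read from the `execution get` output by whatever means fits your setup.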

## Safety Notes
- Harness runs execute code in E2B sandboxes with repository access; confirm the repo and secrets before running.
- Promoted tasks may contain sensitive failure excerpts; sanitize `public_prompt`.
- Do not paste API keys or secret values into chat.
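
One way to sanitize a failure excerpt before it goes into `public_prompt` is to mask strings that look like credentials. The patterns below are illustrative assumptions (an `sk-` prefixed token and `Bearer` headers), not a list that AgentClash defines, and the function name `redact_secrets` is hypothetical. Pattern matching cannot catch every secret, so a manual read is still needed.

```bash
#!/usr/bin/env bash
# Sketch: mask credential-looking strings read from stdin.
# The two patterns are illustrative assumptions, not an exhaustive list.
redact_secrets() {
  sed -E \
    -e 's/(^|[^A-Za-z0-9])sk-[A-Za-z0-9_-]{8,}/\1[REDACTED]/g' \
    -e 's|(Bearer )[A-Za-z0-9._~+/=-]+|\1[REDACTED]|g'
}
```

Pipe the excerpt through it, for example `redact_secrets < excerpt.txt`, and review the result before promoting the task.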

## Report Back Format
```text
Harness: <id> (<name>, <kind>)
Execution: <id> — <status>
Suite: <id or n/a>
Ranking summary: <top harness or n/a>
Failure review: <effective_class or n/a>
Next commands: <1-3>
```
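
The report block can be produced by a small formatter so that the optional fields fall back to `n/a` consistently. This is a sketch: the function name `report_back` and its positional argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the report-back block.
# Args: harness_id name kind execution_id status suite ranking failure_class next_commands
# Empty suite, ranking, or failure_class values are printed as "n/a".
report_back() {
  printf 'Harness: %s (%s, %s)\n' "$1" "$2" "$3"
  printf 'Execution: %s — %s\n' "$4" "$5"
  printf 'Suite: %s\n' "${6:-n/a}"
  printf 'Ranking summary: %s\n' "${7:-n/a}"
  printf 'Failure review: %s\n' "${8:-n/a}"
  printf 'Next commands: %s\n' "$9"
}
```

For a single failed run with no suite, `report_back h1 "Refund fix" codex_e2b e1 failed "" "" timeout "agentclash agent-harness execution get e1"` fills the suite and ranking lines with `n/a`.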

## Related Skills
- `agentclash-hub`
- `agentclash-cli-setup`
- `agentclash-runtime-resources-setup`
- `agentclash-eval-runner`
- `agentclash-scorecard-reader`
- `agentclash-regression-flywheel`

## Related Docs
- `/docs-md/guides/ci-cd-workload-recipes`
- `/docs-md/reference/cli`