Challenge Pack Validation Publish Skill

Use when validating AgentClash challenge pack YAML, fixing schema/scoring/tool/asset errors, publishing runnable pack versions, recording returned IDs, and preparing next eval commands.

Canonical source: web/content/agent-skills/challenge-pack-skills/agentclash-challenge-pack-validation-publish/SKILL.md

Markdown export: /docs-md/agent-skills/challenge-pack-skills/agentclash-challenge-pack-validation-publish

Full SKILL.md

---
name: agentclash-challenge-pack-validation-publish
description: Use when validating AgentClash challenge pack YAML, fixing schema/scoring/tool/asset errors, publishing runnable pack versions, recording returned IDs, and preparing next eval commands.
metadata:
  agentclash.role: challenge-pack-publication
  agentclash.version: "1"
  agentclash.requires_cli: "true"
---

# AgentClash Challenge Pack Validation and Publish

## Purpose
Validate a complete AgentClash challenge pack against the real workspace-backed parser, fix any source-reported errors, publish a runnable version only when intended, and report the IDs needed to start an eval.

## Use When
- A challenge pack YAML file is ready for validation after planning, YAML authoring, inputs, tools, artifacts, scoring, and judges are in place.
- The user asks whether a pack is publishable.
- The user asks to publish a pack and needs the returned pack/version/evaluation/input-set IDs.
- A reviewer needs exact next commands for `agentclash eval start` or `agentclash run create`.

## Do Not Use When
- The pack idea is still being designed; use `agentclash-challenge-pack-planner`.
- The YAML structure is being written from scratch; use `agentclash-challenge-pack-yaml-author`.
- The task is only to run an already published eval; use `agentclash-eval-runner`.
- The task is to upload standalone workspace artifacts; use `agentclash-challenge-pack-artifacts`.

## Inputs Needed
- Pack YAML path.
- Target workspace ID or a configured workspace from `agentclash link`, `agentclash workspace use`, `--workspace`, `AGENTCLASH_WORKSPACE`, or `.agentclash.yaml`.
- Whether publish is explicitly allowed now.
- Deployment ID/name for the next run command, if the user wants a ready-to-run command.

## Environment
Use hosted production by default unless the user intentionally targets local or self-hosted infrastructure:

```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
```

`agentclash challenge-pack validate` is not an offline-only local parser in the current CLI. It reads the YAML file, requires auth and a workspace, then posts the raw bundle to:

```text
POST /v1/workspaces/<workspace-id>/challenge-packs/validate
```

`agentclash challenge-pack publish` posts the same raw bundle to:

```text
POST /v1/workspaces/<workspace-id>/challenge-packs
```

Use `--json` for coding-agent workflows because human output intentionally omits several IDs.

## CLI Surface
Use the full command names in docs and automation. `challenge-pack` also has the `cp` alias, but the full form is clearer for agents.

```bash
agentclash challenge-pack validate path/to/pack.yaml --json
agentclash challenge-pack publish path/to/pack.yaml --json
agentclash challenge-pack list --json
```

Exact command notes from the CLI:

- `validate` and `publish` each take exactly one file path.
- `validate` and `publish` have no command-specific flags today; `--json`, `--workspace`, `--api-url`, and `--quiet` are global flags.
- `publish` does not implement a challenge-pack-specific dry run, confirmation flag, or update flag.
- `list --json` returns visible packs for the current workspace, with each pack's runnable `versions`.
- The API request body is capped at 1 MiB.

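Because the request body is capped at 1 MiB, an automation wrapper may want to check the bundle size before calling the CLI. The following is a minimal Python sketch, not part of the CLI. It assumes the cap is measured on the raw YAML bytes and that 1 MiB means 1,048,576 bytes; `fits_request_limit` and `check_pack_file` are hypothetical helper names.

```python
from pathlib import Path

# Assumption: the 1 MiB cap is measured on the raw YAML bytes that are posted.
MAX_BUNDLE_BYTES = 1024 * 1024


def fits_request_limit(raw_bundle: bytes, limit: int = MAX_BUNDLE_BYTES) -> bool:
    """Return True when the bundle is at or below the request-size cap."""
    return len(raw_bundle) <= limit


def check_pack_file(path: str) -> bool:
    """Read a pack file from disk and apply the same size check."""
    return fits_request_limit(Path(path).read_bytes())
```

A failing check only predicts an oversized-bundle error; the server remains the authority on the real limit.
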
## Validation Procedure
1. Set hosted API URL unless the workflow is explicitly local or self-hosted.
2. Verify the workspace context before making workspace-backed calls.
3. Run validation with `--json`.
4. If the command fails, inspect both stdout and stderr. Current validation failures use HTTP 400, so the CLI can report the response body through its generic API-error path instead of returning a clean JSON object on stdout.
5. Fix every `field` and `message` pair reported by the API.
6. Re-run validation until it succeeds.
7. Only then publish, and only when the user has already approved publishing.

```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
agentclash auth status
agentclash workspace use <WORKSPACE_ID>
agentclash challenge-pack validate path/to/pack.yaml --json
```

On success, `validate --json` returns:

```json
{
  "valid": true,
  "errors": null
}
```

On validation failure, the API response shape is:

```json
{
  "valid": false,
  "errors": [
    {
      "field": "version.evaluation_spec.validators[0].type",
      "message": "must be one of ..."
    }
  ]
}
```

Because invalid validation is returned with HTTP 400 today, do not assume a failing CLI invocation will print that object as successful structured stdout. Still use its `field` and `message` details when they are present.

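One way to handle both outcomes in automation is to try each captured stream for the validation object. This is a hedged Python sketch: it assumes a stream, when it carries the object at all, contains only that JSON document, which the current CLI does not guarantee for failures. `parse_validation_output` is a hypothetical helper name.

```python
import json
from typing import List, Optional, Tuple

ValidationResult = Tuple[bool, List[Tuple[str, str]]]


def parse_validation_output(stdout: str, stderr: str = "") -> Optional[ValidationResult]:
    """Return (valid, [(field, message), ...]) from the first stream that holds
    a validation object, or None when neither stream parses as one."""
    for stream in (stdout, stderr):
        try:
            payload = json.loads(stream)
        except json.JSONDecodeError:
            continue  # Empty or non-JSON stream; try the next one.
        if not isinstance(payload, dict) or "valid" not in payload:
            continue
        # `errors` is null on success, so treat null or missing as empty.
        errors = payload.get("errors") or []
        pairs = [(e.get("field", ""), e.get("message", "")) for e in errors]
        return bool(payload["valid"]), pairs
    return None
```

When the helper returns `None`, fall back to reading the raw CLI output by hand rather than assuming the pack is valid.
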
## What Validation Checks
Validation parses the YAML bundle, normalizes legacy case data, runs bundle validation, validates scoring spec fields, and checks stored artifact IDs against the selected workspace.

Pack and version checks:

- `pack.slug`, `pack.name`, and `pack.family` are required.
- `version.number` must be greater than 0.
- `version.execution_mode` can be omitted, `native`, or `prompt_eval`.
- `prompt_eval` packs must not include top-level `tools`, `version.sandbox`, or `version.tool_policy`.
- At least one challenge is required.

Challenge checks:

- Every challenge needs `key`, `title`, `category`, and `difficulty`.
- `difficulty` must be exactly `easy`, `medium`, `hard`, or `expert`.
- Challenge keys must be unique.
- `artifact_refs[].key` must reference a declared `version.assets[].key`.

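The challenge rules above can be approximated locally to catch obvious mistakes before spending an API call. This Python sketch is only a pre-check and does not replace server validation. It assumes the challenges have already been loaded from YAML into a list of dictionaries; `precheck_challenges` is a hypothetical helper, and its messages are illustrative wording, not the API's.

```python
from typing import Any, Dict, List

DIFFICULTIES = {"easy", "medium", "hard", "expert"}
REQUIRED_CHALLENGE_FIELDS = ("key", "title", "category", "difficulty")


def precheck_challenges(challenges: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if not challenges:
        problems.append("at least one challenge is required")
    seen = set()
    for index, challenge in enumerate(challenges):
        for name in REQUIRED_CHALLENGE_FIELDS:
            if not challenge.get(name):
                problems.append(f"challenges[{index}].{name} is required")
        difficulty = challenge.get("difficulty")
        # The match is exact and case-sensitive, mirroring "must be exactly".
        if difficulty and difficulty not in DIFFICULTIES:
            problems.append(f"challenges[{index}].difficulty is not a supported value")
        key = challenge.get("key")
        if key in seen:
            problems.append(f"challenges[{index}].key duplicates an earlier key")
        elif key:
            seen.add(key)
    return problems
```
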
Input set checks:

- Every input set needs `key`, `name`, and at least one `cases` entry.
- Input set keys must be unique.
- All cases in the same input set must reference the same `challenge_key`.
- `challenge_key` must reference a declared challenge.
- `case_key` is the current field; legacy `item_key` is normalized but should not be newly authored.
- Case keys must be unique per challenge inside the input set.
- Case `inputs[].key`, `expectations[].key`, and local asset keys must be unique where they appear.
- `inputs[].artifact_key`, `expectations[].artifact_key`, and `source: artifact:<key>` must reference `version.assets`.
- `expectations[].source` may use `input:<input_key>` or `artifact:<asset_key>`; other prefixes are rejected.

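Several of the input set rules are simple enough to sketch as a local pre-check. The Python below covers only a subset (at least one case, a shared `challenge_key`, unique `case_key` values, and the `source` prefix rule) and assumes one input set has been loaded into a dictionary. `precheck_input_set` is a hypothetical helper with illustrative messages; the server's validator remains authoritative.

```python
from typing import Any, Dict, List

ALLOWED_SOURCE_PREFIXES = ("input:", "artifact:")


def precheck_input_set(input_set: Dict[str, Any]) -> List[str]:
    """Return problems found in one input set; empty means none were found."""
    problems: List[str] = []
    cases = input_set.get("cases") or []
    if not cases:
        problems.append("at least one case is required")
    # Every case in one input set must point at the same challenge.
    if len({case.get("challenge_key") for case in cases}) > 1:
        problems.append("all cases must reference the same challenge_key")
    seen_case_keys = set()
    for index, case in enumerate(cases):
        case_key = case.get("case_key")
        if not case_key:
            problems.append(f"cases[{index}].case_key is required")
        elif case_key in seen_case_keys:
            problems.append(f"cases[{index}].case_key duplicates an earlier case")
        else:
            seen_case_keys.add(case_key)
        for exp_index, expectation in enumerate(case.get("expectations") or []):
            source = expectation.get("source")
            # Only check the prefix when a source is actually present.
            if source is not None and not source.startswith(ALLOWED_SOURCE_PREFIXES):
                problems.append(
                    f"cases[{index}].expectations[{exp_index}].source has an unsupported prefix"
                )
    return problems
```

Because a valid input set targets a single challenge, checking `case_key` uniqueness across the whole set matches the per-challenge rule whenever the shared-challenge rule holds.
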
Tools, sandbox, and artifact checks:

- `version.tool_policy.allowed_tool_kinds` must be an array containing only `browser`, `build`, `data`, `file`, and `network`.
- Custom composed tools must have a valid `implementation.primitive`, `implementation.args`, and parameter schema.
- Tool delegation cycles and delegation chains deeper than 8 are rejected.
- `version.sandbox.network_allowlist[]` entries must be valid CIDR strings.
- `version.sandbox.env_vars` keys must match `[A-Za-z_][A-Za-z0-9_]*`.
- `version.sandbox.additional_packages[]` values must look like apt package names.
- Each asset needs `key` and `path`; duplicate asset keys in the same list are rejected.
- If an asset includes `artifact_id`, validation checks that the artifact exists and belongs to the selected workspace.

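The tool-kind, CIDR, and env-var rules can also be pre-checked locally. This Python sketch makes two labeled assumptions: Python's `ipaddress` parser is close enough to the server's CIDR rule for a pre-check, and `env_vars` is a mapping whose keys are the variable names. `is_valid_cidr` and `precheck_sandbox` are hypothetical helpers.

```python
import ipaddress
import re
from typing import Any, Dict, List

ALLOWED_TOOL_KINDS = {"browser", "build", "data", "file", "network"}
ENV_KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_cidr(value: str) -> bool:
    """Accept only address/prefix strings such as 10.0.0.0/8."""
    # Assumption: a bare address without a prefix length is not a CIDR string.
    if "/" not in value:
        return False
    try:
        # strict=False tolerates entries whose host bits are set.
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        return False
    return True


def precheck_sandbox(tool_kinds: List[str], sandbox: Dict[str, Any]) -> List[str]:
    """Return problems for tool kinds, allowlist entries, and env var keys."""
    problems: List[str] = []
    for kind in tool_kinds:
        if kind not in ALLOWED_TOOL_KINDS:
            problems.append(f"unsupported tool kind: {kind}")
    for entry in sandbox.get("network_allowlist") or []:
        if not is_valid_cidr(entry):
            problems.append(f"invalid CIDR: {entry}")
    for key in sandbox.get("env_vars") or {}:
        # fullmatch ensures the whole key fits the pattern, not just a prefix.
        if not ENV_KEY_PATTERN.fullmatch(key):
            problems.append(f"invalid env var key: {key}")
    return problems
```
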
Scoring checks:

- Errors from `version.evaluation_spec` are prefixed with `version.` in challenge-pack validation output.
- Strict evaluation-spec decoding catches unknown fields before scoring validation runs.
- Keep deterministic, LLM judge, and hybrid scoring mode rules aligned with `agentclash-challenge-pack-scoring-validators` and `agentclash-challenge-pack-llm-judges`.

## Fix Patterns
- `pack.family is required`: set a stable family string, usually the pack slug or product/workload family.
- `version.execution_mode must be one of "native", "prompt_eval"`: use exactly `native` or `prompt_eval`, or omit the field.
- `tools must be empty when version.execution_mode is prompt_eval`: switch to `native` or remove top-level `tools`.
- `version.tool_policy.allowed_tool_kinds[...] must be one of browser, build, data, file, network`: remove unsupported kinds such as `shell`.
- `input_sets[...].cases[...].challenge_key must reference the same challenge as the other cases in this input set`: split cases by challenge into separate input sets.
- `case_key is required`: add `case_key`; do not author new `item_key` entries.
- `expectations[...].source must start with input: or artifact:`: use `input:<input_key>` or `artifact:<version_asset_key>`.
- `artifact_id must reference an existing artifact`: upload or find the artifact first with the artifact CLI, then use its UUID.
- `artifact_id must belong to the workspace`: switch workspace or use an artifact from the selected workspace.
- `decode evaluation spec from yaml`: look for a misspelled or unsupported scoring field; this may surface as a CLI/API error rather than a normal `errors[]` item.

## Publish Procedure
Publish is a workspace mutation that creates a runnable challenge-pack version. Do it only after validation passes and the user has approved publishing.

```bash
agentclash challenge-pack validate path/to/pack.yaml --json
agentclash challenge-pack publish path/to/pack.yaml --json
```

On success, `publish --json` returns:

```json
{
  "challenge_pack_id": "<CHALLENGE_PACK_ID>",
  "challenge_pack_version_id": "<CHALLENGE_PACK_VERSION_ID>",
  "evaluation_spec_id": "<EVALUATION_SPEC_ID>",
  "input_set_ids": ["<CHALLENGE_INPUT_SET_ID>"],
  "bundle_artifact_id": "<BUNDLE_ARTIFACT_ID>"
}
```

`bundle_artifact_id` is optional. It is present when the backend stores the authored YAML bundle as a workspace artifact.

Without `--json`, the current CLI only prints the pack ID and version ID. Always use `--json` when you need `evaluation_spec_id`, `input_set_ids`, or `bundle_artifact_id`.

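To record the returned IDs reliably, an agent can parse the `publish --json` output into a typed record. This is a minimal Python sketch that assumes stdout contains exactly the success object shown above; `PublishResult` and `parse_publish_output` are hypothetical names, not part of the CLI.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublishResult:
    challenge_pack_id: str
    challenge_pack_version_id: str
    evaluation_spec_id: str
    input_set_ids: List[str] = field(default_factory=list)
    # Optional: only present when the backend stored the YAML bundle.
    bundle_artifact_id: Optional[str] = None


def parse_publish_output(stdout: str) -> PublishResult:
    """Parse the success object; raises KeyError if a required ID is missing."""
    payload = json.loads(stdout)
    return PublishResult(
        challenge_pack_id=payload["challenge_pack_id"],
        challenge_pack_version_id=payload["challenge_pack_version_id"],
        evaluation_spec_id=payload["evaluation_spec_id"],
        input_set_ids=list(payload.get("input_set_ids") or []),
        bundle_artifact_id=payload.get("bundle_artifact_id"),
    )
```

Failing loudly on a missing required ID is a deliberate choice here: a partial record would produce a broken next-run command.
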
Publish stores:

- a workspace-scoped challenge pack, keyed by `pack.slug`;
- a runnable challenge pack version using `version.number`;
- one evaluation spec from `version.evaluation_spec`;
- one challenge input set row for each authored `input_sets[]` entry;
- one challenge input item row for each case;
- optionally, a `challenge_pack_bundle` artifact for the source YAML.

## Publish Failure Modes
- `challenge_pack_version_exists`: the same pack already has that `version.number`; increment `version.number` or intentionally target a different pack slug.
- `challenge_pack_metadata_conflict`: an existing pack with the same workspace and slug has a different `pack.name` or `pack.family`; keep metadata stable or choose a new slug.
- Billing or entitlement errors: the workspace cannot publish private challenge packs under its current plan.
- Authorization errors: the current caller cannot publish to the selected workspace.
- Validation errors: publish re-runs the same bundle and stored-artifact checks as validate.
- Oversized bundle errors: keep the YAML body at or below the current 1 MiB request limit.

## Next Eval Commands
Record the IDs from `publish --json` before starting a run.

Workflow-first command, with selectors accepted by the CLI:

```bash
agentclash eval start \
  --pack <PACK_ID_OR_SLUG_OR_EXACT_NAME> \
  --pack-version <VERSION_ID_OR_VERSION_NUMBER> \
  --input-set <INPUT_SET_ID_OR_KEY_OR_EXACT_NAME> \
  --deployment <DEPLOYMENT_ID_OR_EXACT_NAME> \
  --follow
```

Lower-level non-interactive command, using IDs:

```bash
agentclash run create \
  --challenge-pack-version <CHALLENGE_PACK_VERSION_ID> \
  --input-set <CHALLENGE_INPUT_SET_ID> \
  --deployments <AGENT_DEPLOYMENT_ID> \
  --follow
```

When the published version has multiple input sets, pass `--input-set` in non-interactive workflows. `agentclash eval start` can resolve an input set by ID, key, or exact name; `agentclash run create` expects the input set ID.

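When reporting next commands, exact names that contain spaces need shell quoting. The Python sketch below assembles both commands from recorded values using only the flags shown above. `eval_start_command` and `run_create_command` are hypothetical helpers; they build strings and do not invoke the CLI.

```python
import shlex


def eval_start_command(pack: str, pack_version: str, input_set: str, deployment: str) -> str:
    """Build the workflow-first command; selectors may be IDs, keys, or names."""
    args = [
        "agentclash", "eval", "start",
        "--pack", pack,
        "--pack-version", pack_version,
        "--input-set", input_set,
        "--deployment", deployment,
        "--follow",
    ]
    # shlex.join quotes any argument that is not shell-safe, such as names with spaces.
    return shlex.join(args)


def run_create_command(version_id: str, input_set_id: str, deployment_id: str) -> str:
    """Build the lower-level command, which expects IDs."""
    args = [
        "agentclash", "run", "create",
        "--challenge-pack-version", version_id,
        "--input-set", input_set_id,
        "--deployments", deployment_id,
        "--follow",
    ]
    return shlex.join(args)
```
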
## Safety Notes
- Do not publish unless the user has already approved the workspace mutation.
- Do not include raw secret values in pack YAML, assets, examples, comments, or reports.
- `publish` does not upload local files referenced by `path`; use the artifact CLI first when a stored `artifact_id` is required.
- Prefer small smoke input sets before publishing large benchmark suites.
- Keep `pack.slug`, `pack.family`, challenge keys, input set keys, and case keys stable after publish so scorecards, regressions, and CI gates remain comparable.
- Avoid publishing customer-sensitive fixtures unless retention and workspace access are approved.

## Report Back Format
```text
Validation: <pass/fail>
Workspace: <workspace-id or configured default>
Pack file: <path>
Errors fixed:
- <field>: <message/fix>
Published: <yes/no>
Challenge pack ID: <id>
Challenge pack version ID: <id>
Evaluation spec ID: <id>
Input set IDs:
- <id> (<key/name if known>)
Bundle artifact ID: <id or not returned>
Next eval command:
Next run command:
Notes: <conflicts, entitlement/auth caveats, skipped publish reason>
```

## Related Skills
- `agentclash-cli-setup`
- `agentclash-challenge-pack-planner`
- `agentclash-challenge-pack-yaml-author`
- `agentclash-challenge-pack-input-sets`
- `agentclash-challenge-pack-tools-sandbox`
- `agentclash-challenge-pack-artifacts`
- `agentclash-challenge-pack-scoring-validators`
- `agentclash-challenge-pack-llm-judges`
- `agentclash-eval-runner`
- `agentclash-security-evaluation` (when the pack includes a `security` policy block)

## Related Docs
- `/docs-md/agent-skills/challenge-pack-skills/agentclash-challenge-pack-yaml-author`
- `/docs-md/agent-skills/challenge-pack-skills/agentclash-challenge-pack-artifacts`
- `/docs-md/agent-skills/agentclash-eval-runner`