Challenge Pack Artifacts Skill

Use when specifying AgentClash challenge pack assets, artifact references, produced file captures, evidence references, artifact upload/download expectations, and review-only evidence.
Canonical source: web/content/agent-skills/challenge-pack-skills/agentclash-challenge-pack-artifacts/SKILL.md
Markdown export: /docs-md/agent-skills/challenge-pack-skills/agentclash-challenge-pack-artifacts
Full SKILL.md

---
name: agentclash-challenge-pack-artifacts
description: Use when specifying AgentClash challenge pack assets, artifact references, produced file captures, evidence references, artifact upload/download expectations, and review-only evidence.
metadata:
  agentclash.role: challenge-pack-artifacts
  agentclash.version: "1"
  agentclash.requires_cli: "true"
---

# AgentClash Challenge Pack Artifacts

## Purpose
Make challenge pack files and evidence explicit without confusing three different surfaces:

- source assets declared in challenge-pack YAML
- case artifact references exposed to scoring evidence
- files produced by the agent and captured after execution

Use this skill when a coding agent needs exact artifact fields and evidence references without reading the AgentClash source repo.

## Use When
- A pack includes fixture files, expected reports, images, logs, datasets, or other stored assets.
- Cases need `inputs[].artifact_key`, `expectations[].artifact_key`, `artifact_refs`, or case `artifacts`.
- Scoring needs `artifact...` or `file:...` evidence references.
- Native execution should capture files from the sandbox through `post_execution_checks`.
- A reviewer needs to inspect artifacts after a run.

## Do Not Use When
- The pack only needs plain text final-output scoring; use `agentclash-challenge-pack-scoring-validators`.
- The task is selecting input sets or regression suites; use `agentclash-challenge-pack-input-sets` or `agentclash-regression-flywheel`.
- The task is configuring tools, sandbox network, or package access; use `agentclash-challenge-pack-tools-sandbox`.

## Environment
Use hosted production for CLI examples unless the user intentionally targets a local or self-hosted backend.

```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
```

`agentclash challenge-pack validate` is a hosted API call: authenticate first and select a workspace with `agentclash link`, `--workspace`, `AGENTCLASH_WORKSPACE`, or local `.agentclash.yaml`. Use `agentclash-cli-setup` if the machine is not already logged in.

## Validation Commands
Validate after changing any asset, artifact reference, input artifact key, expectation artifact key, or file evidence target.

```bash
agentclash challenge-pack validate path/to/pack.yaml
agentclash challenge-pack validate path/to/pack.yaml --json
```

Human output prints `Challenge pack is valid` or `Challenge pack has errors`. Use `--json` for structured `valid` and `errors` fields.
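
A minimal Python sketch of consuming the `--json` output. It assumes the payload is a JSON object with a boolean `valid` and a list-like `errors`; the shape of each error entry is not specified here, so non-string entries are simply re-serialized. The helper name `summarize_validation` and the sample payload are hypothetical.

```python
import json


def summarize_validation(json_text: str) -> tuple[bool, list[str]]:
    """Return (valid, messages) from `challenge-pack validate --json` output.

    Assumption: the payload has `valid` and `errors` fields. Error entries
    that are not plain strings are rendered as compact JSON text.
    """
    payload = json.loads(json_text)
    valid = bool(payload.get("valid", False))
    errors = payload.get("errors") or []
    messages = [
        entry if isinstance(entry, str) else json.dumps(entry, sort_keys=True)
        for entry in errors
    ]
    return valid, messages


# Hypothetical sample payload, not captured from a real run.
sample = '{"valid": false, "errors": ["version.assets[0].path is required"]}'
valid, messages = summarize_validation(sample)
print("valid" if valid else "invalid")
for message in messages:
    print("-", message)
```

To feed it real output, capture the stdout of `agentclash challenge-pack validate path/to/pack.yaml --json` (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass that text in.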

## Asset Declaration Shape
Challenge-pack assets use the same field shape at `version.assets`, `challenges[].assets`, and `input_sets[].cases[].assets`.

```yaml
version:
  assets:
    - key: policy_pdf
      kind: document
      path: assets/policy.pdf
      media_type: application/pdf
      artifact_id: 00000000-0000-0000-0000-000000000000

challenges:
  - key: summarize-policy
    title: Summarize Policy
    category: document
    difficulty: medium
    assets:
      - key: challenge_notes
        path: assets/challenge-notes.md

input_sets:
  - key: policy-smoke
    name: Policy Smoke
    cases:
      - challenge_key: summarize-policy
        case_key: summary-basic
        assets:
          - key: case_notes
            path: assets/cases/summary-basic-notes.md
```

Source-backed fields:

- `key`: required and unique inside that one `assets` list.
- `path`: required, even when `artifact_id` is present.
- `kind`: optional string.
- `media_type`: optional string.
- `artifact_id`: optional UUID for an already stored artifact.

The current validator does not enforce enums for `kind` or `media_type`; use clear, conventional values and validate the pack.
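
The field rules above can be pre-checked locally before calling the hosted validator. The following is a rough sketch, not the AgentClash validator itself: `check_asset_list` is a hypothetical helper that works on an already parsed `assets` list (plain Python dicts) and covers only the `key` and `path` rules listed here.

```python
def check_asset_list(assets: list[dict], where: str) -> list[str]:
    """Check one `assets` list: `key` and `path` required, `key` unique per list."""
    problems = []
    seen = set()
    for index, asset in enumerate(assets):
        key = asset.get("key")
        if not key:
            problems.append(f"{where}[{index}]: missing key")
        elif key in seen:
            problems.append(f"{where}[{index}]: duplicate key {key!r}")
        else:
            seen.add(key)
        # `path` is required even when `artifact_id` is present.
        if not asset.get("path"):
            problems.append(f"{where}[{index}]: missing path")
    return problems


assets = [
    {"key": "policy_pdf", "path": "assets/policy.pdf", "media_type": "application/pdf"},
    {"key": "policy_pdf", "path": "assets/policy-v2.pdf"},
    {"key": "orphan", "artifact_id": "00000000-0000-0000-0000-000000000000"},
]
for problem in check_asset_list(assets, "version.assets"):
    print(problem)
```

Run it once per list (`version.assets`, each `challenges[].assets`, each case `assets`), since uniqueness applies inside a single list only.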

## Version Assets Versus Local Assets
Only `version.assets` form the reference pool for artifact references.

These fields must reference a declared `version.assets[].key`:

- `challenges[].artifact_refs[].key`
- `input_sets[].cases[].artifacts[].key`
- `input_sets[].cases[].inputs[].artifact_key`
- `input_sets[].cases[].expectations[].artifact_key`
- `input_sets[].cases[].expectations[].source: artifact:<key>`

Challenge-level and case-level `assets` are validated for `key` and `path`, but they are not accepted as targets for `artifact_refs`, `artifacts`, `artifact_key`, or `source: artifact:...`. If a file needs to be referenced by scoring or case evidence, declare it in `version.assets`.
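
As an illustration of the reference-pool rule, here is a small sketch that walks a parsed pack (plain dicts, as a YAML loader would produce) and reports every reference that does not resolve to `version.assets`. The function name `unresolved_artifact_refs` and the message format are made up for this example; the hosted validator remains the authority.

```python
def unresolved_artifact_refs(pack: dict) -> list[str]:
    """List references whose key is not declared in version.assets."""
    version_keys = {a.get("key") for a in pack.get("version", {}).get("assets", [])}
    missing = []

    def check(key, where):
        if key not in version_keys:
            missing.append(f"{where}: {key!r} is not in version.assets")

    for challenge in pack.get("challenges", []):
        for ref in challenge.get("artifact_refs", []):
            check(ref.get("key"), f"challenge {challenge.get('key')} artifact_refs")
    for input_set in pack.get("input_sets", []):
        for case in input_set.get("cases", []):
            label = f"case {case.get('case_key')}"
            for artifact in case.get("artifacts", []):
                check(artifact.get("key"), f"{label} artifacts")
            for item in case.get("inputs", []):
                if "artifact_key" in item:
                    check(item["artifact_key"], f"{label} inputs")
            for item in case.get("expectations", []):
                if "artifact_key" in item:
                    check(item["artifact_key"], f"{label} expectations")
                source = item.get("source") or ""
                if source.startswith("artifact:"):
                    check(source[len("artifact:"):], f"{label} expectations source")
    return missing


pack = {
    "version": {"assets": [{"key": "policy_pdf", "path": "assets/policy.pdf"}]},
    "challenges": [{
        "key": "summarize-policy",
        # Challenge-local asset: valid to declare, but not a reference target.
        "assets": [{"key": "challenge_notes", "path": "assets/challenge-notes.md"}],
        "artifact_refs": [{"key": "policy_pdf"}, {"key": "challenge_notes"}],
    }],
    "input_sets": [{"cases": [{
        "case_key": "summary-basic",
        "expectations": [{"key": "ref", "source": "artifact:expected_summary"}],
    }]}],
}
for line in unresolved_artifact_refs(pack):
    print(line)
```

In this sample both `challenge_notes` (challenge-local only) and `expected_summary` (never declared) are reported, while `policy_pdf` resolves.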

## Artifact References
Use `artifact_refs` on a challenge to say which version assets matter to that challenge. Use case `artifacts` to expose version assets as artifact evidence for a case.

```yaml
version:
  assets:
    - key: policy_pdf
      path: assets/policy.pdf
      media_type: application/pdf
    - key: expected_summary
      path: assets/expected-summary.json
      media_type: application/json

challenges:
  - key: summarize-policy
    title: Summarize Policy
    category: document
    difficulty: medium
    artifact_refs:
      - key: policy_pdf

input_sets:
  - key: policy-smoke
    name: Policy Smoke
    cases:
      - challenge_key: summarize-policy
        case_key: summary-basic
        artifacts:
          - key: policy_pdf
          - key: expected_summary
        inputs:
          - key: prompt
            kind: text
            value: Summarize the refund policy.
          - key: source_document
            kind: file
            artifact_key: policy_pdf
            path: assets/policy.pdf
        expectations:
          - key: summary_reference
            kind: json
            artifact_key: expected_summary
          - key: summary_reference_via_source
            kind: json
            source: artifact:expected_summary
```

Hard validation rules:

- `artifact_refs[].key` is required, unique in that `artifact_refs` list, and must reference `version.assets`.
- Case `artifacts[].key` is required, unique in that case `artifacts` list, and must reference `version.assets`.
- `inputs[].artifact_key` and `expectations[].artifact_key` must reference `version.assets`.
- `expectations[].source` may be empty, `input:<case-input-key>`, or `artifact:<version-asset-key>`.
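
The three accepted forms of `expectations[].source` can be told apart with a simple prefix split. This is only a sketch of the rule as stated above; `parse_expectation_source` is a hypothetical helper, and rejecting an empty key after the prefix follows the "empty key" failure listed later in this skill.

```python
def parse_expectation_source(source: str) -> tuple[str, str]:
    """Split an expectation `source` into (kind, key).

    kind is "none" for an empty source, otherwise "input" or "artifact".
    Anything else, including an empty key after the prefix, raises ValueError.
    """
    if not source:
        return ("none", "")
    prefix, separator, key = source.partition(":")
    if separator and prefix in ("input", "artifact") and key:
        return (prefix, key)
    raise ValueError(f"unsupported expectation source: {source!r}")


for value in ("", "input:prompt", "artifact:expected_summary"):
    print(repr(value), "->", parse_expectation_source(value))
```

An `artifact` result still has to resolve to a `version.assets` key, and an `input` result to an input key on the same case.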

## Evidence References
Scoring references case and artifact evidence through supported evidence-reference strings.

Artifact evidence:

- Supported prefix shape: `artifact.<path>`.
- Do not read `<path>` as a filesystem path here. The concrete shape is `artifact.<artifact_key>[.<field>]`, where the first segment after `artifact.` is the case artifact key, for example `artifact.expected_summary`.
- Use `artifact.expected_summary.path` for the asset path.
- Use `artifact.expected_summary.media_type`, `artifact.expected_summary.kind`, or `artifact.expected_summary.key` when the metadata is what the validator or judge needs.

Case evidence:

- `case.payload`
- `case.payload.<field>`
- `case.inputs.<input_key>`
- `case.expectations.<expectation_key>`

Produced file evidence:

- `file:<post_execution_check_key>`

Use `literal:<value>` for inline expected values when a validator requires `expected_from`.
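
To make the reference shapes concrete, the sketch below classifies a reference string into the families listed above. It is illustrative only: `classify_evidence_ref` and its returned dict layout are invented for this example, and it assumes artifact keys contain no dots so that the first segment after `artifact.` can be taken as the key.

```python
def classify_evidence_ref(ref: str) -> dict:
    """Classify an evidence reference string by its documented prefix."""
    if ref.startswith("file:"):
        return {"kind": "file", "check_key": ref[len("file:"):]}
    if ref.startswith("literal:"):
        return {"kind": "literal", "value": ref[len("literal:"):]}
    if ref.startswith("artifact."):
        # Assumption: the artifact key itself has no "." in it.
        key, _, field = ref[len("artifact."):].partition(".")
        if not key:
            raise ValueError(f"missing artifact key in {ref!r}")
        return {"kind": "artifact", "artifact_key": key, "field": field or None}
    if ref == "case.payload" or ref.startswith(
        ("case.payload.", "case.inputs.", "case.expectations.")
    ):
        parts = ref.split(".", 2)
        name = parts[2] if len(parts) > 2 else None
        return {"kind": "case", "section": parts[1], "name": name}
    raise ValueError(f"unsupported evidence reference: {ref!r}")


for ref in (
    "artifact.expected_summary",
    "artifact.expected_summary.path",
    "case.inputs.prompt",
    "file:generated_summary",
    "literal:approved",
):
    print(ref, "->", classify_evidence_ref(ref))
```

Note the separator difference the sketch relies on: artifact and case references use `.`, while `file:` and `literal:` use `:`.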

## Produced File Capture
Agent-produced files are not declared with pack `assets`. Capture them after native execution through `version.evaluation_spec.post_execution_checks`.

```yaml
version:
  execution_mode: native
  tool_policy:
    allowed_tool_kinds:
      - file
      - build
  evaluation_spec:
    name: policy-artifact-eval
    version_number: 1
    judge_mode: deterministic
    post_execution_checks:
      - key: generated_summary
        type: file_capture
        path: /workspace/summary.json
        max_size_bytes: 1048576
      - key: project_listing
        type: directory_listing
        path: /workspace
        recursive: true
    validators:
      - key: generated_summary_exists
        type: file_exists
        target: file:generated_summary
      - key: generated_summary_matches_schema
        type: file_json_schema
        target: file:generated_summary
        config:
          schema:
            type: object
    scorecard:
      dimensions:
        - key: correctness
          source: validators
          weight: 1
```

Source-backed `post_execution_checks` fields:

- `key`: required and unique.
- `type`: required and must be `file_capture` or `directory_listing`.
- `path`: required.
- `recursive`: optional boolean, useful for `directory_listing`.
- `max_size_bytes`: optional integer; must be greater than or equal to `0`.

File validators must target `file:<post_execution_check_key>`. `code_execution` validators must reference a `file_capture`, not a `directory_listing`.
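
A local pre-check for these rules might look like the sketch below. It operates on a parsed `evaluation_spec` dict and is not the hosted validator. Two assumptions are baked in and should be verified against real packs: `code_execution` validators are assumed to name their capture through `target` (the field is not spelled out above), and only validators whose `target` starts with `file:` are inspected. The helper name and the `runs_listing`, `missing_capture`, and `report` keys are hypothetical.

```python
ALLOWED_CHECK_TYPES = {"file_capture", "directory_listing"}


def check_produced_file_rules(evaluation_spec: dict) -> list[str]:
    """Check post_execution_checks fields and `file:<key>` validator targets."""
    problems = []
    check_types = {}  # post_execution_check key -> type
    for index, check in enumerate(evaluation_spec.get("post_execution_checks", [])):
        where = f"post_execution_checks[{index}]"
        key = check.get("key")
        if not key:
            problems.append(f"{where}: missing key")
        elif key in check_types:
            problems.append(f"{where}: duplicate key {key!r}")
        else:
            check_types[key] = check.get("type")
        if check.get("type") not in ALLOWED_CHECK_TYPES:
            problems.append(f"{where}: type must be file_capture or directory_listing")
        if not check.get("path"):
            problems.append(f"{where}: missing path")
        size = check.get("max_size_bytes")
        if size is not None and size < 0:
            problems.append(f"{where}: max_size_bytes must be >= 0")

    for validator in evaluation_spec.get("validators", []):
        target = str(validator.get("target", ""))
        if not target.startswith("file:"):
            continue  # only produced-file targets are checked in this sketch
        check_key = target[len("file:"):]
        name = validator.get("key")
        if check_key not in check_types:
            problems.append(f"validator {name!r}: no post_execution_check named {check_key!r}")
        elif validator.get("type") == "code_execution" and check_types[check_key] != "file_capture":
            # Assumption: code_execution points at its capture via `target`.
            problems.append(f"validator {name!r}: code_execution needs a file_capture")
    return problems


spec = {
    "post_execution_checks": [
        {"key": "generated_summary", "type": "file_capture",
         "path": "/workspace/summary.json", "max_size_bytes": 1048576},
        {"key": "project_listing", "type": "directory_listing",
         "path": "/workspace", "recursive": True},
    ],
    "validators": [
        {"key": "generated_summary_exists", "type": "file_exists", "target": "file:generated_summary"},
        {"key": "runs_listing", "type": "code_execution", "target": "file:project_listing"},
        {"key": "missing_capture", "type": "file_exists", "target": "file:report"},
    ],
}
for problem in check_produced_file_rules(spec):
    print(problem)
```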

## Review-Only Evidence Patterns
Use review-only evidence when humans need debugging context but scoring should not depend on it.

Good review-only patterns:

- Add a `directory_listing` check such as `project_listing` so reviewers can see what the agent created.
- Capture small logs with `file_capture` when they explain failures.
- Upload external review material with the artifact CLI and attach metadata that explains the source.
- Report artifact keys, paths, and IDs back to the user without claiming they were scored.

Keep scored evidence and review evidence separate in the report. If a file should affect the score, reference it from a validator, judge `context_from`, or judge `reference_from`. If it is only context for humans, leave it unreferenced by scoring.
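
One way to keep the two groups apart when writing the report is to derive them from the spec. The sketch below is deliberately partial: it looks only at validator `target` values, because the layout of judge `context_from` and `reference_from` is not shown in this skill, so any capture referenced only by a judge would be misfiled as review-only and must be moved by hand. The helper name is hypothetical.

```python
def split_scored_and_review_only(evaluation_spec: dict) -> tuple[list[str], list[str]]:
    """Partition post_execution_check keys into (scored, review_only).

    Assumption: a capture counts as scored when some validator targets
    `file:<key>`. Judge references are NOT inspected by this sketch.
    """
    targeted = set()
    for validator in evaluation_spec.get("validators", []):
        target = str(validator.get("target", ""))
        if target.startswith("file:"):
            targeted.add(target[len("file:"):])
    keys = [c.get("key") for c in evaluation_spec.get("post_execution_checks", [])]
    scored = [key for key in keys if key in targeted]
    review_only = [key for key in keys if key not in targeted]
    return scored, review_only


spec = {
    "post_execution_checks": [
        {"key": "generated_summary", "type": "file_capture", "path": "/workspace/summary.json"},
        {"key": "project_listing", "type": "directory_listing", "path": "/workspace"},
    ],
    "validators": [
        {"key": "generated_summary_exists", "type": "file_exists", "target": "file:generated_summary"},
    ],
}
scored, review_only = split_scored_and_review_only(spec)
print("scored:", scored)
print("review-only:", review_only)
```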

## Artifact CLI Surface
Workspace artifacts are managed by the CLI, but challenge-pack YAML does not have upload or download commands inside the bundle.

```bash
agentclash artifact list
agentclash artifact list --json
agentclash artifact upload path/to/file --type reference-data
agentclash artifact upload path/to/file --type reference-data --metadata '{"purpose":"review"}' --json
agentclash artifact download <ARTIFACT_ID> --output path/to/file
```

Exact command notes:

- `agentclash artifact list` lists artifacts in the current workspace. It does not have a `--run` filter today.
- `agentclash artifact upload <file>` requires `--type`.
- Upload also accepts optional `--run <RUN_ID>`, `--run-agent <RUN_AGENT_ID>`, and `--metadata <JSON>`.
- `agentclash artifact download <artifactId>` writes to stdout by default.
- Use `--output` or `-O` to save a downloaded artifact to a file.

Use these commands for workspace artifact inspection or preloading stored artifacts. Use `version.assets[].artifact_id` only when you already have a stored artifact UUID to reference.
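
When the upload is scripted, building the argument list in code avoids hand-quoting the `--metadata` JSON. A small sketch using only the flags listed above; `upload_command` is a hypothetical helper, and it only builds and prints the command rather than running it.

```python
import json
import shlex


def upload_command(path, artifact_type, metadata=None, run_id=None):
    """Build argv for `agentclash artifact upload` from the documented flags."""
    argv = ["agentclash", "artifact", "upload", path, "--type", artifact_type]
    if run_id:
        argv += ["--run", run_id]
    if metadata is not None:
        # Compact JSON keeps the argument short; shlex handles shell quoting.
        argv += ["--metadata", json.dumps(metadata, separators=(",", ":"))]
    argv.append("--json")
    return argv


argv = upload_command("path/to/file", "reference-data", metadata={"purpose": "review"})
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` skips the shell entirely, so the metadata never needs escaping; `shlex.join` is only for showing a copy-pasteable command.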

## Common Validation Failures
- An asset omits `key` or `path`.
- Duplicate `key` values appear inside one `assets`, `artifact_refs`, or case `artifacts` list.
- `artifact_refs[].key` points at a challenge-local or case-local asset instead of `version.assets`.
- Case `artifacts[].key`, `inputs[].artifact_key`, or `expectations[].artifact_key` points at an undeclared version asset.
- `source: artifact:...` uses an empty key or a key not declared in `version.assets`.
- A scoring validator targets `file:generated_summary` but no matching `post_execution_checks[].key` exists.
- A file validator uses `artifact.expected_summary` instead of `file:<post_execution_check_key>`.
- A `code_execution` validator targets a `directory_listing`.
- The pack uses produced-output assets under `version.assets` instead of `post_execution_checks`.

## Authoring Procedure
1. List every static file the pack needs.
2. Put any file that must be referenced by cases or scoring in `version.assets`.
3. Add challenge-local or case-local `assets` only for local organization; do not rely on them for `artifact_key` references.
4. Add `artifact_refs` to each challenge when a version asset belongs to that challenge.
5. Add case `artifacts` when scoring or review evidence should expose a version asset on that case.
6. Use `inputs[].artifact_key`, `expectations[].artifact_key`, or `source: artifact:<key>` only for declared version assets.
7. For agent-produced files, use `version.evaluation_spec.post_execution_checks`, then target them with `file:<key>`.
8. Decide which captured files are scored and which are review-only.
9. Run `agentclash challenge-pack validate path/to/pack.yaml --json`.
10. Report exact artifact keys, file paths, scored evidence references, review-only evidence, and validation result.

## Safety
- Do not include raw secret values in assets, captured files, artifact metadata, examples, chat, or commits.
- Keep captured files small and deliberate; `file_capture` content becomes scoring evidence.
- Avoid capturing broad directories unless a listing is needed for review.
- Do not store private customer data as long-lived `version.assets` unless retention and access are approved.
- Prefer deterministic fixture assets over live mutable files.

## Report Back Format
```text
Version assets:
- key:
  path:
  media_type:
  artifact_id:
Challenge artifact refs:
Case artifacts:
Input artifact keys:
Expectation artifact keys/sources:
Produced file captures:
- key:
  type:
  path:
  scored: <yes/no>
Scoring evidence refs:
Review-only evidence:
Artifact CLI commands used:
Validation command:
Validation result:
Open issues:
```

## Related Skills
- `agentclash-challenge-pack-input-sets`
- `agentclash-challenge-pack-tools-sandbox`
- `agentclash-challenge-pack-scoring-validators`
- `agentclash-challenge-pack-llm-judges`
- `agentclash-challenge-pack-validation-publish`
- `agentclash-scorecard-reader`