Write a Challenge Pack
Author a challenge-pack bundle in YAML, validate it against the current parser, and publish it into a workspace.
Goal: write a pack that the current AgentClash parser, validator, and publish flow will accept.
Prerequisites:
- You have the CLI installed and are logged in.
- You linked a workspace with agentclash link.
- You know whether the pack should be prompt_eval or native.
1. Start with the CLI scaffold
The fastest way to get a valid starter file is:
    agentclash challenge-pack init support-eval.yaml
This generates a minimal bundle that matches the current parser shape. Use --template native if you want a native starter instead of the default prompt_eval.
2. Edit the current minimum shape
This is the smallest honest starting point based on the current bundle parser and tests:
    pack:
      slug: support-eval
      name: Support Eval
      family: support

    version:
      number: 1
      execution_mode: prompt_eval
      evaluation_spec:
        name: support-v1
        version_number: 1
        judge_mode: deterministic
        validators:
          - key: exact
            type: exact_match
            target: final_output
            expected_from: challenge_input
        scorecard:
          dimensions: [correctness]

    challenges:
      - key: ticket-1
        title: Ticket One
        category: support
        difficulty: medium
        instructions: |
          Read the request and produce the final answer.

    input_sets:
      - key: default
        name: Default Inputs
        cases:
          - challenge_key: ticket-1
            case_key: sample-1
            inputs:
              - key: prompt
                kind: text
                value: hello
            expectations:
              - key: answer
                kind: text
                source: input:prompt
This is not a glamorous pack. It is a good pack skeleton because it matches the current parser shape.
Keep every case in a single input_sets[] entry pointed at the same challenge_key. If a pack covers multiple challenges, split them into separate input sets so each run has one final-output contract to satisfy.
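The one-challenge-per-input-set rule above can be checked locally before validation. The following is a minimal sketch, assuming the bundle has already been loaded from YAML into plain dicts and lists. The helper name mixed_challenge_input_sets and the extra sample cases are hypothetical and are not part of the AgentClash CLI.

```python
def mixed_challenge_input_sets(bundle: dict) -> dict:
    """Return input sets whose cases point at more than one challenge_key."""
    offenders = {}
    for input_set in bundle.get("input_sets", []):
        keys = {case["challenge_key"] for case in input_set.get("cases", [])}
        if len(keys) > 1:
            # Sort so the report is stable from run to run.
            offenders[input_set["key"]] = sorted(keys)
    return offenders


# Hypothetical bundle data, shaped like the input_sets section shown earlier.
bundle = {
    "input_sets": [
        {"key": "default", "cases": [
            {"challenge_key": "ticket-1", "case_key": "sample-1"},
            {"challenge_key": "ticket-1", "case_key": "sample-2"},
        ]},
        {"key": "mixed", "cases": [
            {"challenge_key": "ticket-1", "case_key": "sample-3"},
            {"challenge_key": "ticket-2", "case_key": "sample-4"},
        ]},
    ]
}

print(mixed_challenge_input_sets(bundle))
# {'mixed': ['ticket-1', 'ticket-2']}
```

An empty result means every input set targets a single challenge. Any key in the result is an input set that should be split.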
3. Add execution policy only when you need it
If the pack is native, you can add runtime sections like tool_policy, sandbox, and tools.
Example:
    version:
      number: 2
      execution_mode: native
      tool_policy:
        allowed_tool_kinds:
          - file
          - shell
          - network
      sandbox:
        network_access: true
        network_allowlist:
          - 203.0.113.0/24
      evaluation_spec:
        name: support-v2
        version_number: 2
        judge_mode: hybrid
        validators:
          - key: exact
            type: exact_match
            target: final_output
            expected_from: challenge_input
        scorecard:
          dimensions: [correctness]

    tools:
      custom:
        - name: check_inventory
          description: Check inventory by SKU
          parameters:
            type: object
            properties:
              sku:
                type: string
          implementation:
            primitive: http_request
            args:
              method: GET
              url: https://api.example.com/inventory/${sku}
              headers:
                Authorization: Bearer ${secrets.INVENTORY_API_KEY}
Use these sections deliberately.
- tool_policy decides what kinds of tools are even available.
- sandbox.network_access and network_allowlist control outbound networking.
- tools.custom defines the tool contract the agent sees.
- implementation.primitive picks the executor primitive that actually runs.
4. Add assets when inputs should point at files
If the pack needs files, declare them as assets instead of hardcoding mystery paths all over the bundle.
Example:
    version:
      number: 1
      execution_mode: native
      assets:
        - key: fixtures
          path: fixtures/workspace.zip
          media_type: application/zip
You can also back an asset with an uploaded artifact by setting artifact_id instead of only relying on a repository path.
Then cases and expectations can refer to those assets by key.
5. Validate before you publish
The current CLI command is:
    agentclash challenge-pack validate support-eval.yaml
This calls the workspace-scoped validation endpoint and checks the same parser and validation logic the publish path uses.
Typical failures the current code will catch early:
- unknown placeholders like ${missing}
- invalid CIDR entries in network_allowlist
- self-referencing or cyclic composed tools
- invalid tool parameter schemas
- unknown artifact keys or nonexistent stored artifact IDs
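For the CIDR item in the list above, a quick local check can catch typos before the validate call. This is a minimal sketch using Python's standard ipaddress module. It is not the AgentClash validator, and the real rules may differ: for example, whether an entry with host bits set is rejected is an assumption here. The helper name invalid_cidrs and the bad sample entries are hypothetical.

```python
import ipaddress


def invalid_cidrs(entries):
    """Return the entries that do not parse as a strict network in CIDR form."""
    bad = []
    for entry in entries:
        try:
            # strict=True also rejects entries with host bits set,
            # such as 203.0.113.7/24 (assumption: the real validator may be looser).
            ipaddress.ip_network(entry, strict=True)
        except ValueError:
            bad.append(entry)
    return bad


print(invalid_cidrs(["203.0.113.0/24", "203.0.113.7/24", "10.0.0.0/33", "not-a-cidr"]))
# ['203.0.113.7/24', '10.0.0.0/33', 'not-a-cidr']
```

Note that ip_network accepts a bare address such as 203.0.113.5 and treats it as a /32, so this sketch will not flag a missing prefix length.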
6. Publish the bundle
Once validation passes:
    agentclash challenge-pack publish support-eval.yaml
The publish response returns concrete IDs, including:
- challenge_pack_id
- challenge_pack_version_id
- evaluation_spec_id
- input_set_ids
- optional bundle_artifact_id
Those IDs matter later because run creation asks for a pack version, not a filename.
7. Confirm the workspace can see it
    agentclash challenge-pack list
If the pack published cleanly, it should show up in the workspace list with its versions.
8. Run it through the workflow-first eval path
    agentclash eval start --follow
    agentclash baseline set
    agentclash eval scorecard
Verification
You should now have:
- a bundle YAML file the current parser accepts
- a successful validate result
- a published pack version ID you can use in run creation
- a workflow path you can reuse without raw IDs for every step
Troubleshooting
Validation says a tool placeholder is unknown
Your implementation.args template references a variable that is neither declared in the tool parameter schema nor available in the template context.
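To find the offending placeholder locally, you can compare the ${...} names in implementation.args against the declared parameters. The sketch below assumes that anything under parameters.properties is a valid name and that names starting with secrets. come from the template context. The real template context may contain more than that. The helper names and the extra ${region} placeholder are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def iter_strings(value):
    """Yield every string found in nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def unknown_placeholders(tool, context_prefixes=("secrets.",)):
    """Return placeholder names that are neither declared parameters nor context names."""
    declared = set(tool.get("parameters", {}).get("properties", {}))
    unknown = []
    for text in iter_strings(tool.get("implementation", {}).get("args", {})):
        for name in PLACEHOLDER.findall(text):
            if name in declared or name.startswith(tuple(context_prefixes)):
                continue
            unknown.append(name)
    return unknown


# The check_inventory tool from this guide, with an undeclared ${region} added.
tool = {
    "name": "check_inventory",
    "parameters": {"type": "object", "properties": {"sku": {"type": "string"}}},
    "implementation": {
        "primitive": "http_request",
        "args": {
            "method": "GET",
            "url": "https://api.example.com/inventory/${sku}?region=${region}",
            "headers": {"Authorization": "Bearer ${secrets.INVENTORY_API_KEY}"},
        },
    },
}

print(unknown_placeholders(tool))
# ['region']
```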
Validation says a tool references itself or forms a cycle
Your composed tool graph is recursive. Break the cycle and delegate to a primitive or a non-cyclic tool chain.
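If the cycle is hard to spot by eye, a depth-first search over the delegation graph will name it. This is a minimal sketch, assuming you have written down by hand which tools each composed tool delegates to. How a composed tool declares its delegates in the bundle is not shown in this guide, so the delegates mapping and the tool names lookup_order and ping are hypothetical.

```python
def find_cycle(delegates):
    """Return one cycle as a list of tool names, or None if the graph is acyclic.

    delegates maps a composed tool name to the names it calls. Names that are
    not keys of the mapping (for example primitives) are treated as leaves.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in delegates}
    path = []

    def visit(name):
        state[name] = in_progress
        path.append(name)
        for target in delegates.get(name, []):
            target_state = state.get(target, done)
            if target_state == in_progress:
                # The target is already on the current path, so we found a cycle.
                return path[path.index(target):] + [target]
            if target_state == unvisited:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[name] = done
        return None

    for name in delegates:
        if state[name] == unvisited:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


graph = {
    "lookup_order": ["check_inventory"],
    "check_inventory": ["lookup_order"],
    "ping": ["http_request"],
}
print(find_cycle(graph))
# ['lookup_order', 'check_inventory', 'lookup_order']
```

A self-referencing tool shows up as a two-element cycle such as ['a', 'a'].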
Validation says an artifact key is missing
You referenced an asset or artifact key in a case or expectation that was never declared in the pack.
The pack needs internet access
Do not assume that adding http_request is enough. You also need the relevant sandbox/network policy in the pack and runtime path.
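A small lint can flag a pack that declares an http_request tool without the matching policy sections. The sketch below assumes the bundle is already loaded into plain dicts, and that an http_request tool needs both the network tool kind and sandbox.network_access: true. That pairing mirrors the native example in this guide but is an assumption about the runtime, not a documented rule. The helper name network_policy_gaps is hypothetical.

```python
def network_policy_gaps(bundle):
    """Return human-readable gaps for packs whose custom tools use http_request."""
    version = bundle.get("version", {})
    tools = bundle.get("tools", {}).get("custom", [])
    uses_http = any(
        tool.get("implementation", {}).get("primitive") == "http_request"
        for tool in tools
    )
    if not uses_http:
        return []

    gaps = []
    allowed = version.get("tool_policy", {}).get("allowed_tool_kinds", [])
    if "network" not in allowed:
        gaps.append("tool_policy.allowed_tool_kinds does not include network")
    if not version.get("sandbox", {}).get("network_access", False):
        gaps.append("sandbox.network_access is not true")
    return gaps


# A native pack that defines an HTTP tool but no network policy.
bundle = {
    "version": {"number": 2, "execution_mode": "native"},
    "tools": {"custom": [
        {"name": "check_inventory", "implementation": {"primitive": "http_request"}},
    ]},
}

for gap in network_policy_gaps(bundle):
    print(gap)
# tool_policy.allowed_tool_kinds does not include network
# sandbox.network_access is not true
```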