Write a Challenge Pack

Author a challenge-pack bundle in YAML, validate it against the current parser, and publish it into a workspace.

Goal: write a pack that the current AgentClash parser, validator, and publish flow will accept.

Prerequisites:

  • You have the CLI installed and are logged in.
  • You have linked a workspace with agentclash link.
  • You know whether the pack should be prompt_eval or native.

1. Start with the CLI scaffold

The fastest way to get a valid starter file is:

bash
agentclash challenge-pack init support-eval.yaml

This generates a minimal bundle that matches the current parser shape. Use --template native if you want a native starter instead of the default prompt_eval.

2. Edit the current minimum shape

This is the smallest honest starting point based on the current bundle parser and tests:

yaml
pack:
  slug: support-eval
  name: Support Eval
  family: support

version:
  number: 1
  execution_mode: prompt_eval
  evaluation_spec:
    name: support-v1
    version_number: 1
    judge_mode: deterministic
    validators:
      - key: exact
        type: exact_match
        target: final_output
        expected_from: challenge_input
    scorecard:
      dimensions: [correctness]

challenges:
  - key: ticket-1
    title: Ticket One
    category: support
    difficulty: medium
    instructions: |
      Read the request and produce the final answer.

input_sets:
  - key: default
    name: Default Inputs
    cases:
      - challenge_key: ticket-1
        case_key: sample-1
        inputs:
          - key: prompt
            kind: text
            value: hello
        expectations:
          - key: answer
            kind: text
            source: input:prompt

This is not a glamorous pack. It is a good pack skeleton because it matches the current parser shape.

Within a single input_sets[] entry, point every case at the same challenge_key. If a pack covers multiple challenges, give each challenge its own input set so that each run has exactly one final-output contract to satisfy.
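As a sketch of that split, assuming a second, hypothetical challenge with key ticket-2 is declared under challenges, the input_sets section might look like the fragment below. The keys ticket-2, ticket-1-inputs, and ticket-2-inputs and all sample values are illustrative; only the field names come from the minimal bundle above.

```yaml
# Sketch only: ticket-2 and both input set keys are hypothetical.
input_sets:
  # Every case in this set targets ticket-1.
  - key: ticket-1-inputs
    name: Ticket One Inputs
    cases:
      - challenge_key: ticket-1
        case_key: sample-1
        inputs:
          - key: prompt
            kind: text
            value: hello
        expectations:
          - key: answer
            kind: text
            source: input:prompt
      - challenge_key: ticket-1
        case_key: sample-2
        inputs:
          - key: prompt
            kind: text
            value: goodbye
        expectations:
          - key: answer
            kind: text
            source: input:prompt

  # Every case in this set targets ticket-2.
  - key: ticket-2-inputs
    name: Ticket Two Inputs
    cases:
      - challenge_key: ticket-2
        case_key: sample-1
        inputs:
          - key: prompt
            kind: text
            value: refund status
        expectations:
          - key: answer
            kind: text
            source: input:prompt
```

Run agentclash challenge-pack validate on the result to confirm that the current validator accepts the layout.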

3. Add execution policy only when you need it

If the pack is native, you can add runtime sections like tool_policy, sandbox, and tools.

Example:

yaml
version:
  number: 2
  execution_mode: native
  tool_policy:
    allowed_tool_kinds:
      - file
      - shell
      - network
  sandbox:
    network_access: true
    network_allowlist:
      - 203.0.113.0/24
  evaluation_spec:
    name: support-v2
    version_number: 2
    judge_mode: hybrid
    validators:
      - key: exact
        type: exact_match
        target: final_output
        expected_from: challenge_input
    scorecard:
      dimensions: [correctness]

tools:
  custom:
    - name: check_inventory
      description: Check inventory by SKU
      parameters:
        type: object
        properties:
          sku:
            type: string
      implementation:
        primitive: http_request
        args:
          method: GET
          url: https://api.example.com/inventory/${sku}
          headers:
            Authorization: Bearer ${secrets.INVENTORY_API_KEY}

Use these sections deliberately.

  • tool_policy decides what kinds of tools are even available.
  • sandbox.network_access and network_allowlist control outbound networking.
  • tools.custom defines the tool contract the agent sees.
  • implementation.primitive picks the executor primitive that actually runs.

4. Add assets when inputs should point at files

If the pack needs files, declare them as assets instead of hardcoding mystery paths all over the bundle.

Example:

yaml
version:
  number: 1
  execution_mode: native
  assets:
    - key: fixtures
      path: fixtures/workspace.zip
      media_type: application/zip

You can also back an asset with an uploaded artifact by setting artifact_id, instead of relying only on a repository path.
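A minimal sketch of an artifact-backed asset follows. The artifact_id field name comes from this guide; the placeholder value, and the assumption that path can be omitted when artifact_id is set, are not confirmed by the source, so check the result with the validate command.

```yaml
# Sketch only: the artifact ID is a placeholder, not a real identifier.
version:
  number: 1
  execution_mode: native
  assets:
    - key: fixtures
      # Assumption: artifact_id can stand in for path. Keep path as well
      # if the validator reports that it is required.
      artifact_id: "<uploaded-artifact-id>"
      media_type: application/zip
```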

Then cases and expectations can refer to those assets by key.

5. Validate before you publish

The current CLI command is:

bash
agentclash challenge-pack validate support-eval.yaml

This calls the workspace-scoped validation endpoint, which runs the same parser and validation logic that the publish path uses.

Typical failures the current code will catch early:

  • unknown placeholders like ${missing}
  • invalid CIDR entries in network_allowlist
  • self-referencing or cyclic composed tools
  • invalid tool parameter schemas
  • unknown artifact keys or nonexistent stored artifact IDs

6. Publish the bundle

Once validation passes:

bash
agentclash challenge-pack publish support-eval.yaml

The publish response returns concrete IDs, including:

  • challenge_pack_id
  • challenge_pack_version_id
  • evaluation_spec_id
  • input_set_ids
  • optional bundle_artifact_id

Those IDs matter later because run creation asks for a pack version, not a filename.

7. Confirm the workspace can see it

bash
agentclash challenge-pack list

If the pack published cleanly, it should show up in the workspace list with its versions.

8. Run it through the workflow-first eval path

bash
agentclash eval start --follow
agentclash baseline set
agentclash eval scorecard

Verification

You should now have:

  • a bundle YAML file the current parser accepts
  • a successful validate result
  • a published pack version ID you can use in run creation
  • a workflow path you can reuse without raw IDs for every step

Troubleshooting

Validation says a tool placeholder is unknown

Your implementation.args template references a variable that is neither declared in the tool's parameter schema nor available in the template context.
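To illustrate, the fragment below reuses the check_inventory tool from the native example. The ${warehouse} placeholder is hypothetical and is shown commented out as the kind of reference the validator is expected to reject, on the assumption that only declared parameters and known context values such as secrets resolve.

```yaml
# Sketch only: mirrors the check_inventory tool from the native example.
tools:
  custom:
    - name: check_inventory
      description: Check inventory by SKU
      parameters:
        type: object
        properties:
          sku:              # declared here, so ${sku} can be referenced below
            type: string
      implementation:
        primitive: http_request
        args:
          method: GET
          # OK: sku is a declared parameter.
          url: https://api.example.com/inventory/${sku}
          # Expected to fail validation: warehouse is not declared anywhere.
          # url: https://api.example.com/inventory/${warehouse}/${sku}
```

The fix is either to declare the missing name under parameters.properties or to remove the placeholder from the template.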

Validation says a tool references itself or forms a cycle

Your composed tool graph is recursive. Break the cycle and delegate to a primitive or a non-cyclic tool chain.

Validation says an artifact key is missing

You referenced an asset or artifact key in a case or expectation that was never declared in the pack.

The pack needs internet access

Do not assume that adding http_request is enough. You also need the relevant sandbox/network policy in the pack and runtime path.
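As a rough checklist, these are the network-related settings that appear in the native example in this guide. The fragment is partial (evaluation_spec and the tool definition are omitted), and it assumes that http_request is governed by the network tool kind, which the example suggests but the source does not state outright.

```yaml
# Sketch only: partial version block showing the network-related settings.
version:
  number: 2
  execution_mode: native
  tool_policy:
    allowed_tool_kinds:
      - network             # assumption: http_request falls under this kind
  sandbox:
    network_access: true    # enable outbound networking
    network_allowlist:
      - 203.0.113.0/24      # documentation range; replace with the real destination CIDR
```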

See also