Challenge packs
Bundle YAML reference
Structured reference for challenge-pack bundles as parsed by backend/internal/challengepack.
AgentClash challenge packs ship as one YAML file decoded into challengepack.Bundle (backend/internal/challengepack/bundle.go). Publication stores a JSON manifest combining the pack body with metadata for execution and scoring (ManifestJSON in the same file).
Top-level keys
All keys listed here are persisted or validated somewhere in publish/validate—not decorative.
pack (required)
Human metadata surfaced in API and CLI lists:
slug— required, workspace-unique branding stringname— required display namefamily— required grouping/category stringdescription— optional
version (required)
The executable spine of the bundle:
| Field | Notes |
|---|---|
number | Required positive integer (int32). Bumps when you materially change validators, tooling, sandbox, scores, etc. |
execution_mode | One of native, prompt_eval, responses, or multi_turn. An empty value is accepted as legacy, but always set it explicitly. See execution mode compatibility below. |
tool_policy | Arbitrary-shaped map mirrored into manifest tool_policy. Must be empty when execution_mode is prompt_eval (enforced when the bundle is validated). |
filesystem | Optional filesystem constraints blob (same manifest path as today). |
sandbox | Optional SandboxConfig. Forbidden when execution_mode is prompt_eval. |
evaluation_spec | Required scoring contract unmarshalled through scoring package strict decode (scoring.StrictDecodeEvaluationSpec). Typos surface at parse time rather than silently defaulting. |
assets | Version-scoped AssetReference[] keyed for cases and uploads. |
SandboxConfig (bundle.go) currently supports:
network_access(bool)network_allowlist(CIDR strings; invalid CIDR rejects publish)env_vars(map of string literals—see Sandbox & E2B)additional_packages(APT-style names constrained by regexp in validation)sandbox_template_id(optional provider template override)
Manifest JSON merges sandbox_template_id from version.sandbox into the serialized version block for backends that historically keyed off that field separately.
tools (optional)
Optional map keyed by integration style; authoring today uses tools.custom as an array of composed tools. Must be empty when execution_mode is prompt_eval or responses.
See Tools, primitives & policy.
challenges (required, non-empty)
Each challenge includes:
| Field | Required | Purpose |
|---|---|---|
key | yes | Stable id referenced by cases |
title | yes | Display |
category | yes | Stored metadata |
difficulty | yes | Must be one of easy, medium, hard, or expert (any other value fails publish with "must be one of easy, medium, hard, expert") |
instructions | often | Prompt body; mirrored into definition.instructions if missing there |
definition | optional | Extra JSON-compatible bag for product-specific authoring |
assets | optional | Challenge-scoped files |
artifact_refs | optional | Artifact key references validated against declared artifacts |
input_sets (required)
Defines runnable cases:
keyandnamerequired on each set.- Prefer modern
casesarray. Legacyitemsaliases are normalized intocasesduringnormalizeBundle; do not rely onitemsin new authoring. - All cases in a single input set must reference the same
challenge_key; use separate input sets for separate challenges.
Cases are modeled by CaseDefinition with challenge_key, case_key, optional rich inputs/expectations, artifacts, assets, plus legacy payload for blob-only authoring.
Deep dive: Input sets & cases.
Execution mode compatibility
execution_mode accepts four values; each constrains which other blocks may appear (enforced when the bundle is validated):
native— the full runtime. You may populateversion.sandbox, allowed tool kinds, and composed tools; the worker hydrates the richer runtime path. Choose this when you rely on sandbox files, primitives, or validators that read captured files (file:evidence).prompt_eval— pure model-output workloads. Thetoolsblock must not be present,version.sandboxmust not be present, andversion.tool_policymust be empty.responses— OpenAI Responses-API workloads. Thetoolsblock must be empty (use OpenAI Responses tools, not pack-defined tools).multi_turn— scripted or simulated multi-turn conversations. See Multi-turn packs for the per-caseuser_simulatormanifest.
Choose prompt_eval for pure model-output workloads; promote to native when you need sandbox files, primitives, or file-based evidence.
After publish — IDs returned
Publishing returns stable identifiers referenced by runs (see authoring guide):
challenge_pack_idchallenge_pack_version_idevaluation_spec_idinput_set_ids- Optional
bundle_artifact_id
Run creation binds challenge_pack_version_id specifically—YAML filenames stop mattering immediately after publish.