Challenge packs

Challenge pack documentation

Deep, consumer-facing YAML and runtime reference keyed to the parsers, validators, and workers in this repository.

A challenge pack is the versioned YAML bundle that defines a benchmark task: its prompts, tools, sandbox, evaluation spec, and input cases. These pages complement the short concept guide "Challenge packs and inputs" and spell out everything a benchmark author needs to publish a pack that survives server-side parsing, validation, and execution.

Everything here is keyed to shipped code paths, not roadmap language. When behavior changes upstream, re-validate your pack with:

```bash
agentclash challenge-pack validate your-pack.yaml
```
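
If you maintain several packs, the same command can be looped over a directory. The following is a minimal sketch, not part of the CLI: the `validate_packs` function name and the `packs/` directory are hypothetical, and it assumes that `agentclash challenge-pack validate` exits non-zero when a pack fails validation, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: run the documented validate command over every *.yaml file in a directory.
# Assumption: the command exits non-zero when a pack is invalid.

validate_packs() {
  local dir="$1"
  local failed=0
  local pack

  for pack in "$dir"/*.yaml; do
    # Skip the literal glob pattern when the directory holds no .yaml files.
    [ -e "$pack" ] || continue

    if ! agentclash challenge-pack validate "$pack"; then
      echo "FAILED: $pack" >&2
      failed=$((failed + 1))
    fi
  done

  # Return 0 only if every pack validated.
  [ "$failed" -eq 0 ]
}

# Example invocation (hypothetical directory name):
#   validate_packs packs
```

Because the function returns non-zero on any failure, it can gate a CI step or a pre-publish hook without extra parsing of the command's output.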

What's covered

| Topic | Use when you… | Anchor in repo |
| --- | --- | --- |
| Bundle YAML reference | Need the authoritative field list and execution_mode rules (native, prompt_eval, responses, multi_turn) | backend/internal/challengepack/bundle.go, validation.go |
| Evaluation spec reference | Choose validator types, wire target/expected_from, add metrics | backend/internal/scoring/spec.go, validation.go, engine_*.go |
| LLM judges | Add rubrics, assertions, pairwise comparison, budgets | backend/internal/scoring/spec.go, validation_judges.go |
| Tools, primitives & policy | Decide allowed_tool_kinds, map composed tools → primitives | backend/internal/engine/primitive_tools.go, tool_registry.go, sandbox/sandbox.go |
| Sandbox & E2B | Tune network_allowlist, template id, sandbox provider | backend/internal/challengepack/bundle.go, sandbox/e2b/, worker config |
| Input sets & cases | Model fixtures, typed inputs and expectations | challengepack/bundle.go (CaseDefinition), StoredCaseDocument |
| Eval workflows & gates | Chain eval start, baselines, scorecards, comparisons | cli/cmd/eval.go, baseline.go, compare.go |

See also