Tools, primitives & policy
How tool_policy and tools.custom map to engine primitives in backend/internal/engine and sandbox.ToolPolicy.
AgentClash stacks three tool notions (also summarized in Tools, network, and secrets):
- Workspace tool resources: org-level infrastructure objects (not covered by pack YAML)
- Pack composed tools: tools.custom[] entries that expand to a JSON Schema plus an implementation
- Engine primitives: the concrete executors the worker runs (registered in nativePrimitiveTools, backend/internal/engine/primitive_tools.go)
Only (2) and (3) are pack-controlled.
Tool policy shape
Set version.tool_policy in your pack YAML. Two keys drive it (the JSON hydrates sandbox.ToolPolicy, backend/internal/sandbox/sandbox.go):
- allowed_tool_kinds: the list of capability groups (kinds) the model may use
- allow_shell: a separate boolean that gates the exec primitive
Recognized kind strings
allowed_tool_kinds accepts exactly these six values; anything else fails validation (the set is supportedToolKinds in backend/internal/challengepack/validation.go):
browser, build, data, file, network, terminal
Shell is not a kind—enable it with allow_shell: true instead.
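To make the rule concrete, here is a minimal Python sketch of the kind check. It is an illustration, not the Go code in validation.go: the function name validate_tool_policy and the error wording are hypothetical, and only the six kind strings come from this page.

```python
from typing import Dict, List

# The six recognized kinds listed above. The real set is supportedToolKinds in Go.
SUPPORTED_TOOL_KINDS = {"browser", "build", "data", "file", "network", "terminal"}


def validate_tool_policy(policy: Dict) -> List[str]:
    """Return a list of problems; an empty list means the policy passes.

    Hypothetical helper: name and messages are made up for illustration.
    """
    problems = []
    for kind in policy.get("allowed_tool_kinds", []):
        # "shell" lands here too, because shell access is a flag, not a kind.
        if kind not in SUPPORTED_TOOL_KINDS:
            problems.append(f"unsupported tool kind: {kind!r}")
    return problems
```

Under these assumptions, a policy listing file and shell as kinds yields one problem naming shell, while the same policy with allow_shell: true and only file in the list passes.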
Empty allowlist semantics
An empty allowed_tool_kinds means allow every kind (handled by allowsToolKind in primitive_helpers.go). In practice, prefer explicit lists so validation errors catch typos early.
Mode guardrails
A pack's version.execution_mode is one of native, prompt_eval, responses, or multi_turn. Of these, prompt_eval packs must omit tool_policy entirely (prompt-eval runs never call tools)—see Bundle YAML reference.
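The guardrail can be pictured as a small pre-flight check. The Python sketch below is an assumption-laden illustration: check_mode_guardrail is a hypothetical name, the version block is modeled as a plain dict, and a missing execution_mode is left alone because this page does not state a default.

```python
# Execution modes named above.
EXECUTION_MODES = {"native", "prompt_eval", "responses", "multi_turn"}


def check_mode_guardrail(version: dict) -> None:
    """Raise ValueError if the version block breaks the mode rules above.

    Hypothetical helper; the real validation happens in the Go backend.
    """
    mode = version.get("execution_mode")
    if mode is not None and mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution_mode: {mode!r}")
    # prompt_eval runs never call tools, so even an empty tool_policy is rejected.
    if mode == "prompt_eval" and "tool_policy" in version:
        raise ValueError("prompt_eval packs must omit tool_policy")
```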
Built-in primitive names
These are the primitives the worker can run, and the policy that unlocks each (declared in executor_builders.go, registered in nativePrimitiveTools):
| Primitive | Gated by |
|---|---|
| submit | Always available (final answer) |
| read_file, write_file, list_files, search_files, search_text | file kind |
| query_json, query_sql | data kind |
| http_request | network kind (+ runtime network flags) |
| run_tests, build | build kind |
| exec | allow_shell |
Browser tooling exists in policy (toolKindBrowser)—ensure your template + worker build includes whatever browser bridge your pack expects before relying on it in production.
terminal marks packs that pair with Try CLI — interactive disposable terminal demos for README badges and human try-before-install flows. Eval runs still use standard primitives like exec inside the sandbox.
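As a rough illustration of how the table and the empty-allowlist rule combine, the Python sketch below computes which primitives a given policy would unlock. The names PRIMITIVE_GATES, allows_kind, and unlocked_primitives are hypothetical, allow_shell is assumed to default to false when absent, and the runtime network flags that also gate http_request are not modeled.

```python
# Gate for each built-in primitive, transcribed from the table above.
# None means "always available"; "shell" stands for the allow_shell flag.
PRIMITIVE_GATES = {
    "submit": None,
    "read_file": "file",
    "write_file": "file",
    "list_files": "file",
    "search_files": "file",
    "search_text": "file",
    "query_json": "data",
    "query_sql": "data",
    "http_request": "network",
    "run_tests": "build",
    "build": "build",
    "exec": "shell",
}


def allows_kind(policy: dict, kind: str) -> bool:
    """An empty or missing allowed_tool_kinds list allows every kind."""
    allowed = policy.get("allowed_tool_kinds") or []
    return not allowed or kind in allowed


def unlocked_primitives(policy: dict) -> list:
    """Return the sorted primitive names this policy would unlock."""
    names = []
    for name, gate in PRIMITIVE_GATES.items():
        if gate is None:
            names.append(name)
        elif gate == "shell":
            # Assumption: a missing allow_shell is treated as false.
            if policy.get("allow_shell", False):
                names.append(name)
        elif allows_kind(policy, gate):
            names.append(name)
    return sorted(names)
```

With allowed_tool_kinds set to only data, this sketch returns query_json, query_sql, and submit; with an empty list and allow_shell: true, it returns all twelve primitives.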
Composed tools (tools.custom[])
Each item:
```yaml
tools:
  custom:
    - name: call_support_api
      description: Fetch ticket JSON
      parameters:
        type: object
        properties:
          ticket_id: { type: string }
        required: [ticket_id]
        additionalProperties: false
      implementation:
        primitive: http_request
        args:
          method: GET
          url: https://api.example.com/tickets/${ticket_id}
          headers:
            Authorization: Bearer ${secrets.SUPPORT_TOKEN}
```

Validation rules each composed tool must satisfy (enforced by validateComposedToolConfig):
- A non-mock tool's implementation.primitive must name a different tool than itself (prevents the self-delegation footgun)
- implementation.args must be an object; its templates are validated for placeholder safety
- parameters must be a JSON Schema that passes templateutil.ValidateToolParameterSchema
- The delegation graph cannot contain cycles, and delegation depth cannot exceed 8 jumps
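The self-delegation, cycle, and depth rules can be sketched as a walk along each tool's delegation chain. This Python version is illustrative only: check_delegation is a hypothetical name, mock tools are modeled as entries with no primitive, any name missing from the map is treated as a built-in primitive, and counting one jump per hop is an assumption about how the engine measures depth.

```python
from typing import Dict, Optional

MAX_DELEGATION_DEPTH = 8  # limit stated above; how jumps are counted is assumed


def check_delegation(custom: Dict[str, Optional[str]]) -> None:
    """Validate a map of composed tool name -> delegated primitive.

    A value of None marks a mock tool, which is skipped. Raises ValueError on
    self-delegation, a cycle, or a chain longer than MAX_DELEGATION_DEPTH.
    """
    for start in custom:
        seen = {start}
        current, jumps = start, 0
        # Follow the chain until it reaches a built-in primitive or a mock.
        while current in custom and custom[current] is not None:
            target = custom[current]
            jumps += 1
            if target == current:
                raise ValueError(f"{current} delegates to itself")
            if target in seen:
                raise ValueError(f"delegation cycle through {target}")
            if jumps > MAX_DELEGATION_DEPTH:
                raise ValueError(f"{start} exceeds {MAX_DELEGATION_DEPTH} jumps")
            seen.add(target)
            current = target
```

In this model, a tool that delegates to another composed tool which in turn calls http_request passes, while two tools that delegate to each other fail with a cycle error.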
Mock implementations
Set implementation.type: mock to skip primitive resolution—useful for dry-run packs or policy-only testing. Mocks bypass cycle detection.
Workspace tools vs pack tools
Pack tools are not the same records as API tools resources—they are bundle-local contracts interpreted entirely inside the worker.
Secret placeholders
Composed args may reference ${secrets.NAME}, which resolve through workspace secret stores—never place secret material inline. Sandbox env_vars must be literal strings: any value containing a ${...} placeholder is rejected at validation, and ${secrets.*} in particular errors out (enforced by validateEnvVarLiterals in backend/internal/engine/executor_sandbox.go). Environment leaks are too easy, so inject credentials through http_request headers instead.
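The env_vars rule amounts to scanning each value for a placeholder. The Python sketch below shows the idea under stated assumptions: validate_env_var_literals is a hypothetical stand-in for the Go function validateEnvVarLiterals, the regular expression is a guess at what counts as a placeholder, and the error messages are invented.

```python
import re

# Matches any ${...} placeholder. The exact pattern the engine uses is assumed.
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def validate_env_var_literals(env_vars: dict) -> list:
    """Return one error string per env var whose value is not a plain literal."""
    errors = []
    for key, value in env_vars.items():
        match = PLACEHOLDER.search(value)
        if match is None:
            continue  # literal string, accepted
        found = match.group(0)
        if found.startswith("${secrets."):
            errors.append(f"{key}: secrets may not be injected through env_vars")
        else:
            errors.append(f"{key}: value must be a literal string, found {found}")
    return errors
```

A value such as debug passes, whereas ${secrets.SUPPORT_TOKEN} or /tmp/${user} would each produce an error in this sketch.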
Provider visibility
The model only ever sees tools that pass both the tool policy and the pack manifest—the provider-specific tool definitions (OpenAI, Anthropic, etc.) are built from the registry's visible map (buildToolRegistry).
See also
- Sandbox & E2B for network pairing with http_request
- backend/internal/challengepack/tools_validation_test.go for edge-case fixtures
- Multi-turn packs
- Security evaluation