AgentClash

Input sets & cases

Input sets are the unit AgentClash schedules per deployment/candidate. Each entry in input_sets[] contains a cases[] list, and each case is a CaseDefinition (see backend/internal/challengepack/bundle.go).

Case identity

challenge_key: must reference an existing challenges[].key
case_key / item_key (legacy): both are accepted; when only one is set, normalization copies its value into the missing one

EffectiveKey() returns case_key when it is present and otherwise falls back to item_key; stored rows use this key.
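A minimal Go sketch of the identity rules above, assuming the behaviour described in this section. The caseIdentity type and the normalize and validateChallengeKey helpers are hypothetical. Only the name EffectiveKey comes from the source, and the real CaseDefinition in bundle.go may differ.

```go
package main

import "fmt"

// caseIdentity is a hypothetical stand-in for the identity fields of
// CaseDefinition; the real struct in bundle.go may differ.
type caseIdentity struct {
	ChallengeKey string
	CaseKey      string // preferred key
	ItemKey      string // legacy alias
}

// normalize fills whichever key is missing from the other one
// (assumed to match the normalization described above).
func (c *caseIdentity) normalize() {
	if c.CaseKey == "" {
		c.CaseKey = c.ItemKey
	}
	if c.ItemKey == "" {
		c.ItemKey = c.CaseKey
	}
}

// EffectiveKey prefers case_key and falls back to item_key.
func (c caseIdentity) EffectiveKey() string {
	if c.CaseKey != "" {
		return c.CaseKey
	}
	return c.ItemKey
}

// validateChallengeKey checks that challenge_key names a declared challenges[].key.
func validateChallengeKey(c caseIdentity, challenges map[string]bool) error {
	if !challenges[c.ChallengeKey] {
		return fmt.Errorf("case %q: unknown challenge_key %q", c.EffectiveKey(), c.ChallengeKey)
	}
	return nil
}

func main() {
	challenges := map[string]bool{"summarize": true}

	legacy := caseIdentity{ChallengeKey: "summarize", ItemKey: "case-001"}
	legacy.normalize()
	fmt.Println(legacy.CaseKey, legacy.EffectiveKey()) // case-001 case-001

	bad := caseIdentity{ChallengeKey: "missing", CaseKey: "x"}
	fmt.Println(validateChallengeKey(bad, challenges)) // case "x": unknown challenge_key "missing"
}
```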

Three authoring styles (coexist)

1. Legacy payload-only: fill the payload map and omit structured inputs/expectations.
2. Structured eval: inputs[] plus expectations[], each entry with an explicit kind field.
3. Artifact-heavy: assets[] plus artifacts[] that reference declared version- or challenge-level assets.

IsLegacyPayloadOnly detects style (1) for storage compatibility.
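To make the three styles concrete, the sketch below models one case of each as Go values (field names follow the YAML keys above) and classifies them with a hypothetical isLegacyPayloadOnly. Its rule, "no structured field is populated", is an assumption about what IsLegacyPayloadOnly checks, and the element shapes (especially for artifacts[]) are illustrative only.

```go
package main

import "fmt"

// caseDef is a hypothetical, trimmed-down case shape whose fields follow the
// YAML keys above; element shapes are left generic for brevity.
type caseDef struct {
	Payload      map[string]any
	Inputs       []map[string]any
	Expectations []map[string]any
	Assets       []map[string]any
	Artifacts    []map[string]any
}

// isLegacyPayloadOnly approximates IsLegacyPayloadOnly under an assumed rule:
// none of the structured fields is populated.
func isLegacyPayloadOnly(c caseDef) bool {
	return len(c.Inputs) == 0 && len(c.Expectations) == 0 &&
		len(c.Assets) == 0 && len(c.Artifacts) == 0
}

func main() {
	examples := []struct {
		style string
		c     caseDef
	}{
		{"legacy payload-only", caseDef{Payload: map[string]any{"prompt": "Summarize the report."}}},
		{"structured eval", caseDef{
			Inputs:       []map[string]any{{"key": "prompt", "kind": "text", "value": "Summarize the report."}},
			Expectations: []map[string]any{{"key": "summary", "kind": "text", "value": "Revenue rose."}},
		}},
		{"artifact-heavy", caseDef{
			Assets:    []map[string]any{{"key": "report", "path": "fixtures/report.pdf"}},
			Artifacts: []map[string]any{{"key": "report"}}, // artifact entry shape is assumed
		}},
	}
	for _, ex := range examples {
		// Prints legacy=true only for the first example.
		fmt.Printf("%s: legacy=%v\n", ex.style, isLegacyPayloadOnly(ex.c))
	}
}
```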

Stored document shape

When modern fields are present, StoredPayload() marshals a StoredCaseDocument to JSON with schema_version: 1, preserving:

payload
inputs
expectations
artifacts
assets

Scoring and replay read this stored document back, not the raw YAML fragment.
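A sketch of the storage decision, assuming legacy payload-only cases are stored as their bare payload and all other cases are wrapped in a versioned document. storedCaseDocument, its JSON tags, and storedPayload are hypothetical stand-ins for StoredCaseDocument and StoredPayload().

```go
package main

import (
	"encoding/json"
	"fmt"
)

// caseDef is a hypothetical, trimmed-down case; element shapes are left generic.
type caseDef struct {
	Payload      map[string]any
	Inputs       []map[string]any
	Expectations []map[string]any
	Artifacts    []map[string]any
	Assets       []map[string]any
}

// storedCaseDocument lists the preserved fields; the JSON tags are assumed.
type storedCaseDocument struct {
	SchemaVersion int              `json:"schema_version"`
	Payload       map[string]any   `json:"payload,omitempty"`
	Inputs        []map[string]any `json:"inputs,omitempty"`
	Expectations  []map[string]any `json:"expectations,omitempty"`
	Artifacts     []map[string]any `json:"artifacts,omitempty"`
	Assets        []map[string]any `json:"assets,omitempty"`
}

// hasModernFields reports whether any structured field is populated.
func hasModernFields(c caseDef) bool {
	return len(c.Inputs) > 0 || len(c.Expectations) > 0 || len(c.Artifacts) > 0 || len(c.Assets) > 0
}

// storedPayload sketches StoredPayload(): legacy cases keep their bare payload
// (an assumption), while modern cases are wrapped in a schema_version 1 document.
func storedPayload(c caseDef) ([]byte, error) {
	if !hasModernFields(c) {
		return json.Marshal(c.Payload)
	}
	return json.Marshal(storedCaseDocument{
		SchemaVersion: 1,
		Payload:       c.Payload,
		Inputs:        c.Inputs,
		Expectations:  c.Expectations,
		Artifacts:     c.Artifacts,
		Assets:        c.Assets,
	})
}

func main() {
	legacy, _ := storedPayload(caseDef{Payload: map[string]any{"prompt": "Hi"}})
	modern, _ := storedPayload(caseDef{Inputs: []map[string]any{{"key": "prompt", "kind": "text", "value": "Hi"}}})
	fmt.Println(string(legacy)) // {"prompt":"Hi"}
	fmt.Println(string(modern)) // {"schema_version":1,"inputs":[{"key":"prompt","kind":"text","value":"Hi"}]}
}
```

encoding/json sorts map keys and drops empty omitempty fields, so the stored bytes are stable for a given case.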

Case inputs (`inputs[]`)

CaseInput fields:

| Field | Role |
| --- | --- |
| key | Stable identifier used by templates and the UI |
| kind | Drives rendering and validator binding (e.g. text, artifact); product-specific kinds should match what the worker expects |
| value | Inline scalar or object |
| artifact_key | Pulls bytes from the declared asset map |
| path | Optional relative path inside the asset bundle |

Validators can address values through case.inputs.<key> evidence paths.
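The sketch below shows one plausible way inline and asset-backed inputs could resolve, and how a case.inputs.<key> evidence path could be looked up. The resolution order (inline value unless artifact_key is set), the asset map keyed by "<artifact_key>/<path>", and all helper names are assumptions, not the worker's actual logic.

```go
package main

import (
	"fmt"
	"strings"
)

// caseInput mirrors the table above; how values are resolved is assumed.
type caseInput struct {
	Key         string
	Kind        string
	Value       any
	ArtifactKey string
	Path        string
}

// resolveInput returns the inline value, or the bytes of the referenced asset.
// The assets map (keyed by "<artifact_key>" or "<artifact_key>/<path>") is a
// hypothetical stand-in for the declared asset map.
func resolveInput(in caseInput, assets map[string][]byte) (any, error) {
	if in.ArtifactKey == "" {
		return in.Value, nil
	}
	ref := in.ArtifactKey
	if in.Path != "" {
		ref += "/" + in.Path
	}
	data, ok := assets[ref]
	if !ok {
		return nil, fmt.Errorf("input %q: asset %q not found", in.Key, ref)
	}
	return data, nil
}

// lookupEvidence resolves a "case.inputs.<key>" evidence path against resolved inputs.
func lookupEvidence(path string, resolved map[string]any) (any, bool) {
	const prefix = "case.inputs."
	if !strings.HasPrefix(path, prefix) {
		return nil, false
	}
	v, ok := resolved[strings.TrimPrefix(path, prefix)]
	return v, ok
}

func main() {
	inputs := []caseInput{
		{Key: "prompt", Kind: "text", Value: "Summarize the attached report."},
		{Key: "report", Kind: "artifact", ArtifactKey: "fixtures", Path: "report.txt"},
	}
	assets := map[string][]byte{"fixtures/report.txt": []byte("Q3 revenue rose.")}

	resolved := map[string]any{}
	for _, in := range inputs {
		v, err := resolveInput(in, assets)
		if err != nil {
			fmt.Println(err)
			continue
		}
		resolved[in.Key] = v
	}
	if v, ok := lookupEvidence("case.inputs.report", resolved); ok {
		fmt.Printf("%s\n", v) // Q3 revenue rose.
	}
}
```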

Expectations (`expectations[]`)

CaseExpectation parallels CaseInput: it has key, kind, value, and artifact_key, plus a source field that tells graders where a dynamic gold value comes from (the input:prompt pattern used in the CLI template packs is one example).

Use expectations for:

deterministic string compares
supplying LLM judge reference_from bindings
filesystem validators comparing outputs to expected files
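A hedged sketch of how a grader might use source for a deterministic string compare. It assumes that "input:<key>" means "take the gold value from the input with that key" and that an empty source falls back to the inline value. expectedValue and exactMatch are hypothetical helpers, and trimming whitespace before comparing is an illustrative choice.

```go
package main

import (
	"fmt"
	"strings"
)

// caseExpectation mirrors the fields named above (artifact_key omitted here).
type caseExpectation struct {
	Key    string
	Kind   string
	Value  string
	Source string // e.g. "input:prompt"; the interpretation below is assumed
}

// expectedValue returns the gold value: "input:<key>" pulls it from the case
// input with that key; otherwise the inline Value is used.
func expectedValue(e caseExpectation, inputs map[string]string) (string, error) {
	if strings.HasPrefix(e.Source, "input:") {
		key := strings.TrimPrefix(e.Source, "input:")
		v, ok := inputs[key]
		if !ok {
			return "", fmt.Errorf("expectation %q: source input %q not found", e.Key, key)
		}
		return v, nil
	}
	return e.Value, nil
}

// exactMatch compares output to the gold value after trimming surrounding whitespace.
func exactMatch(output string, e caseExpectation, inputs map[string]string) (bool, error) {
	want, err := expectedValue(e, inputs)
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(output) == strings.TrimSpace(want), nil
}

func main() {
	inputs := map[string]string{"prompt": "echo me"}
	echo := caseExpectation{Key: "echo", Kind: "text", Source: "input:prompt"}
	ok, err := exactMatch("echo me\n", echo, inputs)
	fmt.Println(ok, err) // true <nil>
}
```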

Assets on cases

Case-level assets[] references use the same AssetReference structure as version-level entries (key, path, optional artifact_id). Validation checks that every cross-reference resolves before a publish can succeed.
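A minimal sketch of the publish-time cross-reference check, assuming an artifact_key may resolve against asset keys declared at the version, challenge, or case level. validateArtifactRefs is hypothetical, and the real validator may check more (for example, that paths exist). It uses errors.Join, so it needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// assetReference follows the fields named above; ArtifactID is optional.
type assetReference struct {
	Key        string
	Path       string
	ArtifactID string
}

// validateArtifactRefs reports every artifact_key that matches no declared
// asset in any of the given scopes (version, challenge, case, ...).
func validateArtifactRefs(artifactKeys []string, scopes ...[]assetReference) error {
	declared := map[string]bool{}
	for _, scope := range scopes {
		for _, a := range scope {
			declared[a.Key] = true
		}
	}
	var errs []error
	for _, k := range artifactKeys {
		if !declared[k] {
			errs = append(errs, fmt.Errorf("artifact_key %q does not match any declared asset", k))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	version := []assetReference{{Key: "report", Path: "fixtures/report.pdf"}}
	caseLevel := []assetReference{{Key: "golden", Path: "expected/summary.txt"}}
	fmt.Println(validateArtifactRefs([]string{"report", "golden"}, version, caseLevel)) // <nil>
	fmt.Println(validateArtifactRefs([]string{"typo"}, version, caseLevel))             // artifact_key "typo" does not match any declared asset
}
```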

Input set metadata

An input set's optional description is preserved for UI and discovery only; it has no effect on behavior. Selection happens by id/key when a run is created.

Choosing input set at run time

The CLI's eval start command accepts --input-set to choose a set when the pack defines more than one; without the flag, interactive (TTY) sessions prompt for a choice. API consumers pass the chosen input_set_id when creating a run (see the CreateRun family in the OpenAPI spec).
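The selection rules can be sketched as a small Go function. It assumes that an explicit --input-set value may match either an id or a key, that a pack with exactly one set needs no flag, and that non-interactive runs with several sets fail instead of prompting. chooseInputSet, inputSet, and the example ids are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// inputSet is a hypothetical view of an input set; Description is informational only.
type inputSet struct {
	ID          string
	Key         string
	Description string
}

// chooseInputSet: an explicit flag wins, a single set is used as-is, and with
// several sets an interactive session prompts (modelled by the prompt callback).
func chooseInputSet(sets []inputSet, flag string, isTTY bool, prompt func([]inputSet) (inputSet, error)) (inputSet, error) {
	if flag != "" {
		for _, s := range sets {
			if s.ID == flag || s.Key == flag {
				return s, nil
			}
		}
		return inputSet{}, fmt.Errorf("input set %q not found", flag)
	}
	switch {
	case len(sets) == 0:
		return inputSet{}, errors.New("challenge pack has no input sets")
	case len(sets) == 1:
		return sets[0], nil
	case isTTY:
		return prompt(sets)
	default:
		return inputSet{}, errors.New("multiple input sets: pass --input-set")
	}
}

func main() {
	sets := []inputSet{
		{ID: "is-smoke", Key: "smoke", Description: "Fast sanity cases"},
		{ID: "is-full", Key: "full"},
	}
	chosen, err := chooseInputSet(sets, "full", false, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The chosen ID is what an API client would send as input_set_id.
	fmt.Println("input_set_id:", chosen.ID) // input_set_id: is-full
}
```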
