Getting Started
First Eval Walkthrough
This walkthrough sticks to what the repo already supports today, without inventing setup that is not in the repo: seed local data, create a run, stream events, and inspect the ranking output.
1. Bring up the local stack
From the repo root:
    ./scripts/dev/start-local-stack.sh
If you want the browser UI too:
    cd web
    pnpm install
    pnpm dev
2. Seed a runnable fixture
Back in the repo root:
    ./scripts/dev/seed-local-run-fixture.sh
That script seeds enough data to create a local run through the API.
3. Create the run
You can hit the API directly:
    ./scripts/dev/curl-create-run.sh
Or, if you are using the CLI against a prepared workspace, create and follow the run there:
    agentclash eval start --follow
eval start is the workflow-first wrapper around run create: it resolves challenge packs, versions, input sets, and deployments by name or interactive selection. Use agentclash run create directly when you want to pass IDs explicitly (CI scripts, automation).
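
If you repeat steps 1 through 3 often, they can be chained in a small shell function. This is only a sketch: it assumes each script returns once its work is done (if start-local-stack.sh stays in the foreground on your machine, run it in a separate terminal and drop it from the list). REPO_ROOT and the function name are hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Sketch: run the walkthrough scripts in order and stop at the first failure.
# REPO_ROOT is a hypothetical variable; it defaults to the current directory.

run_walkthrough_steps() {
  local root="${REPO_ROOT:-.}"
  local step
  for step in \
    scripts/dev/start-local-stack.sh \
    scripts/dev/seed-local-run-fixture.sh \
    scripts/dev/curl-create-run.sh
  do
    # Fail early with a clear message instead of a bare "command not found".
    if [ ! -x "$root/$step" ]; then
      echo "missing or not executable: $step" >&2
      return 1
    fi
    echo "==> $step"
    # Run from the repo root in a subshell so the caller's directory is unchanged.
    (cd "$root" && "./$step") || return 1
  done
}
```

Call it from the repo root as `run_walkthrough_steps`, or from elsewhere as `REPO_ROOT=/path/to/repo run_walkthrough_steps`.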
4. Inspect the result
Once you have a run ID, inspect its status and ranking:
    agentclash run get <RUN_ID>
    agentclash run ranking <RUN_ID>
If the web app is running, open the workspace run detail view in the browser and inspect the replay and scorecard surfaces from there.
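
The two inspection commands can be wrapped in one helper. The sketch below uses only the run get and run ranking subcommands shown above. The function name is hypothetical, and it assumes, without confirmation from the repo, that agentclash exits non-zero when the ranking is not available yet.

```shell
#!/usr/bin/env bash
# Sketch: show a run's status, then its ranking.
# Assumption: `agentclash run ranking` exits non-zero while no ranking exists.

inspect_run() {
  local run_id="${1:-}"
  if [ -z "$run_id" ]; then
    echo "usage: inspect_run <RUN_ID>" >&2
    return 2
  fi
  # Status first: if the run itself cannot be fetched, the ranking will not help.
  agentclash run get "$run_id" || return 1
  if ! agentclash run ranking "$run_id"; then
    echo "ranking not available; retry once more run-agent results have completed" >&2
    return 1
  fi
}
```

Usage: `inspect_run <RUN_ID>`, with the ID reported when the run was created.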
What you should see
- a run record created in the workspace
- event streaming during execution when you follow the run
- a ranking view once the backend has enough completed run-agent results to score
Warning
Without a real sandbox provider such as E2B, the native model-backed path can still stall or fail after run creation. That is expected in the unconfigured local setup.
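
Because a followed run can stall in that setup, it may help to put an upper bound on how long you wait. The sketch below relies on the timeout utility from GNU coreutils, so it assumes that tool is installed (it is not on stock macOS). RUN_DEADLINE_SECONDS and the function name are hypothetical and are not agentclash settings.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a deadline so a stalled run does not hang the shell.
# RUN_DEADLINE_SECONDS is a hypothetical variable; the default is 600 seconds.

follow_with_deadline() {
  local seconds="${RUN_DEADLINE_SECONDS:-600}"
  local rc=0
  # --foreground lets the command read from the terminal,
  # which interactive selection needs.
  timeout --foreground "$seconds" "$@" || rc=$?
  # timeout exits with status 124 when the deadline was hit.
  if [ "$rc" -eq 124 ]; then
    echo "no result after ${seconds}s; the run may have stalled" >&2
  fi
  return "$rc"
}
```

Usage: `follow_with_deadline agentclash eval start --follow`. Hitting the deadline only stops the local command; whether the run itself is cancelled on the backend is not something this sketch can tell you.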