First Eval Walkthrough
This walkthrough uses the seeded local path the repo supports today: bring up the local stack, seed fixture data, create a run, stream its events, and inspect the ranking output. It adds no setup beyond what is already in the repo.
1. Bring up the local stack
From the repo root:
./scripts/dev/start-local-stack.sh
If you want the browser UI too:
cd web
pnpm install
pnpm dev
2. Seed a runnable fixture
Back in the repo root:
./scripts/dev/seed-local-run-fixture.sh
That script seeds enough data to create a local run through the API.
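Before running the helper scripts, it can help to confirm they are present and executable in your checkout. The function below is a minimal sketch, not something shipped in the repo: `check_walkthrough_scripts` is a hypothetical name, and the only paths it knows about are the three scripts named in steps 1 to 3 of this walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repo).
# Checks that the helper scripts named in this walkthrough exist and are
# executable under the given repo root. Returns 0 if all are usable, 1 otherwise.
check_walkthrough_scripts() {
  local repo_root="${1:-.}"
  local missing=0
  local script
  for script in \
    scripts/dev/start-local-stack.sh \
    scripts/dev/seed-local-run-fixture.sh \
    scripts/dev/curl-create-run.sh
  do
    if [ ! -x "$repo_root/$script" ]; then
      echo "missing or not executable: $script" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the repo root, `check_walkthrough_scripts . && ./scripts/dev/seed-local-run-fixture.sh` would run the seed step only when all three scripts are usable.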
3. Create the run
You can hit the API directly:
./scripts/dev/curl-create-run.sh
Or, if you are using the CLI against a prepared workspace, create and follow the run there:
agentclash run create --follow
4. Inspect the result
Once you have a run ID, inspect its status and ranking:
agentclash run get <RUN_ID>
agentclash run ranking <RUN_ID>
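The two inspection commands above can be wrapped so that a forgotten run ID fails fast instead of reaching the CLI. This is a sketch under assumptions: `inspect_run` is a hypothetical shell function, not an `agentclash` subcommand, and it calls only the two subcommands shown in this walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the agentclash CLI).
# Prints the run record, then its ranking, for a single run ID.
inspect_run() {
  local run_id="${1:-}"
  if [ -z "$run_id" ]; then
    echo "usage: inspect_run <RUN_ID>" >&2
    return 2
  fi
  # Stop here if the run record cannot be fetched.
  agentclash run get "$run_id" || return 1
  # The ranking call may not succeed until the backend has enough completed
  # run-agent results to score; its exit status is passed through unchanged.
  agentclash run ranking "$run_id"
}
```

Call it as `inspect_run <RUN_ID>` with the ID returned when the run was created.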
If the web app is running, open the workspace run detail view in the browser and inspect the replay and scorecard surfaces from there.
What you should see
- a run record created in the workspace
- event streaming during execution when you follow the run
- a ranking view once the backend has enough completed run-agent results to score
Warning
Without a real sandbox provider such as E2B, the native model-backed path can still stall or fail after run creation. That is expected in the unconfigured local setup.
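Because a followed run can stall in this setup, one option is to bound how long you wait for it. The sketch below assumes GNU coreutils `timeout` is installed (it is not available by default on macOS); `run_with_deadline` is a hypothetical helper, and the deadline you pass is your own choice rather than a value taken from the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, but give up after a number of seconds.
# Relies on GNU coreutils `timeout`, which exits with status 124 when the
# deadline is reached.
run_with_deadline() {
  local seconds="$1"
  shift
  local rc=0
  timeout "$seconds" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "command did not finish within ${seconds}s; it may be stalled" >&2
  fi
  return "$rc"
}
```

For example, `run_with_deadline 600 agentclash run create --follow` would stop following after ten minutes; the 600 is an arbitrary illustration.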