First Eval Walkthrough

This walkthrough sticks to the seeded local path the repo already supports today: seed local data, create a run, stream events, and inspect the ranking output. It does not invent setup that is not in the repo.

1. Bring up the local stack

From the repo root:

bash
./scripts/dev/start-local-stack.sh

If you want the browser UI too:

bash
cd web
pnpm install
pnpm dev

2. Seed a runnable fixture

Back in the repo root:

bash
./scripts/dev/seed-local-run-fixture.sh

That script seeds enough data to create a local run through the API.

3. Create the run

You can hit the API directly:

bash
./scripts/dev/curl-create-run.sh

Or, if you are using the CLI against a prepared workspace, create and follow the run there:

bash
agentclash eval start --follow

eval start is the workflow-first wrapper around run create: it resolves challenge packs, versions, input sets, and deployments by name or by interactive selection. Use agentclash run create directly when you want to pass IDs explicitly, for example in CI scripts or other automation.
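
If you repeat the API path often, steps 1 through 3 can be chained in a small wrapper that stops at the first failure. The sketch below is one possible shape, not something shipped in the repo: the three script paths come from this walkthrough, while run_step and first_eval are hypothetical helper names. It also assumes start-local-stack.sh returns once the stack is up; if it stays in the foreground, run it in its own terminal and drop it from the chain.

```shell
#!/usr/bin/env bash
# Sketch: chain the walkthrough's API-path steps, stopping at the first failure.
# run_step and first_eval are hypothetical names, not part of the repo.

# Run one repo script; fail clearly if it is missing or not executable.
run_step() {
  local script="$1"
  if [ ! -x "$script" ]; then
    echo "missing or not executable: $script (run this from the repo root)" >&2
    return 1
  fi
  echo "==> $script"
  "$script"
}

# Start the stack, seed the fixture, then create the run, in that order.
first_eval() {
  run_step ./scripts/dev/start-local-stack.sh &&
    run_step ./scripts/dev/seed-local-run-fixture.sh &&
    run_step ./scripts/dev/curl-create-run.sh
}
```

Source the file from the repo root and call first_eval. Because each step is joined with &&, a failed seed never reaches the create-run call.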

4. Inspect the result

Once you have a run ID, inspect its status and ranking:

bash
agentclash run get <RUN_ID>
agentclash run ranking <RUN_ID>

If the web app is running, open the workspace run detail view in the browser and inspect the replay and scorecard surfaces from there.

What you should see

  • a run record created in the workspace
  • event streaming during execution when you follow the run
  • a ranking view once the backend has enough completed run-agent results to score
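
Because the ranking only appears once enough run-agent results have completed, it can help to poll rather than re-run the command by hand. The helper below is a generic retry loop, offered as a sketch: retry is a hypothetical name, and using it with agentclash run ranking assumes that command exits non-zero while no ranking is available, which this walkthrough does not confirm.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it exits 0 or ATTEMPTS tries have been made.
# Hypothetical helper, not part of the agentclash CLI.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (ASSUMPTION: ranking exits non-zero until results are scored):
#   retry 20 5 agentclash run ranking "$RUN_ID"
```

If the exit-code assumption does not hold for your build, the same loop still works with any command that signals readiness through its exit status.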

Warning

Without a real sandbox provider such as E2B, the native model-backed path can still stall or fail after run creation. That is expected in the unconfigured local setup.
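
A quick pre-flight check can make that failure mode less surprising. The sketch below warns when no E2B credentials are visible in the environment. It assumes the backend picks up E2B_API_KEY, the variable the E2B SDK reads by convention; this walkthrough does not say which variable the repo actually uses, so treat the name as a placeholder. check_sandbox_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: warn before creating a run if no sandbox credentials are visible.
# ASSUMPTION: E2B_API_KEY is the variable the backend reads; the walkthrough
# does not confirm this, so adjust the name to match your configuration.
check_sandbox_env() {
  if [ -z "${E2B_API_KEY:-}" ]; then
    echo "warning: E2B_API_KEY is not set; runs may stall or fail after creation" >&2
    return 1
  fi
  echo "sandbox credentials found"
}
```

Calling check_sandbox_env before step 3 turns a silent stall into an explicit warning, while still leaving you free to proceed with the unconfigured setup.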
