Getting Started

First Eval Walkthrough

This walkthrough uses the seeded local path that the repo supports today: seed local data, create a run, stream its events, and inspect the ranking output. It relies on no setup beyond what is already in the repo.

1. Bring up the local stack

From the repo root:

bash

./scripts/dev/start-local-stack.sh

If you want the browser UI too:

bash

cd web
pnpm install
pnpm dev

2. Seed a runnable fixture

Back in the repo root:

bash

./scripts/dev/seed-local-run-fixture.sh

That script seeds enough data to create a local run through the API.

3. Create the run

You can hit the API directly:

bash

./scripts/dev/curl-create-run.sh

Or, if you are using the CLI against a prepared workspace, create and follow the run there:

bash

agentclash eval start --follow

eval start is the workflow-first wrapper around run create: it resolves challenge packs, versions, input sets, and deployments by name or by interactive selection. Use agentclash run create directly when you want to pass IDs explicitly, for example in CI scripts or other automation.
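On the direct API path, steps 1 through 3 can be chained into one script. The following is a minimal sketch, not something the repo ships. It assumes each script returns once its work is done; if start-local-stack.sh stays in the foreground, run it in a separate terminal instead. The DRY_RUN switch and the run_step and first_eval helpers are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: chain steps 1-3 from the repo root, stopping at the first failure.
set -eu

# DRY_RUN is a hypothetical switch for this sketch, not a repo feature.
# It defaults to 1 so the steps are only printed; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then run it unless DRY_RUN is 1.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

first_eval() {
  run_step ./scripts/dev/start-local-stack.sh       # step 1
  run_step ./scripts/dev/seed-local-run-fixture.sh  # step 2
  run_step ./scripts/dev/curl-create-run.sh         # step 3
}

first_eval
```

Printing each command before running it makes it easy to see which step failed, and set -e stops the chain there instead of creating a run against a half-started stack.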

4. Inspect the result

Each create command from step 3 prints the run ID. The CLI (eval start or run create) writes a Run ID: line. The curl script prints the raw POST /v1/runs JSON response, whose id field is the run ID. Use that value in place of <RUN_ID> to inspect status and ranking:

bash

agentclash run get <RUN_ID>
agentclash run ranking <RUN_ID>
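When scripting the curl path, the run ID can be pulled out of the JSON response rather than copied by hand. This sketch assumes curl-create-run.sh writes only the response body to stdout, which is worth checking before relying on it. extract_run_id is a hypothetical helper, and the sample response below is made up for illustration. It uses jq when available and otherwise falls back to a text match.

```shell
# Read a JSON document on stdin and print the value of its "id" field.
extract_run_id() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.id'
  else
    # Fallback without jq: take the first "id" key found in the text.
    # Brittle if a nested object with its own "id" appears first.
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
      | head -n 1 \
      | sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
  fi
}

# Made-up response for illustration; the real payload may carry other fields.
sample='{"id":"run_123","status":"queued"}'
printf '%s\n' "$sample" | extract_run_id   # prints run_123

# Intended use against the real script (assumption: stdout is JSON only):
#   RUN_ID="$(./scripts/dev/curl-create-run.sh | extract_run_id)"
#   agentclash run get "$RUN_ID"
#   agentclash run ranking "$RUN_ID"
```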

If the web app is running, open the workspace run detail view in the browser and inspect the replay and scorecard surfaces from there.

What you should see

a run record created in the workspace
event streaming during execution when you follow the run
a ranking view once the backend has enough completed run-agent results to score
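Because the ranking appears only once enough run-agent results have completed, a script may need to ask more than once. The sketch below is a generic retry helper. It assumes, without confirmation from the repo, that agentclash run ranking exits non-zero until a ranking is available; if the command exits zero with an empty result instead, the loop would need to inspect the output. The retry name, the attempt count, and the delay are hypothetical choices.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Self-contained demo: succeeds on the first attempt.
retry 3 0 true && echo "ok"

# Intended use (assumption: non-zero exit until the ranking exists):
#   retry 30 10 agentclash run ranking "$RUN_ID"
```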

Warning

Without a real sandbox provider such as E2B, the native model-backed path can still stall or fail after run creation. That is expected in the unconfigured local setup.
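Since a followed run can stall in this setup, it can help to put a deadline on the command so a terminal or script is not left waiting indefinitely. This sketch relies on the timeout utility from GNU coreutils, which is not installed by default on macOS and exits with status 124 when the deadline passes. with_deadline is a hypothetical helper, and the 600-second value in the usage comment is an arbitrary choice, not a documented limit.

```shell
# Run a command, but give up after a deadline in seconds.
# Usage: with_deadline <seconds> <command> [args...]
with_deadline() {
  deadline="$1"
  shift
  status=0
  timeout "$deadline" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout reports an expired deadline with exit status 124.
    echo "timed out after ${deadline}s: $*" >&2
  fi
  return "$status"
}

# Self-contained demo: the command finishes well inside the deadline.
with_deadline 5 echo "finished in time"

# Intended use (600 is an arbitrary example value):
#   with_deadline 600 agentclash eval start --follow
```

A timeout here only stops the local follow command. The run record created in the workspace remains and can still be inspected with agentclash run get.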
