Release notes
Changelog
Product updates grouped in ten-day windows. Select a release for the full breakdown and merged pull requests.
- Latest
Portable agent skills and one-command host install
Twenty-five portable Agent Skills shipped with docs, an embedded CLI snapshot, and a standalone agent-skills bundle repo. Install skills or run doctor checks on them for Claude, Codex, Cursor, OpenClaw, Hermes, and OpenCode with one command, or export an offline bundle as a directory or tar.gz.
7 updates · 14 merged PRs
Datasets, /try demos, multi-turn transcripts, and Hermes harness
The datasets platform shipped end-to-end: import, generation, trace ingest, eval gates, and a full workspace UI. /try CLI demos, multi-turn transcript exports, Hermes harness support, and PostHog analytics rounded out the release window.
13 updates · 30 merged PRs
Security packs, multi-turn eval, and replay polish
Security-family challenge packs, stress-run CLI, and vault boundary harnesses established AgentClash as a security eval surface. Multi-turn evaluation with human takeover and case templating extended the engine beyond single-shot runs.
10 updates · 21 merged PRs
CI intelligence, prompt eval, and GitHub-native workflows
Failure clusters, regression provenance, and CI setup generators connected the eval loop to GitHub. Prompt eval CLI commands, the PR comment bot, and E2B harness runners closed the gap between local runs and production gates.
10 updates · 112 merged PRs
Race context, CLI distribution, and a redesigned public site
Live race standings are now injected into running agents, the CLI shipped through npm with production defaults, and the marketing site got a full redesign: /why, pricing, a docs foundation, and arena-style run views.
9 updates · 97 merged PRs
Scoring depth, failure review, and the first regression suite UI
The evaluation engine gained deterministic validators, LLM judge dimensions, and behavioral scoring. Failure review and regression suites landed in the workspace UI, alongside CLI distribution hardening and the xAI provider adapter.
12 updates · 60 merged PRs