Release notes

Changelog

Product updates grouped in ten-day windows. Select a release for the full breakdown and merged pull requests.

  1. Latest

    Portable agent skills and one-command host install

    Twenty-five portable Agent Skills shipped with docs, an embedded CLI snapshot, and a standalone agent-skills bundle repo. Install or doctor skills on Claude, Codex, Cursor, OpenClaw, Hermes, and OpenCode with one command, or export an offline bundle as a directory or tar.gz.

    7 updates · 14 merged PRs

  2. Datasets, /try demos, multi-turn transcripts, and Hermes harness

    The datasets platform shipped end-to-end — import, generation, trace ingest, eval gates, and full workspace UI. /try CLI demos, multi-turn transcript exports, Hermes harness support, and PostHog analytics rounded out the release window.

    13 updates · 30 merged PRs

  3. Security packs, multi-turn eval, and replay polish

    Security-family challenge packs, stress-run CLI, and vault boundary harnesses established AgentClash as a security eval surface. Multi-turn evaluation with human takeover and case templating extended the engine beyond single-shot runs.

    10 updates · 21 merged PRs

  4. CI intelligence, prompt eval, and GitHub-native workflows

    Failure clusters, regression provenance, and CI setup generators connected the eval loop to GitHub. Prompt eval CLI commands, the PR comment bot, and E2B harness runners closed the gap between local runs and production gates.

    10 updates · 112 merged PRs

  5. Race context, CLI distribution, and a redesigned public site

    Live race standings injected into running agents, the CLI shipped through npm with production defaults, and the marketing site got a full redesign — /why, pricing, docs foundation, and arena-style run views.

    9 updates · 97 merged PRs

  6. Scoring depth, failure review, and the first regression suite UI

    The evaluation engine gained deterministic validators, LLM judge dimensions, and behavioral scoring. Failure review and regression suites landed in the workspace UI, alongside CLI distribution hardening and the xAI provider adapter.

    12 updates · 60 merged PRs

Back to AgentClash