May 25 – Jun 01, 2026

Datasets, /try demos, multi-turn transcripts, and Hermes harness

The datasets platform shipped end-to-end — import, generation, trace ingest, eval gates, and full workspace UI. /try CLI demos, multi-turn transcript exports, Hermes harness support, and PostHog analytics rounded out the release window.

Datasets platform
/try CLI demos
Multi-turn transcripts
Agent harnesses

What shipped

Added

Datasets foundation — interop import/export, version snapshots, eval runs, baselines, CI gate verdicts, and JUnit output.
Self-Instruct synthetic dataset generation with async jobs and CLI parity.
Production trace ingest, trace candidates, and promote-to-example workflow.
Full dataset management UI in the workspace — import mapping, eval wait, dataset test flow, and generation history.
/try interactive terminal demos with E2B sandbox, free-trial gateway, and Kimi/Qwen/OpenCode integrations.
OpenAI Responses API as a first-class execution mode.
Multi-turn conversation transcript API with replay rendering, scorecard view, and PDF export.
Vibe eval draft persistence for in-progress eval configuration.
Hermes agent harness support in E2B templates and harness execution workflow.
PostHog usage analytics across backend, CLI, and web.

Improved

SEO and AEO discoverability expanded across public marketing and docs surfaces.
Landing page bar promoting /try CLI demos.

Fixed

Dataset generation provider validation returns clear 400 errors instead of opaque failures.

Merged pull requests

30 PRs