Datasets, /try demos, multi-turn transcripts, and Hermes harness
The datasets platform shipped end-to-end — import, generation, trace ingest, eval gates, and full workspace UI. /try CLI demos, multi-turn transcript exports, Hermes harness support, and PostHog analytics rounded out the release window.
- Datasets platform
- /try CLI demos
- Multi-turn transcripts
- Agent harnesses
What shipped
Added
- Datasets foundation — interop import/export, version snapshots, eval runs, baselines, CI gate verdicts, and JUnit output.
- Self-Instruct synthetic dataset generation with async jobs and CLI parity.
- Production trace ingest, trace candidates, and promote-to-example workflow.
- Full dataset management UI in the workspace — import mapping, eval wait, dataset test flow, and generation history.
- /try interactive terminal demos with E2B sandbox, free-trial gateway, and Kimi/Qwen/OpenCode integrations.
- OpenAI Responses API as a first-class execution mode.
- Multi-turn conversation transcript API with replay rendering, scorecard view, and PDF export.
- Vibe eval draft persistence for in-progress eval configuration.
- Hermes agent harness support in E2B templates and harness execution workflow.
- PostHog usage analytics across backend, CLI, and web.
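As a rough illustration of how a baseline comparison could become a CI gate verdict with JUnit output, the sketch below flags per-example score regressions and renders them as JUnit-style XML. Every name here (`gate_verdict`, `to_junit`, the `tolerance` parameter, the score dictionaries) is hypothetical and not taken from the platform's API or CLI; the shipped gate logic may work differently.

```python
import xml.etree.ElementTree as ET


def gate_verdict(baseline, candidate, tolerance=0.0):
    """Return (passed, regressions) for a candidate run versus a baseline.

    Both arguments map example IDs to scores (assumed format, not the real schema).
    An example regresses when its score drops by more than `tolerance`.
    Examples missing from the candidate run also count as regressions.
    """
    regressions = {}
    for example_id, base_score in baseline.items():
        new_score = candidate.get(example_id)
        if new_score is None or base_score - new_score > tolerance:
            regressions[example_id] = (base_score, new_score)
    return (not regressions, regressions)


def to_junit(baseline, regressions, suite_name="dataset-eval-gate"):
    """Render the verdict as a JUnit-style XML string, one testcase per example."""
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(baseline)),
        failures=str(len(regressions)),
    )
    for example_id in sorted(baseline):
        case = ET.SubElement(suite, "testcase", name=example_id)
        if example_id in regressions:
            base_score, new_score = regressions[example_id]
            failure = ET.SubElement(case, "failure", message="score regression")
            failure.text = f"baseline={base_score} candidate={new_score}"
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    # Illustrative scores only; not real eval results.
    baseline = {"ex-1": 0.9, "ex-2": 0.8}
    candidate = {"ex-1": 0.95, "ex-2": 0.6}
    passed, regressions = gate_verdict(baseline, candidate, tolerance=0.05)
    print("PASS" if passed else "FAIL")
    print(to_junit(baseline, regressions))
```

Emitting one `testcase` per dataset example lets a CI system that already parses JUnit reports show which examples regressed without any custom tooling.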
Improved
- SEO and AEO discoverability expanded across public marketing and docs surfaces, adding compare pages, AI-crawler rules, and structured data.
- Landing page now shows a bar promoting the /try CLI demos.
Fixed
- Dataset generation now returns a clear 400 error on a provider mismatch instead of an opaque failure.
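To make the fix above concrete, here is a minimal sketch of validating a generation request up front and answering a mismatch with a descriptive 400. It assumes a "mismatch" means the requested model is not served by the requested provider, which the notes do not spell out; the `SUPPORTED_MODELS` table, the provider and model names, and `validate_generation_request` are all hypothetical.

```python
from http import HTTPStatus

# Hypothetical provider-to-model table; real provider and model names will differ.
SUPPORTED_MODELS = {
    "provider-a": {"model-small", "model-large"},
    "provider-b": {"model-x"},
}


def validate_generation_request(provider, model):
    """Return (status_code, body) for a dataset generation request.

    An unknown provider or a provider/model mismatch is reported as a 400
    with a descriptive message instead of surfacing as an opaque failure.
    """
    if provider not in SUPPORTED_MODELS:
        return HTTPStatus.BAD_REQUEST, {"error": f"unknown provider '{provider}'"}
    if model not in SUPPORTED_MODELS[provider]:
        return HTTPStatus.BAD_REQUEST, {
            "error": f"model '{model}' is not served by provider '{provider}'"
        }
    # Assumed success path: the async generation job is accepted and queued.
    return HTTPStatus.ACCEPTED, {"status": "queued"}
```

Rejecting the request before any job is queued means the caller sees the cause immediately in the response body.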
Merged pull requests
30 PRs
- #911 feat(web): dataset UI epic - full API/CLI parity in workspace UI
- #910 fix: return 400 for generation provider mismatch
- #909 fix: dataset QA follow-ups (nil tags, counts, provider match)
- #908 fix: allow non-manual dataset example sources in production
- #906 feat: Self-Instruct synthetic dataset generation (#892)
- #904 feat: dataset CI follow-up - regression sync, JUnit, web tab
- #903 fix(web): sync package-lock.json with @vercel/speed-insights
- #902 feat(web): expand SEO + AEO discoverability (compare pages, AI-crawler rules, structured data)
- #901 feat(datasets): baselines and CI gate verdict (#891)
- #899 feat(datasets): production trace ingest and candidate promotion (#890)
- #898 feat(datasets): run evals from pinned dataset versions
- #897 feat(analytics): PostHog usage tracking (backend, CLI, web)
- #895 feat(datasets): add dataset import/export interop
- #893 feat: add datasets foundation
- #884 fix(web): keep docs header "Get Started" button from overflowing
- #883 feat(try-cli): Kimi + Qwen demos via their own CLIs; opencode on Zen
- #882 feat(try-cli): free-trial gateway + signed-in BYO tier for AI demos
- #881 feat(try-cli): AI coding agent demos, prebaked E2B template, redesigned UI
- #880 feat(try-cli): Interactive terminal demos (AgentClash primitive)
- #878 feat: add Vibe Eval draft persistence
- #877 docs: add Vibe Eval Phase 0 inventory
- #867 feat(web): multi-turn transcript + PDF export on scorecard
- #866 feat(web): export multi-turn transcript to PDF
- #865 feat(web): multi-turn conversation transcript on replay page
- #864 feat(api): multi-turn conversation transcript endpoint
- #863 fix(multi-turn): surface turn events + let llm phases pin a model
- #862 fix(engine): give multi_turn LLM simulator the secret-injected ctx
- #859 feat: OpenAI Responses API execution mode
- #525 Add Hermes Agent Harness runner
- #523 Add OpenClaw Agent Harness runner