Datasets, /try demos, multi-turn transcripts, and Hermes harness

The datasets platform shipped end-to-end — import, generation, trace ingest, eval gates, and full workspace UI. /try CLI demos, multi-turn transcript exports, Hermes harness support, and PostHog analytics rounded out the release window.

  • Datasets platform
  • /try CLI demos
  • Multi-turn transcripts
  • Agent harnesses

What shipped

Added

  • Datasets foundation — interop import/export, version snapshots, eval runs, baselines, CI gate verdicts, and JUnit output.
  • Self-Instruct synthetic dataset generation with async jobs and CLI parity.
  • Production trace ingest, trace candidates, and promote-to-example workflow.
  • Full dataset management UI in the workspace — import mapping, eval wait, dataset test flow, and generation history.
  • /try interactive terminal demos with E2B sandbox, free-trial gateway, and Kimi/Qwen/OpenCode integrations.
  • OpenAI Responses API as a first-class execution mode.
  • Multi-turn conversation transcript API with replay rendering, scorecard view, and PDF export.
  • Vibe eval draft persistence for in-progress eval configuration.
  • Hermes agent harness support in E2B templates and harness execution workflow.
  • PostHog usage analytics across backend, CLI, and web.

Improved

  • SEO and AEO discoverability expanded across public marketing and docs surfaces.
  • Landing page bar promoting /try CLI demos.

Fixed

  • Dataset generation provider validation returns clear 400 errors instead of opaque failures.

Merged pull requests

30 PRs
  1. #911feat(web): dataset UI epic — full API/CLI parity in workspace UI
  2. #910fix: return 400 for generation provider mismatch
  3. #909fix: dataset QA follow-ups (nil tags, counts, provider match)
  4. #908fix: allow non-manual dataset example sources in production
  5. #906feat: Self-Instruct synthetic dataset generation (#892)
  6. #904feat: dataset CI follow-up — regression sync, JUnit, web tab
  7. #903fix(web): sync package-lock.json with @vercel/speed-insights
  8. #902feat(web): expand SEO + AEO discoverability (compare pages, AI-crawler rules, structured data)
  9. #901feat(datasets): baselines and CI gate verdict (#891)
  10. #899feat(datasets): production trace ingest and candidate promotion (#890)
  11. #898feat(datasets): run evals from pinned dataset versions
  12. #897feat(analytics): PostHog usage tracking (backend, CLI, web)
  13. #895feat(datasets): add dataset import/export interop
  14. #893feat: add datasets foundation
  15. #884fix(web): keep docs header "Get Started" button from overflowing
  16. #883feat(try-cli): Kimi + Qwen demos via their own CLIs; opencode on Zen
  17. #882feat(try-cli): free-trial gateway + signed-in BYO tier for AI demos
  18. #881feat(try-cli): AI coding agent demos, prebaked E2B template, redesigned UI
  19. #880feat(try-cli): Interactive terminal demos (AgentClash primitive)
  20. #878feat: add Vibe Eval draft persistence
  21. #877docs: add Vibe Eval Phase 0 inventory
  22. #867feat(web): multi-turn transcript + PDF export on scorecard
  23. #866feat(web): export multi-turn transcript to PDF
  24. #865feat(web): multi-turn conversation transcript on replay page
  25. #864feat(api): multi-turn conversation transcript endpoint
  26. #863fix(multi-turn): surface turn events + let llm phases pin a model
  27. #862fix(engine): give multi_turn LLM simulator the secret-injected ctx
  28. #859feat: OpenAI Responses API execution mode
  29. #525Add Hermes Agent Harness runner
  30. #523Add OpenClaw Agent Harness runner
All releases