Race context, CLI distribution, and a redesigned public site

Live race standings injected into running agents, the CLI shipped through npm with production defaults, and the marketing site got a full redesign — /why, pricing, docs foundation, and arena-style run views.

  • Race context
  • CLI & npm
  • Public site redesign
  • Docs foundation

What shipped

Added

  • Live race context — standings injection at step boundaries, newswire formatting, and token split between agents vs race context.
  • CLI `--race-context` flags and UI toggle for race standings during live runs.
  • Workflow-first eval commands in the CLI.
  • Dedicated /why manifesto page and pricing section with tier cards and trial CTA.
  • Released CLI binaries now default to https://api.agentclash.dev.

Improved

  • AgentClash-branded auth with liquid-glass login, starfield hero, and 3D clash mark on landing.
  • Workspace navigation made instant with client-side prefetching.
  • Run detail and replay pages restyled with instrument-panel aesthetics.
  • Playground comparison workspace refactored for clearer side-by-side evals.

Merged pull requests

97 PRs
  1. #579fix(billing): stringify Dodo metadata values
  2. #578[codex] fix billing Dodo error bodies
  3. #576fix(billing): hardcode Dodo checkout quantity to 1
  4. #574fix(ci): preserve cwd for action source fallback
  5. #573fix(ci): harden action CLI resolution
  6. #572Finish AgentClash CI setup profiles and conflict handling
  7. #571Create CI setup pull requests from the workspace UI
  8. #570fix: unblock Claude harness pull requests
  9. #569Build CI setup UI generator
  10. #567fix: use reachable app URL in AgentClash PR comments
  11. #566feat: link AgentClash PR comments to app views
  12. #564feat: add AgentClash CI PR comments
  13. #563fix: accept CI default branch metadata
  14. #561feat(web): surface CI curation metadata
  15. #559feat(cli): preserve ci curation links
  16. #557feat(cli): add ci failure taxonomy metadata
  17. #555fix(releasegate): distinguish invalid scorecards
  18. #553feat(scoring): surface evaluator validity
  19. #552fix(cli): render json errors
  20. #551fix(cli): stream json follow run creation
  21. #550fix(challenge-packs): allow reused evaluation spec names
  22. #549feat(regressions): surface remediation hints
  23. #548fix(regressions): batch suite case counts
  24. #547feat(regressions): add proposed case queue
  25. #546feat(ci): auto-detect GitHub labels
  26. #545feat(failures): classify dependency resolution regressions
  27. #544feat(regression): expose proposed validation runs in web
  28. #543feat(regression): validate proposed cases explicitly
  29. #542feat(regression): capture production failures
  30. #541feat(ci): surface failure taxonomy
  31. #540docs: expand CI release gate skill
  32. #539feat(regression): surface maintenance status
  33. #538docs: add regression flywheel skill
  34. #537feat(ci): surface failure cluster trends
  35. #536docs: add scorecard reader skill
  36. #535docs: add eval runner skill
  37. #534feat(regression): surface validation signal
  38. #533docs: add challenge pack validation publish skill
  39. #532docs: expand challenge pack llm judges skill
  40. #531feat(regression): surface failure provenance
  41. #530docs: expand challenge pack scoring validators skill
  42. #529feat(ci): filter failures by cluster key
  43. #528docs: expand challenge pack artifacts skill
  44. #527feat(web): surface failure cluster rollups
  45. #526Skills: challenge pack tools and sandbox
  46. #524Add Claude Agent Harness runner
  47. #522Skills: challenge pack input sets
  48. #518docs(ci): refresh CI release gate skill
  49. #517feat(ci): add reusable GitHub Actions gate
  50. #516feat(ci): propose regression candidates from failing gates
  51. #515feat(ci): publish gate summaries and artifacts
  52. #514feat(ci): attach GitHub metadata to CI runs
  53. #513feat(ci): run manifest gates from the CLI
  54. #512feat: complete Dodo billing integration
  55. #511feat(ci): validate manifest resource IDs against API
  56. #510feat(ci): define baseline resolution flow
  57. #509docs(ci): add real agent workload recipes
  58. #507feat(ci): add manifest should-run precheck
  59. #495fix: materialize artifact-backed pack assets in native sandbox
  60. #494fix(challengepack): reject mixed-challenge input sets
  61. #493fix(scoring): support run tool call count collector
  62. #492fix(api): include challenge pack slug in list response
  63. #486feat(cli): add --repetitions flag to eval start
  64. #485Skills: challenge pack YAML authoring
  65. #484Docs: expand challenge pack planner skill
  66. #483Internal: run blind AgentClash harnesses for skill changes
  67. #482feat(cli): add AgentClash CI manifest contract
  68. #481Skills: agent deployment setup
  69. #480Skills: agent build author
  70. #479feat(docs): expand runtime resources skill
  71. #478feat(docs): expand CLI setup skill
  72. #477[codex] Let GitHub harnesses open draft PRs
  73. #476feat: add GitHub App repo picker foundation
  74. #474Fix Agent Harness diff capture and validator workdirs
  75. #473feat(e2b): add Codex fullstack template build
  76. #472fix(agent-harnesses): allow long E2B process streams
  77. #471fix(agent-harnesses): prevent Codex execution activity timeout
  78. #470fix: harden Codex Agent Harness execution on E2B
  79. #469fix: simplify Agent Harness creation
  80. #464[codex] Complete agent harness executions
  81. #461feat: add Agent Harnesses for Codex on E2B
  82. #460Docs: challenge pack depth, typography, and non-Mermaid diagrams
  83. #459feat(docs): add skill catalog contract
  84. #439[codex] Add frontend billing UX
  85. #438[codex] Add copyable AgentClash skills catalog
  86. #435feat(cli): expose regression and artifact surfaces
  87. #434[codex] Implement billing entitlements and Dodo gates
  88. #431Replace failures-evals lens with particle flywheel animation
  89. #429[codex] Fix same-run agent scorecard comparison
  90. #428[codex] Fix CLI auth identity label source
  91. #422[codex] Refactor playground comparison workspace
  92. #420[codex] Restore semantic lucide icons
  93. #419[codex] Replace web icons with Nourico vectors
  94. #418feat(web): clash streak login shader with cursor + gyro reactivity
  95. #417[codex] Brand login as AgentClash while preserving WorkOS AuthKit
  96. #415fix(web): redirect /workspaces/{id} in middleware to avoid React #310
  97. #414[codex] Add workflow-first CLI eval commands
All releases