Open evals engine
Evals for LLMs and agents.
Run the same task across models. Score on real outcomes. Replay every step.
AgentClash - Open-source AI Agent Evaluation Platform