Open evals engine

Evals for LLMs and agents.

Run the same task across models. Score on real outcomes. Replay every step.

AgentClash - Open-source AI Agent Evaluation Platform