Glossary
What is a release gate?
A release gate is the pass-or-fail boundary between a candidate agent run and an approved baseline. AgentClash evaluates both on the same challenge pack and fails CI when correctness, cost, or evidence quality regresses.
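The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not AgentClash's implementation: the `RunSummary` fields, the `gate` function, and the 10% cost tolerance are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run summary; field names are illustrative only."""
    correctness: float       # fraction of tasks passed, 0.0 to 1.0
    cost_usd: float          # total spend for the run
    artifacts_present: int   # evidence artifacts actually captured
    artifacts_required: int  # evidence artifacts the pack requires


def gate(candidate: RunSummary, baseline: RunSummary,
         cost_tolerance: float = 0.10) -> tuple[bool, list[str]]:
    """Return (passed, reasons); any regression fails the gate."""
    reasons = []
    if candidate.correctness < baseline.correctness:
        reasons.append("correctness regressed")
    # Assumed policy: allow cost to drift up by at most cost_tolerance.
    if candidate.cost_usd > baseline.cost_usd * (1 + cost_tolerance):
        reasons.append("cost regressed beyond tolerance")
    if candidate.artifacts_present < candidate.artifacts_required:
        reasons.append("required evidence artifacts missing")
    return (not reasons, reasons)


baseline = RunSummary(0.82, 4.00, 5, 5)
candidate = RunSummary(0.85, 4.20, 5, 5)
passed, reasons = gate(candidate, baseline)
print("PASS" if passed else "FAIL", reasons)
```

Returning the reasons alongside the verdict keeps the decision reviewable: a failed gate says which dimension regressed rather than only that something did.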
Illustration: Candidate, Baseline, and Control runs, with a replay timeline and a CI verdict.
Example CI verdict: correctness improved, latency within budget, and required artifacts were preserved for review.
agentclash run create --follow
What gates enforce
Built for reviewable agent decisions
Gates combine scorecard thresholds, validator checks, and baseline comparisons to stop a regressing model or harness change from reaching users.
Sandboxed real-tool execution
Head-to-head runs with fair constraints
Scorecards for correctness, cost, latency, and tool strategy
Replay trails for every important action
Challenge packs that turn failures into reusable tests
CI gates for baseline versus candidate decisions
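Scorecard thresholds and validator failures are absolute checks: they apply to a single run, independent of any baseline. A rough sketch of that idea follows; the metric names, the limits, and the `check_scorecard` helper are assumptions for illustration and do not reflect AgentClash's actual scorecard schema.

```python
def check_scorecard(scorecard: dict[str, float],
                    minimums: dict[str, float],
                    maximums: dict[str, float],
                    validator_failures: list[str]) -> list[str]:
    """Collect every violated rule; an empty list means the run is acceptable."""
    violations = [f"validator failed: {name}" for name in validator_failures]
    for metric, floor in minimums.items():
        value = scorecard.get(metric)
        # A missing metric counts as a violation rather than a silent pass.
        if value is None or value < floor:
            violations.append(f"{metric} below minimum {floor}")
    for metric, ceiling in maximums.items():
        value = scorecard.get(metric)
        if value is None or value > ceiling:
            violations.append(f"{metric} above maximum {ceiling}")
    return violations


# Hypothetical scorecard and limits.
scorecard = {"correctness": 0.9, "latency_s": 42.0, "cost_usd": 1.5}
minimums = {"correctness": 0.8}
maximums = {"latency_s": 60.0, "cost_usd": 1.0}
violations = check_scorecard(scorecard, minimums, maximums, [])
print(violations)
```

Treating a missing metric as a violation is a deliberate choice in this sketch: a gate that passes when evidence is absent would undermine the point of requiring it.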
Workflow
Gate workflow
Package the task
Describe the workload as a challenge pack with inputs, tools, scoring rules, and artifacts.
Race the agents
Run every candidate against the same task with the same constraints.
Replay the evidence
Inspect tool calls, outputs, artifacts, latency, cost, and judge evidence after the run.
Gate the release
Compare candidate and baseline runs, then fail CI before a regression reaches users.
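The last step, failing CI on a regression, usually comes down to a process exit status: CI systems treat a non-zero exit code as a failed job. The sketch below shows one way to turn a per-task comparison into that status. The task names and both helper functions are hypothetical, and the per-task pass/fail dictionaries stand in for whatever result format a real run produces.

```python
def regressed_tasks(baseline: dict[str, bool],
                    candidate: dict[str, bool]) -> list[str]:
    """Tasks the baseline passed that the candidate fails or did not run."""
    return sorted(
        task for task, ok in baseline.items()
        if ok and not candidate.get(task, False)
    )


def exit_code(regressions: list[str]) -> int:
    """Map the comparison to a process status; non-zero fails the CI job."""
    return 1 if regressions else 0


# Hypothetical per-task results from the same pack.
baseline_results = {"refund-flow": True, "db-migration": True, "log-triage": False}
candidate_results = {"refund-flow": True, "db-migration": False, "log-triage": True}

regressions = regressed_tasks(baseline_results, candidate_results)
for task in regressions:
    print(f"regression: {task}")
status = exit_code(regressions)
# A CI script would finish with: sys.exit(status)
```

Comparing task by task, rather than only on an aggregate score, catches the case where a candidate fixes one task and breaks another while the overall pass rate stays the same.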
Wire gates in CI
Bring your first workload into the loop
Use the CI/CD agent gates guide to connect manifests, baselines, and pull request checks.
Agent evals
Real-task agent evals with replay evidence and CI gates.
LLM agent evaluation
Evaluate LLM agents on full trajectories, not one-shot answers.
Compare tools
See how AgentClash differs from prompt-eval platforms.
CI/CD agent evaluation
Overview of running agent evals in CI.
Eval workflows and gates
Docs on baselines and manifests.
Enterprise pilot
Stand up governed gates with platform support.
CI/CD agent gates
Fail a pull request when an agent regresses.
FAQ
Release gate FAQ
What is a baseline in a release gate?
A baseline is the approved run that a candidate is compared against. It is usually pinned by run ID in a CI manifest after a green eval on main.
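To make the pinning idea concrete, here is a small sketch of reading a pinned baseline from a manifest and repinning it only after a green eval on main. The JSON layout, the `baseline_run_id` key, the run IDs, and both functions are invented for this example; the real manifest format is defined in the AgentClash docs and may look quite different.

```python
import json

# Hypothetical manifest; keys and values are illustrative only.
MANIFEST_TEXT = """
{
  "pack": "support-triage",
  "baseline_run_id": "run_0123"
}
"""


def load_baseline_run_id(manifest_text: str) -> str:
    """Read the pinned baseline, refusing manifests that do not pin one."""
    manifest = json.loads(manifest_text)
    run_id = manifest.get("baseline_run_id")
    if not isinstance(run_id, str) or not run_id:
        raise ValueError("manifest must pin a baseline_run_id")
    return run_id


def promote_baseline(manifest_text: str, new_run_id: str,
                     eval_green: bool, branch: str) -> str:
    """Repin the baseline only after a green eval on main; otherwise no change."""
    if not (eval_green and branch == "main"):
        return manifest_text
    manifest = json.loads(manifest_text)
    manifest["baseline_run_id"] = new_run_id
    return json.dumps(manifest, indent=2)


updated = promote_baseline(MANIFEST_TEXT, "run_0456", eval_green=True, branch="main")
print(load_baseline_run_id(updated))
```

Pinning by run ID means the comparison target changes only through an explicit, reviewable edit to the manifest, not implicitly whenever a newer run exists.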
Can gates run outside CI?
Yes. Teams also gate manually before production promotion, using the same scorecard and replay evidence produced in CI.
How is this different from a linter?
Linters check static code. Release gates execute the agent on real tasks and score the full trajectory with replay attached.