CI Release Gate Skill

Use when wiring AgentClash manifest-based CI gates, deciding whether a PR should run AgentClash, resolving baselines, running `agentclash ci run`, interpreting gate exit codes, collecting CI artifacts, or configuring regression promotion policy in GitHub Actions.
Canonical source: web/content/agent-skills/agentclash-ci-release-gate/SKILL.md
Markdown export: /docs-md/agent-skills/agentclash-ci-release-gate
Full SKILL.md

---
name: agentclash-ci-release-gate
description: Use when wiring AgentClash manifest-based CI gates, deciding whether a PR should run AgentClash, resolving baselines, running `agentclash ci run`, interpreting gate exit codes, collecting CI artifacts, or configuring regression promotion policy in GitHub Actions.
metadata:
  agentclash.role: ci
  agentclash.version: "1"
  agentclash.requires_cli: "true"
---

# AgentClash CI Release Gate

## Purpose
Wire an AgentClash candidate build/deployment into a repo-tracked CI manifest, compare it against a baseline run, and turn the release gate verdict into CI status.

## Use When
- A pull request or mainline workflow should run AgentClash only when relevant files or labels match.
- A user needs a `.agentclash/ci.yaml` manifest for a candidate agent, workload, baseline, gate, and regression promotion policy.
- CI should fail on AgentClash release gate failures, warnings, insufficient evidence, setup errors, run timeouts, or candidate run failures.
- A GitHub Actions workflow should use the repo-local `.github/actions/agentclash-ci` composite action.

## Do Not Use When
- The agent build spec does not exist yet; use `agentclash-agent-build-author`.
- Runtime profiles, provider accounts, model aliases, or workspace tools are not configured; use `agentclash-runtime-resources-setup`.
- The challenge pack or input set is not validated and published; use the challenge-pack skills first.
- The task is only to read scorecards or failure evidence from an existing run; use `agentclash-scorecard-reader` or `agentclash-regression-flywheel`.

## Environment
Use hosted production by default:

```bash
export AGENTCLASH_API_URL="https://api.agentclash.dev"
export AGENTCLASH_TOKEN="<token>"
export AGENTCLASH_WORKSPACE="<workspace-id>"
```

In GitHub Actions, pass tokens through secrets or the composite action inputs. Do not print tokens. `ci run` exits with code `10` and a JSON error envelope when workspace context is missing, so set `AGENTCLASH_WORKSPACE`, pass `--workspace`, or use saved CLI workspace config.
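
A minimal preflight sketch for a CI shell step, assuming you want to fail before any CLI call when credentials are missing. `agentclash_preflight` is a hypothetical helper, not part of the CLI: it checks that `AGENTCLASH_TOKEN` and `AGENTCLASH_WORKSPACE` are non-empty without echoing the token, fills in the hosted API URL when unset, and returns `10` to mirror the code `ci run` uses for missing workspace context. It ignores `--workspace` and saved CLI config, so relax the workspace check if you rely on those.

```bash
# Hypothetical preflight helper; not part of the agentclash CLI.
# Returns 0 when required variables are present, 10 otherwise
# (10 mirrors the exit code `ci run` uses for missing workspace context).
agentclash_preflight() {
  local missing=()
  # Check presence only; never echo the token value itself.
  [[ -n "${AGENTCLASH_TOKEN:-}" ]] || missing+=("AGENTCLASH_TOKEN")
  [[ -n "${AGENTCLASH_WORKSPACE:-}" ]] || missing+=("AGENTCLASH_WORKSPACE")
  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}" >&2
    return 10
  fi
  # Default to the hosted API when no URL is configured.
  : "${AGENTCLASH_API_URL:=https://api.agentclash.dev}"
  export AGENTCLASH_API_URL
  return 0
}
```

In a workflow step, call it first with `agentclash_preflight || exit $?`.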

## Manifest Shape
Create a sample with:

```bash
agentclash ci init .agentclash/ci.yaml
agentclash ci init .agentclash/ci.yaml --force
```

The CLI sample manifest is:

```yaml
version: 1
trigger:
  paths:
    - .agentclash/agent.json
    - prompts/**
    - tools/**
  labels:
    - agentclash/eval
candidate:
  build:
    agent_build_id: 00000000-0000-0000-0000-000000000001
    spec_file: .agentclash/agent.json
  deployment:
    name: pr-candidate
    runtime_profile_id: 00000000-0000-0000-0000-000000000002
    provider_account_id: 00000000-0000-0000-0000-000000000003
    model: gpt-5.5
evaluation:
  challenge_pack_version_id: 00000000-0000-0000-0000-000000000005
  input_set_id: 00000000-0000-0000-0000-000000000006
  regression_suites:
    - 00000000-0000-0000-0000-000000000007
baseline:
  run_id: 00000000-0000-0000-0000-000000000008
  refresh: manual
  max_age_days: 30
gate:
  fail_on: regression
regressions:
  promote_failures: proposed
```

Exact fields the current CLI parses:

- `version`: must be `1`.
- `trigger.paths`: required, nonblank doublestar globs; `trigger.labels`: optional labels that can force a run.
- `candidate.build.agent_build_id`: required existing agent build ID.
- `candidate.build.spec_file`: required relative path inside the repository; absolute paths and `..` escapes are rejected by `ci run`.
- `candidate.deployment.name`: optional deployment name; if blank, `ci run` generates `agentclash-ci-<unix>`.
- `candidate.deployment.runtime_profile_id`: required.
- `candidate.deployment.provider_account_id` and `candidate.deployment.model`: optional individually, but required together when `ci run` creates the candidate deployment. Remote validation checks that the provider account exists.
- `evaluation.challenge_pack_version_id`: required.
- `evaluation.input_set_id`, `evaluation.regression_suites`, and `evaluation.regression_cases`: optional; blank regression entries are invalid.
- `baseline.run_id`: locked baseline run, preferred for PR gates.
- `baseline.run_agent_id`: optional but only valid with `baseline.run_id`.
- `baseline.deployment_id`: moving baseline selector; mutually exclusive with `baseline.run_id`.
- `baseline.refresh`: optional `manual`, `propose`, or `auto_on_main`; default behavior is `manual`.
- `baseline.max_age_days`: optional non-negative freshness limit; `0` means no age check.
- `gate.fail_on`: required, one of `regression`, `warning`, or `insufficient_evidence`.
- `gate.policy_file`: optional schema field, but the current `agentclash ci run` implementation does not read or post it.
- `regressions.promote_failures`: required, one of `disabled`, `proposed`, or `auto_on_main`.
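
For a quick pre-commit or CI lint on `candidate.build.spec_file`, the path rule can be approximated in shell. This is a sketch, not the CLI's implementation: `spec_file_path_ok` is hypothetical and deliberately conservative (it rejects every `..` segment, even ones that would stay inside the repository), so the CLI remains the authority.

```bash
# Hypothetical approximation of the spec_file path rule; not the CLI's code.
# Succeeds for non-empty relative paths without any ".." segment.
spec_file_path_ok() {
  local path="$1" segment
  local -a parts
  # Empty and absolute paths are rejected.
  [[ -n "$path" && "$path" != /* ]] || return 1
  # Split on "/" and reject any ".." segment (conservative: also rejects a/../b).
  IFS='/' read -r -a parts <<< "$path"
  for segment in "${parts[@]}"; do
    [[ "$segment" != ".." ]] || return 1
  done
  return 0
}
```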

Important gate fidelity note: today `ci run` posts `baseline_run_id`, `candidate_run_id`, and optional run-agent IDs to `/v1/release-gates/evaluate`. It does not pass `gate.fail_on`, does not load `gate.policy_file`, and has no `--fail-on` or `--policy-file` flag. The backend normalizes an empty policy to the default release gate policy. Do not claim manifest gate fields customize evaluation until the CLI source wires that behavior.

## Validation Commands
Validate locally first:

```bash
agentclash ci validate .agentclash/ci.yaml
agentclash ci validate .agentclash/ci.yaml --json
```

Use remote validation when workspace credentials are available:

```bash
agentclash ci validate .agentclash/ci.yaml --remote --json
```

Structured `validate` output includes:

```json
{
  "path": ".agentclash/ci.yaml",
  "valid": true,
  "manifest": {},
  "remote": {
    "workspace_id": "<WORKSPACE_ID>",
    "valid": true,
    "checks": [
      {
        "field": "candidate.build.agent_build_id",
        "resource": "agent_build",
        "id": "<AGENT_BUILD_ID>",
        "valid": true,
        "code": "ok",
        "message": "agent build is accessible in the selected workspace"
      }
    ]
  }
}
```

Remote validation checks the agent build, runtime profile, provider account, model alias, challenge pack version, input set, regression suites, regression cases, and baseline compatibility. API failures that cannot be reduced to a field problem appear as `remote API error: ...`.
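
The "local first, then remote" order can be wrapped in a reusable function. A sketch, assuming `agentclash ci validate` exits non-zero for an invalid manifest (this page does not say so) and treating non-empty `AGENTCLASH_TOKEN` and `AGENTCLASH_WORKSPACE` as the signal that workspace credentials are available; `validate_manifest` is a hypothetical name.

```bash
# Hypothetical wrapper: local validation first, remote only when credentials exist.
# Assumes `agentclash ci validate` signals an invalid manifest with a non-zero exit.
validate_manifest() {
  local manifest="${1:-.agentclash/ci.yaml}"
  # Local checks need no network access or credentials.
  agentclash ci validate "$manifest" --json || return $?
  if [[ -n "${AGENTCLASH_TOKEN:-}" && -n "${AGENTCLASH_WORKSPACE:-}" ]]; then
    # Remote checks confirm the referenced IDs are visible in the workspace.
    agentclash ci validate "$manifest" --remote --json || return $?
  else
    echo "skipping remote validation: no workspace credentials" >&2
  fi
}
```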

## Should Run
Use this before spending hosted evaluation budget:

```bash
agentclash ci should-run --manifest .agentclash/ci.yaml --base origin/main --head HEAD --json
agentclash ci should-run --manifest .agentclash/ci.yaml --changed-file prompts/refund.md --labels agentclash/eval --json
```

`--changed-file` may be repeated. `--labels` accepts comma-separated or repeated values. If changed files are omitted and refs are present, the CLI derives files with `git diff --name-only --diff-filter=ACDMRTUXB <base>...<head>`. When refs are not passed, base defaults come from `AGENTCLASH_CI_BASE` or `GITHUB_BASE_REF` (used as `origin/<base>`); head defaults come from `AGENTCLASH_CI_HEAD`, `GITHUB_SHA`, and `HEAD` when a base is set.
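
To see which refs and files `should-run` will compare, the defaults can be mirrored in a shell step. This sketch encodes one reading of the fallback order (the `AGENTCLASH_CI_*` override first, then the GitHub variable); `resolve_ci_refs` and `ci_changed_files` are hypothetical helpers, and the CLI's own resolution is authoritative.

```bash
# Hypothetical helpers mirroring the documented ref defaults.
# Prints "<base> <head>"; prints nothing when no base can be resolved.
resolve_ci_refs() {
  local base="${AGENTCLASH_CI_BASE:-}"
  if [[ -z "$base" && -n "${GITHUB_BASE_REF:-}" ]]; then
    base="origin/${GITHUB_BASE_REF}"
  fi
  [[ -n "$base" ]] || return 0
  # Head falls back to HEAD only once a base is known.
  local head="${AGENTCLASH_CI_HEAD:-${GITHUB_SHA:-HEAD}}"
  printf '%s %s\n' "$base" "$head"
}

# Lists changed files with the same diff filter the CLI documents.
ci_changed_files() {
  local base="$1" head="$2"
  git diff --name-only --diff-filter=ACDMRTUXB "${base}...${head}"
}
```

For example, `read -r base head <<< "$(resolve_ci_refs)"` followed by `ci_changed_files "$base" "$head"`. The three-dot diff needs the merge base in the local clone, which is one reason a full-history checkout (`fetch-depth: 0`) matters in GitHub Actions.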

JSON output shape:

```json
{
  "path": ".agentclash/ci.yaml",
  "should_run": true,
  "reason": "changed files matched trigger.paths and labels matched trigger.labels",
  "changed_files": ["prompts/refund.md"],
  "labels": ["agentclash/eval"],
  "checked_path_globs": ["prompts/**"],
  "checked_labels": ["agentclash/eval"],
  "matched_paths": [{"pattern": "prompts/**", "file": "prompts/refund.md"}],
  "matched_labels": ["agentclash/eval"]
}
```

The decision is an OR: matched paths or matched labels make `should_run: true`. Reasons are exactly:

- `changed files matched trigger.paths and labels matched trigger.labels`
- `changed files matched trigger.paths`
- `labels matched trigger.labels`
- `no changed files or labels were provided`
- `no changed files or labels matched manifest triggers`
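
When a wrapper script branches on `reason`, it helps to have the decision rule in testable form. `should_run_reason` is a hypothetical reimplementation of the OR rule and reason strings listed above, taking three `true`/`false` flags; it assumes "were provided" is reported only when no files or labels were passed at all. In CI, read the real `should_run` field rather than recomputing it.

```bash
# Hypothetical reimplementation of the documented should-run decision.
# Args: inputs_provided paths_matched labels_matched (each "true" or "false").
# Prints the reason; returns 0 when the gate should run, 1 otherwise.
should_run_reason() {
  local provided="$1" paths="$2" labels="$3"
  if [[ "$paths" == true && "$labels" == true ]]; then
    echo "changed files matched trigger.paths and labels matched trigger.labels"
  elif [[ "$paths" == true ]]; then
    echo "changed files matched trigger.paths"
  elif [[ "$labels" == true ]]; then
    echo "labels matched trigger.labels"
  elif [[ "$provided" != true ]]; then
    echo "no changed files or labels were provided"
    return 1
  else
    echo "no changed files or labels matched manifest triggers"
    return 1
  fi
}
```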

## Baseline Resolution
Resolve and print the exact baseline before running the gate:

```bash
agentclash ci baseline --manifest .agentclash/ci.yaml --json
```

For `baseline.run_id`, strategy is `locked_run` and source is `baseline.run_id`. The run must be in the selected workspace, completed, compatible with `evaluation.challenge_pack_version_id` and optional `evaluation.input_set_id`, and within `baseline.max_age_days` when set. `baseline.run_agent_id` is resolved against that run when present.

For `baseline.deployment_id`, strategy is `deployment_latest_completed` and source is `baseline.deployment_id`. The CLI selects the newest completed compatible run whose participant used that deployment and warns that deployment baselines move over time. Prefer `baseline.run_id` for PRs.
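
The two selectors and the freshness rule can be sanity-checked before calling `agentclash ci baseline`. A sketch with hypothetical helpers `baseline_strategy` and `baseline_is_fresh`; it assumes ages are whole days and that a run exactly `max_age_days` old still counts as fresh, neither of which this page specifies. What the CLI does when neither selector is set is not covered here, so the helper just reports an error.

```bash
# Hypothetical helper mirroring the documented baseline selectors.
# Args: run_id deployment_id (either may be empty). Prints the strategy name.
baseline_strategy() {
  local run_id="$1" deployment_id="$2"
  if [[ -n "$run_id" && -n "$deployment_id" ]]; then
    echo "baseline.run_id and baseline.deployment_id are mutually exclusive" >&2
    return 1
  elif [[ -n "$run_id" ]]; then
    echo "locked_run"
  elif [[ -n "$deployment_id" ]]; then
    echo "deployment_latest_completed"
  else
    echo "no baseline selector set" >&2
    return 1
  fi
}

# Freshness check in whole days; max_age_days of 0 (or unset) disables the limit.
baseline_is_fresh() {
  local age_days="$1" max_age_days="${2:-0}"
  (( max_age_days == 0 || age_days <= max_age_days ))
}
```

Compare the result with the `strategy` that `agentclash ci baseline --json` reports; a mismatch usually means the manifest is not the one you think you are editing.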

Refresh next actions are:

- `manual`: after a successful mainline run, update `baseline.run_id` intentionally in a reviewed change.
- `propose`: after a successful mainline run, open a reviewed change that updates `baseline.run_id`.
- `auto_on_main`: after a successful protected mainline run, automation may update `baseline.run_id` with an auditable commit.

## Run The Gate
Run the manifest workflow:

```bash
agentclash ci run --manifest .agentclash/ci.yaml --json --artifact-dir agentclash-artifacts
agentclash ci run --manifest .agentclash/ci.yaml --json --summary-file agentclash-summary.md
agentclash ci run --manifest .agentclash/ci.yaml --follow --timeout 30m --poll-interval 5s
```

Flags:

- `--manifest`: defaults to `.agentclash/ci.yaml`.
- `--follow`: streams run events; applies only to non-JSON output.
- `--timeout`: duration, default `30m`; `0` disables the timeout; negative values exit with code `10`.
- `--poll-interval`: duration, default `5s`; must be greater than zero.
- `--summary-file`: writes a Markdown gate summary.
- `--github-step-summary`: defaults to true; appends the summary to the file named by `GITHUB_STEP_SUMMARY` when that variable is set.
- `--artifact-dir`: writes stable JSON artifacts.
- CI metadata overrides: `--ci-provider`, `--ci-repository`, `--ci-pull-request`, `--ci-branch`, `--ci-ref`, `--ci-commit`, `--ci-workflow`, `--ci-workflow-run-id`, `--ci-workflow-run-attempt`, `--ci-workflow-run-url`, `--ci-event`, `--ci-default-branch`.

`ci run` performs these steps in order:

- Validate the local manifest.
- Remote-validate resource IDs.
- Create a build version from `candidate.build.spec_file` and mark it ready.
- Create a deployment.
- Resolve the baseline.
- Create a run with `official_pack_mode: "full"` plus optional regression suites/cases.
- Wait for completion.
- Resolve the candidate run agent.
- Fetch the scorecard and comparison, only when reports are enabled.
- Evaluate the release gate.
- Optionally promote regression failures.
- Write reports.

Structured output includes:

```json
{
  "manifest_path": ".agentclash/ci.yaml",
  "workspace_id": "<WORKSPACE_ID>",
  "remote_validation": {},
  "candidate": {
    "agent_build_id": "<AGENT_BUILD_ID>",
    "build_version_id": "<BUILD_VERSION_ID>",
    "deployment_id": "<DEPLOYMENT_ID>",
    "run_id": "<RUN_ID>",
    "run_agent_id": "<RUN_AGENT_ID>",
    "run_status": "completed",
    "run_url": "<URL>",
    "deployment_name": "pr-candidate",
    "ci_metadata": {}
  },
  "baseline_resolution": {},
  "baseline": {
    "run_id": "<BASELINE_RUN_ID>",
    "run_agent_id": "<BASELINE_RUN_AGENT_ID>",
    "status": "completed"
  },
  "release_gate": {},
  "gate_verdict": "pass",
  "failure_reason": "",
  "reports": {},
  "regression_promotions": {},
  "exit_code": 0
}
```

Exit codes are exact:

- `0`: pass.
- `1`: release gate failed.
- `2`: release gate warning.
- `3`: insufficient gate evidence.
- `10`: invalid manifest, missing workspace for `ci run`, invalid duration flag, invalid CI metadata flag, or local candidate spec error.
- `20`: API/auth failure or report-writing failure after a successful gate.
- `30`: candidate run timed out.
- `31`: candidate run failed before gate evaluation.
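
Because `ci run` reports its outcome through the exit code, a shell step should capture the code rather than let `set -e` stop the script before it can log anything. A minimal sketch; `gate_exit_meaning`, `run_gate`, and the `agentclash-result.json` file name are hypothetical, and the meanings are paraphrased from the list above.

```bash
# Hypothetical helper: maps documented `ci run` exit codes to short meanings.
gate_exit_meaning() {
  case "$1" in
    0)  echo "pass" ;;
    1)  echo "release gate failed" ;;
    2)  echo "release gate warning" ;;
    3)  echo "insufficient gate evidence" ;;
    10) echo "invalid manifest, missing workspace, invalid flag or CI metadata, or candidate spec error" ;;
    20) echo "API/auth failure or report-writing failure" ;;
    30) echo "candidate run timed out" ;;
    31) echo "candidate run failed before gate evaluation" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}

# Runs the gate, saves the JSON result, logs the meaning, and propagates the code.
run_gate() {
  local manifest="${1:-.agentclash/ci.yaml}" code=0
  agentclash ci run --manifest "$manifest" --json \
    --artifact-dir agentclash-artifacts > agentclash-result.json || code=$?
  echo "agentclash ci run exited $code: $(gate_exit_meaning "$code")" >&2
  return "$code"
}
```

A team that wants warnings to be non-blocking can treat code `2` as success in its own wrapper. That is a local policy choice: the manifest's `gate.fail_on` is not currently sent to the release gate endpoint, so do not expect it to change these codes.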

Successful terminal run statuses are `completed`, `succeeded`, and `success`. Failed terminal statuses include `failed`, `error`, `errored`, `canceled`, `cancelled`, `aborted`, `timed_out`, `timeout`, and `expired`.
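
If a script of your own polls run status instead of relying on `ci run`, the same status sets can be applied with a `case` statement. `classify_run_status` is hypothetical; because the failed list above is introduced with "include", anything unlisted is reported as `other` rather than guessed.

```bash
# Hypothetical helper based on the documented terminal statuses.
# Prints "success", "failed", or "other" (non-terminal or unlisted statuses).
classify_run_status() {
  case "$1" in
    completed|succeeded|success)
      echo "success" ;;
    failed|error|errored|canceled|cancelled|aborted|timed_out|timeout|expired)
      echo "failed" ;;
    *)
      echo "other" ;;
  esac
}
```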

## Reports And Artifacts
Reports are enabled only when `--summary-file` or `--artifact-dir` is set, or when `GITHUB_STEP_SUMMARY` is set and `--github-step-summary` is enabled. `--artifact-dir agentclash-artifacts` writes:

- `agentclash-artifacts/run.json` with kind `agentclash.ci.run`.
- `agentclash-artifacts/scorecard.json` with kind `agentclash.ci.scorecard`.
- `agentclash-artifacts/comparison.json` with kind `agentclash.ci.comparison`.
- `agentclash-artifacts/gate.json` with kind `agentclash.ci.gate`.
- `agentclash-artifacts/result.json` with kind `agentclash.ci.result`.

Each artifact is wrapped in an envelope with `schema_version: "2026-05-04"`, `kind`, `generated_at`, `manifest_path`, `workspace_id`, `challenge_pack_version_id`, `candidate`, `baseline`, optional gate policy identity fields, and `payload`.
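
After a `ci run` that used `--artifact-dir`, a follow-up step can confirm that the expected envelopes exist before uploading them. A sketch: `check_agentclash_artifacts` is a hypothetical helper that checks each file is non-empty and that its `kind` matches the file name via `grep`, which is a loose text check rather than a JSON parse. Runs that stop early (for example, exit code `30` or `31`) may lack some files.

```bash
# Hypothetical check for the five documented artifact envelopes.
# Prints each problem to stderr and returns 1 if any file is missing or mislabeled.
check_agentclash_artifacts() {
  local dir="${1:-agentclash-artifacts}" name file status=0
  for name in run scorecard comparison gate result; do
    file="$dir/$name.json"
    if [[ ! -s "$file" ]]; then
      echo "missing: $file" >&2
      status=1
    elif ! grep -Eq "\"kind\"[[:space:]]*:[[:space:]]*\"agentclash\\.ci\\.${name}\"" "$file"; then
      echo "unexpected kind in: $file" >&2
      status=1
    fi
  done
  return "$status"
}
```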

## Regression Promotion
`regressions.promote_failures` runs only when the gate verdict is `fail`. Modes:

- `disabled`: returns a skipped summary with reason `policy_disabled`.
- `proposed`: creates proposed regression candidates.
- `auto_on_main`: creates active cases only on the default branch outside pull request events; otherwise it blocks with `pull_request_event`, `missing_default_branch`, or `non_default_branch`.

Promotion also blocks when `evaluation.regression_suites` is empty, using reason `no_regression_suites`. It lists candidate run failures with `limit=200`, prefers `full_executable` over `output_only`, and skips non-promotable or unsupported failures. It does not duplicate an existing case that matches by `source_challenge_identity_id` or metadata `source_failure_cluster_key`, unless that existing case has status `archived` or `rejected`.

`regression_promotions` contains `policy`, optional `case_status`, `created`, `existing`, `skipped`, `blocked`, and `errors`. Created/existing items include `suite_id`, `case_id`, `challenge_identity_id`, `challenge_key`, `failure_cluster_key`, `status`, and `created`.
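
Before relying on promotion, a workflow can predict which documented reason applies. `promotion_precheck` is a hypothetical sketch; the order in which the CLI evaluates these conditions is not documented here, so the order below is an assumption, as is treating any `pull_request*` event name as a pull request event. Call it only after a `fail` verdict, since promotion does not run otherwise.

```bash
# Hypothetical pre-check mirroring the documented promotion skip/block reasons.
# Args: mode event branch default_branch suite_count.
# Prints "ok" or a reason; returns 1 when promotion would not create cases.
promotion_precheck() {
  local mode="$1" event="$2" branch="$3" default_branch="$4" suites="${5:-0}"
  if [[ "$mode" == disabled ]]; then
    echo "policy_disabled"; return 1
  fi
  if (( suites == 0 )); then
    echo "no_regression_suites"; return 1
  fi
  if [[ "$mode" == auto_on_main ]]; then
    # Assumption: any event name starting with pull_request counts as a PR event.
    if [[ "$event" == pull_request* ]]; then
      echo "pull_request_event"; return 1
    elif [[ -z "$default_branch" ]]; then
      echo "missing_default_branch"; return 1
    elif [[ "$branch" != "$default_branch" ]]; then
      echo "non_default_branch"; return 1
    fi
  fi
  echo "ok"
}
```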

## GitHub Action
The repo-local composite action is `.github/actions/agentclash-ci`. Example:

```yaml
name: AgentClash gate
on:
  pull_request:

jobs:
  agentclash:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: agentclash
        uses: agentclash/agentclash/.github/actions/agentclash-ci@main
        with:
          token: ${{ secrets.AGENTCLASH_TOKEN }}
          workspace: ${{ secrets.AGENTCLASH_WORKSPACE }}
          api-url: https://api.agentclash.dev
          manifest: .agentclash/ci.yaml
          artifact-dir: agentclash-artifacts
      - uses: actions/upload-artifact@v4
        if: always() && steps.agentclash.outputs['should-run'] == 'true'
        with:
          name: agentclash-ci
          path: |
            ${{ steps.agentclash.outputs['result-file'] }}
            ${{ steps.agentclash.outputs['artifact-dir'] }}/*.json
```

Action inputs are exactly `manifest`, `token`, `workspace`, `api-url`, `install-cli`, `cli-version`, `remote-validate`, `skip-if-unmatched`, `base`, `head`, `changed-files`, `labels`, `artifact-dir`, `result-file`, `timeout`, `poll-interval`, `follow`, and `default-branch`.

Action outputs are exactly `should-run`, `skip-reason`, `run-id`, `gate-verdict`, `exit-code`, `result-file`, and `artifact-dir`.

The action exports `AGENTCLASH_TOKEN`, `AGENTCLASH_WORKSPACE`, and `AGENTCLASH_API_URL`, installs `agentclash@<cli-version>` when `install-cli` is true, runs `ci validate` with `--remote` by default, runs `ci should-run --json`, exits `0` when unmatched and `skip-if-unmatched` is true, then runs `ci run --json --artifact-dir`.
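
To debug the action's behavior outside GitHub Actions, the same sequence can be chained in a shell function. This is an approximation, not the action's source: `agentclash_ci_flow` is hypothetical, it omits CLI installation and the `base`/`head`/`changed-files`/`labels` inputs, the `should_run` check is a plain `grep` on the JSON text, and it assumes `validate` and `should-run` exit non-zero only on errors.

```bash
# Hypothetical local approximation of the composite action's command sequence.
# Args: manifest skip_if_unmatched ("true" or "false").
agentclash_ci_flow() {
  local manifest="${1:-.agentclash/ci.yaml}" skip_if_unmatched="${2:-true}" decision
  # Remote validation is the action's default.
  agentclash ci validate "$manifest" --remote --json >/dev/null || return $?
  decision="$(agentclash ci should-run --manifest "$manifest" --json)" || return $?
  if ! grep -Eq '"should_run"[[:space:]]*:[[:space:]]*true' <<< "$decision"; then
    if [[ "$skip_if_unmatched" == true ]]; then
      echo "AgentClash skipped: triggers did not match" >&2
      return 0
    fi
  fi
  agentclash ci run --manifest "$manifest" --json --artifact-dir agentclash-artifacts
}
```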

## Failure Modes
- Missing token or API access: confirm `AGENTCLASH_TOKEN` and the hosted API URL (`AGENTCLASH_API_URL`).
- Missing workspace: set `AGENTCLASH_WORKSPACE`, pass `--workspace`, or configure workspace locally.
- Local validation fails: fix YAML fields before remote calls; unknown YAML fields fail validation because the decoder rejects fields it does not know.
- Remote validation fails: check workspace visibility for build, runtime profile, provider account, model alias, challenge pack version, input set, regression suites/cases, and baseline.
- `should-run` skips unexpectedly: inspect `trigger.paths`, `trigger.labels`, checkout depth, base/head refs, and explicitly passed `changed-files`.
- Candidate spec fails: `candidate.build.spec_file` must be readable JSON at a relative path inside the repo.
- Candidate run is slow: tune `--timeout` and `--poll-interval`.
- `auto_on_main` is blocked: pass default-branch metadata and run from the default branch, not a PR event.

## Report Back Format
```text
Manifest: <path>
Should run: <true/false + reason>
Baseline: <run-id or deployment selector + strategy>
Candidate run: <run-id>
Gate verdict: <pass|warn|fail|insufficient_evidence>
Exit code: <code + meaning>
Artifacts: <result-file and artifact-dir>
Regression candidates: <created/existing/skipped/blocked/errors>
Next command: <exact agentclash command or GitHub Actions fix>
```

## Related Skills
- `agentclash-hub`: workflow map and dependency order.
- `agentclash-cli-setup`: authenticate, select workspace, and configure hosted API.
- `agentclash-runtime-resources-setup`: create runtime profiles, provider accounts, model aliases, secrets, and tools.
- `agentclash-agent-build-author`: create the candidate build spec and ready build version.
- `agentclash-agent-deployment-setup`: understand deployment resources used by the CI manifest.
- `agentclash-challenge-pack-validation-publish`: publish the challenge pack version and optional input sets.
- `agentclash-eval-runner`: run ad hoc evals before formal CI gates.
- `agentclash-scorecard-reader`: inspect scorecard, comparison, replay, and failure evidence.
- `agentclash-compare-and-triage`: baseline bookmarks, `compare latest --gate`, and replay triage.
- `agentclash-regression-flywheel`: promote failures and manage regression suites/cases.
- `agentclash-dataset-workflows`: dataset eval gates with `--format junit` for CI pipelines.
- `agentclash-security-evaluation`: client-side security stress harnesses before full pipeline runs.

## Related Docs
- `/docs-md/guides/ci-cd-agent-gates`
- `/docs-md/guides/ci-cd-workload-recipes`
- `/docs-md/challenge-packs/eval-workflows-and-gates`