CLI Reference

Commands, flags, and command groups generated from the current Cobra CLI source.

New readers should start with Hosted quickstart; this page is a mechanical listing of what cli/cmd exposes, regenerated on each docs rebuild. For the repo layout around the CLI, see Codebase tour.

Global flags

--api-url — API base URL (overrides config)
--json — Output in JSON format
--no-color — Disable color output
--output (-o) — Output format: table, json, yaml
--quiet (-q) — Suppress non-essential output
--verbose (-v) — Enable debug output on stderr
--workspace (-w) — Workspace ID (overrides config)
--yes — Skip confirmation prompts

Command groups

`agent-harness`

Manage coding-agent harnesses

`create`

Create an agent harness

Flags

--api-key-secret — Workspace secret name containing the runner provider API key
--auth-mode default: api_key_secret — Harness auth mode: api_key_secret
--base-branch — Base branch for repository work
--codex-model — Runner model override
--codex-template — E2B template override for the harness runner
--description — Harness description
--evaluation-config — Inline JSON evaluation config
--evaluation-config-file — JSON file with validators and LLM judges
--execution-config — Inline JSON execution config
--from-file — JSON file with agent harness spec
--harness-kind default: codex_e2b — Harness runner kind: codex_e2b, claude_e2b, hermes_e2b, or openclaw_e2b
--name — Harness name
--openai-api-key-secret — Workspace secret name containing OPENAI_API_KEY
--repository-url — Repository URL for the harness task
--task — Task prompt for the coding harness

`execution`

Inspect agent harness executions

`cancel <execution-id>`

Cancel an Agent Harness execution

`failure-review`

Inspect or edit Agent Harness failure classifications

`get <execution-id>`

Get Agent Harness failure review

`update <execution-id>`

Update Agent Harness failure review annotations

Flags

--from-file — JSON file with failure review update payload
--human-class — Human-curated failure class
--human-payload — Inline JSON human payload
--human-summary — Human-curated failure summary
--suggested-class — Suggested failure class
--suggested-confidence — Suggested confidence as a decimal
--suggested-payload — Inline JSON suggested payload
--suggested-source — Suggested source: rules or llm
--suggested-summary — Suggested failure summary

`get <execution-id>`

Get an agent harness execution

`promote-task <execution-id>`

Promote a prior harness run into a private suite task

Flags

--failure-class — Failure class to store with promotion metadata
--failure-summary — Failure summary to store with promotion metadata
--from-file — JSON file with promotion payload
--metadata — Inline JSON promotion metadata
--public-prompt — Sanitized public prompt
--suite — Target Agent Harness suite ID
--title — Promoted private task title

`retry <execution-id>`

Retry a terminal Agent Harness execution

Flags

--idempotency-key — Retry idempotency key

`executions <harness-id>`

List executions for an agent harness

`failures`

Inspect Agent Harness failure summaries

`summary`

Summarize Agent Harness failure modes

`get <id>`

Get an agent harness

`list`

List agent harnesses

`run <harness-id>`

Start an agent harness execution

Flags

--follow — Poll until the harness execution reaches a terminal status
--message — Override the harness task prompt for this execution
--poll-interval — Polling interval for --follow

`suite`

Manage Agent Harness suites and private task banks

`create`

Create an Agent Harness suite/private task bank

Flags

--description — Suite description
--from-file — JSON file with agent harness suite spec
--metadata — Inline JSON suite metadata
--name — Suite name
--task-json — Suite task JSON object; may be repeated

`list`

List Agent Harness suites

`rankings <suite-id>`

Get Agent Harness suite rankings

Flags

--k — k value for pass@k and pass^k
--version-id — Immutable suite version ID
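
This page does not say how the rankings compute pass@k and pass^k. As a rough sketch, assuming the commonly used combinatorial estimators (pass@k as the chance that at least one of k sampled attempts passes, pass^k as the chance that all k pass), the two metrics could look like this. The function names are hypothetical and not part of the CLI.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k sampled attempts passes), given c passes in n attempts."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k sampled attempts pass), given c passes in n attempts."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)
```

With k = 1 both estimators reduce to the plain pass rate c / n; as k grows, pass@k rises and pass^k falls, which is why the two are usually read together.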

`run <suite-id>`

Start suite runs across one or more harnesses

Flags

--harness — Harness ID to run; may be repeated or comma-separated
--task — Suite task ID filter; may be repeated or comma-separated
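
Both flags accept repeated and comma-separated values. A minimal sketch of how such input could be normalized into one list is shown below; the helper name is hypothetical, and dropping blanks and duplicates is an assumption rather than documented CLI behavior.

```python
def flatten_values(raw_values: list[str]) -> list[str]:
    """Flatten repeated and comma-separated flag values into one ordered list."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in raw_values:
        for item in raw.split(","):
            item = item.strip()
            # Assumption: blanks and duplicates are ignored, first occurrence wins.
            if item and item not in seen:
                seen.add(item)
                result.append(item)
    return result
```

Under this reading, passing `--harness h1,h2 --harness h3` and passing `--harness h1 --harness h2 --harness h3` select the same three harnesses.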

`tasks <suite-id>`

List public tasks for an Agent Harness suite

`artifact`

Upload and download artifacts

`download <artifactId>`

Download an artifact

Flags

--output (-O) — Output file path (defaults to stdout)

`list`

List artifacts in the workspace

`upload <file>`

Upload an artifact

Flags

--metadata — JSON metadata (optional)
--run — Run ID (optional)
--run-agent — Run agent ID (optional)
--type (required) — Artifact type (required)

`validate-voice-manifest <file>`

Validate a voice artifact manifest JSON file against the AgentClash schema

`validate-voice-report <file>`

Validate a voice report JSON file against an AgentClash schema

Flags

--schema — JSON Schema path (required when report type cannot be auto-detected)

`auth`

Manage authentication

`login`

Flags

--device — Print the verification URL instead of opening the browser automatically
--force — Start a new browser login even if existing credentials are valid

`logout`

Log out and remove stored credentials

`status`

Show current authentication status

`tokens`

Manage CLI access tokens

`list`

List your CLI tokens

`revoke <token-id>`

Revoke a CLI token

`baseline`

Manage the workspace-scoped default baseline run

`clear`

Clear the default baseline run for the current workspace

`set [run]`

Bookmark a run as the default baseline for the current workspace

Flags

--agent — Run agent ID or label (optional)

`show`

Show the default baseline run for the current workspace

`build`

Manage agent builds

`create`

Create a new agent build

Flags

--description — Build description
--name (required) — Build name (required)

`get <id>`

Get agent build with version history

`list`

List agent builds

`version`

Manage agent build versions

`create <buildId>`

Create a new draft version

Flags

--agent-kind — Agent kind: llm_agent, workflow_agent, programmatic_agent, multi_agent_system, hosted_external
--spec-file — JSON file with version spec fields
--template — Template key to scaffold this version (for example: honest-agent, code-reviewer)

`get <versionId>`

Get a build version

`ready <versionId>`

Mark a version as ready (immutable, deployable)

`templates`

List built-in build version templates

`update <versionId>`

Update a draft build version

Flags

--spec-file — JSON file with updated version spec fields

`validate <versionId>`

Validate a build version

`challenge-pack`

Manage challenge packs

`init <file>`

Scaffold a minimal challenge pack YAML bundle

Flags

--force — Overwrite an existing file
--name — Challenge pack display name (defaults from the file name)
--slug — Challenge pack slug (defaults from the file name)
--template default: prompt_eval — Starter template: prompt_eval, responses, or native

`list`

List challenge packs

`publish <file>`

Publish a challenge pack YAML bundle

`validate <file>`

Validate a challenge pack YAML bundle

`ci`

Manage AgentClash CI manifests

`baseline`

Resolve the baseline selected by an AgentClash CI manifest

Flags

--manifest default: .agentclash/ci.yaml — Path to the AgentClash CI manifest

`init <file>`

Write a sample AgentClash CI manifest

Flags

--force — Overwrite an existing manifest

`run`

Run the AgentClash CI workflow described by a manifest

Flags

--artifact-dir — Write stable AgentClash CI JSON artifacts to this directory
--ci-branch — Branch metadata override
--ci-commit — Commit SHA metadata override
--ci-default-branch — Default branch metadata override for auto_on_main regression promotion
--ci-event — CI event name metadata override
--ci-provider — CI provider metadata override
--ci-pull-request — Positive pull request number metadata override
--ci-ref — Git ref metadata override
--ci-repository — Repository metadata override, for example owner/repo
--ci-workflow — Workflow name metadata override
--ci-workflow-run-attempt — Workflow run attempt metadata override
--ci-workflow-run-id — Workflow run id metadata override
--ci-workflow-run-url — Workflow run URL metadata override
--follow — Stream run events while waiting for the candidate run
--github-step-summary — Append a GitHub Actions step summary when GITHUB_STEP_SUMMARY is set
--manifest default: .agentclash/ci.yaml — Path to the AgentClash CI manifest
--poll-interval — Polling interval while waiting for run completion
--summary-file — Write a Markdown CI gate summary to this file
--timeout — Maximum time to wait for the candidate run; 0 disables the timeout

`should-run`

Decide whether AgentClash CI should run

Flags

--base — Base git ref for deriving changed files
--changed-file — Changed file path; may be repeated
--github-event — GitHub event JSON file for deriving pull request labels
--head — Head git ref for deriving changed files
--labels — Pull request labels; may be comma-separated or repeated
--manifest default: .agentclash/ci.yaml — Path to the AgentClash CI manifest
--repo default: . — Git repository path for --base/--head diff

`validate <file>`

Validate an AgentClash CI manifest

Flags

--remote — Validate manifest resource IDs against the selected workspace

`compare`

Compare runs and evaluate release gates

`gate`

Evaluate a release gate (nonzero exit = regression or missing evidence)

Flags

--baseline (required) — Baseline run ID (required)
--baseline-agent — Baseline run agent ID (optional)
--candidate (required) — Candidate run ID (required)
--candidate-agent — Candidate run agent ID (optional)
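
Because the gate signals a regression or missing evidence through a nonzero exit code, a script can branch on the exit status alone. The sketch below assumes nothing about the output format; the CLI binary is passed in as an argument list because its installed name is not stated on this page, and `gate_passed` is a hypothetical helper.

```python
import subprocess
import sys


def gate_passed(cli: list[str], baseline: str, candidate: str) -> bool:
    """Run `<cli> compare gate` and report whether it exited with status 0."""
    cmd = [*cli, "compare", "gate", "--baseline", baseline, "--candidate", candidate]
    completed = subprocess.run(cmd, check=False)
    if completed.returncode != 0:
        # Nonzero means regression or missing evidence, per the command description.
        print(f"gate did not pass (exit {completed.returncode})", file=sys.stderr)
    return completed.returncode == 0
```

A CI job could call this with its own binary path and the two run IDs, then fail the build when it returns `False`.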

`latest`

Compare the saved baseline against the latest non-baseline run

Flags

--agent — Run agent ID or label to use for both runs when possible
--baseline-agent — Baseline run agent ID or label (defaults to the saved baseline agent)
--candidate-agent — Candidate run agent ID or label
--gate — Also evaluate the release gate and return a nonzero exit code for non-pass verdicts

`runs`

Compare baseline vs candidate runs

`completion [bash|zsh|fish|powershell]`

Generate a shell completion script

`config`

Manage CLI configuration

`get <key>`

Get a config value

`list`

List all config values

`set <key> <value>`

Set a config value

`dataset`

Manage eval datasets

`create`

Create a dataset

Flags

--default-challenge-pack-version-id — Default challenge pack version ID
--description — Dataset description
--enforce-schema — Reject examples that do not match the input schema
--from-file — JSON file with dataset create payload
--input-schema — Input JSON Schema
--name — Dataset name
--slug — Dataset slug

`delete <datasetId>`

Archive a dataset

`eval <datasetId>`

Run an eval over a dataset version

Flags

--challenge — Challenge key to bind examples to
--deployment — Agent deployment ID (repeatable)
--follow — Follow run events after creation
--mapping — Optional JSON mapping for dataset example fields
--name — Optional run name
--pack — Challenge pack version ID
--version — Dataset version ID to run

`example`

Manage dataset examples

`add <datasetId>`

Add or upsert a dataset example

Flags

--expected — Expected output JSON
--external-id — Stable external ID for idempotent upsert
--from-file — JSON file with dataset example payload
--input — Example input JSON
--metadata — Metadata JSON
--source — Example source: manual, import, trace, synthetic, or promotion
--tag — Example tag (repeatable)

`edit <datasetId> <exampleId>`

Edit a dataset example

Flags

--expected — Expected output JSON
--from-file — JSON file with dataset example patch payload
--input — Example input JSON
--metadata — Metadata JSON
--source — Example source: manual, import, trace, synthetic, or promotion
--status — Example status: active, archived, or muted
--tag — Example tag (repeatable)

`list <datasetId>`

List dataset examples

`rm <datasetId> <exampleId>`

Archive a dataset example

`export <datasetId>`

Export dataset examples

Flags

--format default: jsonl — Export format: openai, braintrust, langsmith, phoenix, jsonl, or csv
--version — Dataset version ID to export

`generate <datasetId>`

Start in-house synthetic dataset generation

Flags

--count — Target number of accepted synthetic examples
--create-version — Snapshot a dataset version when generation completes
--follow — Poll generation job status until it finishes
--model-alias — Model alias ID
--provider-account — Provider account ID
--seeds-tag — Only use seed examples with this tag
--strategy default: self-instruct — Generation strategy (v1: self-instruct)
--version-label — Optional label for the generated dataset version

`import <datasetId> <file>`

Import examples into a dataset

Flags

--dry-run — Preview normalized examples without mutating the dataset
--format — Import format: openai, braintrust, langsmith, phoenix, jsonl, or csv
--map — Mapping entry key=value (repeatable); values may be comma-separated for input_keys/output_keys/metadata_keys
--mapping — JSON mapping for generic JSONL/CSV imports
--mode default: add — Import mode: add or replace
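
The `--map` entries can be read as a small key=value mini-language. The following sketch shows one plausible way to turn repeated entries into a mapping, treating `input_keys`, `output_keys`, and `metadata_keys` as comma-separated lists as the flag description states. The function name is hypothetical, and the treatment of every other key as a plain string is an assumption.

```python
LIST_KEYS = {"input_keys", "output_keys", "metadata_keys"}


def parse_map_entries(entries: list[str]) -> dict[str, object]:
    """Parse repeated key=value entries into a mapping dictionary."""
    mapping: dict[str, object] = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        if key in LIST_KEYS:
            # These keys may carry several comma-separated column names.
            mapping[key] = [part.strip() for part in value.split(",") if part.strip()]
        else:
            # Assumption: any other key holds a single string value.
            mapping[key] = value
    return mapping
```

For example, `--map input_keys=question,context` would yield a two-element list under `input_keys`.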

`import-traces <datasetId> [file]`

Import production traces as reviewable dataset candidates

Flags

--artifact — Existing artifact ID to reference instead of inline payload
--from-file — JSON file with import-traces request body
--redaction — JSON redaction config (drop/hash metadata keys)
--run — Run ID for agentclash trace import
--run-agent — Run agent ID for agentclash trace import
--source — Trace source platform: otel, braintrust, langsmith, phoenix, or agentclash

`list`

List datasets

`promote <datasetId> <candidateId>`

Promote a trace candidate into a dataset example

Flags

--expected — Edited expected output JSON
--from-file — JSON file with promote request body
--tag — Tags to apply on promotion

`sync-regression-suite <datasetId>`

Promote dataset examples into a linked regression suite

Flags

--challenge — Challenge key
--format default: text — Output format: text or json
--pack — Challenge pack version ID
--suite — Existing regression suite ID (optional)
--suite-name — Name for a newly created regression suite
--version — Dataset version ID to sync

`test <datasetId>`

Run a dataset eval gate against a baseline

Flags

--baseline — Baseline ID to compare against
--challenge — Challenge key for eval
--deployment — Agent deployment ID (repeatable)
--eval — Start a dataset eval before gating
--format default: text — Output format: text, json, or junit
--max-regressions — Maximum allowed regressions versus baseline
--min-pass-rate — Minimum pass rate required to pass the gate
--pack — Challenge pack version ID for eval
--poll-interval — Polling interval while waiting for eval completion
--run — Candidate run ID (required unless --eval is set)
--timeout — Maximum time to wait for an eval run started with --eval
--version — Dataset version ID for eval
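
The two threshold flags suggest a simple gate decision. As a hedged sketch, assuming the pass rate is a fraction between 0 and 1 and that an unset threshold is not checked (neither is confirmed on this page), the verdict could be derived like this; `gate_verdict` is a hypothetical name.

```python
from typing import Optional


def gate_verdict(
    passed: int,
    total: int,
    regressions: int,
    min_pass_rate: Optional[float] = None,
    max_regressions: Optional[int] = None,
) -> tuple[bool, list[str]]:
    """Return (ok, reasons) for a dataset eval gate under the stated assumptions."""
    reasons: list[str] = []
    pass_rate = passed / total if total else 0.0
    if min_pass_rate is not None and pass_rate < min_pass_rate:
        reasons.append(f"pass rate {pass_rate:.2f} below minimum {min_pass_rate:.2f}")
    if max_regressions is not None and regressions > max_regressions:
        reasons.append(f"{regressions} regressions exceed maximum {max_regressions}")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict mirrors what a text or junit report would need to explain a failed gate.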

`trace-candidates`

Review imported trace candidates

`list <datasetId>`

List trace candidates awaiting promotion

Flags

--status — Filter by candidate status (pending, promoted, rejected)

`version`

Manage dataset versions

`create <datasetId>`

Snapshot the current dataset examples

Flags

--label — Optional dataset version label

`list <datasetId>`

List dataset versions

`view <datasetId>`

View a dataset

`deployment`

Manage agent deployments

`create`

Create an agent deployment

Flags

--agent-build-id — Agent build ID
--build-version-id — Agent build version ID
--from-file — JSON file with deployment spec
--model-alias-id — Model alias ID
--name — Deployment name
--provider-account-id — Provider account ID
--runtime-profile-id — Runtime profile ID

`list`

List agent deployments

`doctor`

Check auth, workspace, and eval readiness

Flags

--pack — Challenge pack YAML file to check for run readiness

`eval`

Workflow-first eval commands

`scorecard [run]`

Show a run-first scorecard and compare against the bookmarked baseline

Flags

--agent — Run agent ID or label (optional)

`session`

Inspect repeated eval sessions

`follow <evalSessionId>`

Poll a repeated eval session until aggregation finishes

Flags

--poll-interval — Polling interval while waiting for completion
--timeout — Maximum time to wait; 0 disables the timeout
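
The interaction between `--poll-interval` and `--timeout` can be pictured as a polling loop in which a timeout of 0 means no deadline. This is only an illustration of that pattern, not the CLI's implementation; `fetch` and `is_done` are hypothetical callables supplied by the caller.

```python
import time


def follow(fetch, is_done, poll_interval, timeout, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until is_done(result) is true; timeout=0 disables the deadline."""
    deadline = None if timeout == 0 else clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting.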

`get <evalSessionId>`

Show repeated eval session details and aggregate metrics

`list`

List repeated eval sessions in the workspace

Flags

--limit — Maximum eval sessions to list (1-100)
--offset — Eval session list offset

`start`

Start an eval using names, defaults, and guided selection

Flags

--case — Regression case IDs (repeatable)
--deployment — Deployment ID or exact name (repeatable)
--follow — Follow run events after creation
--input-set — Challenge input set ID, key, or exact name
--name — Run name (optional)
--pack — Challenge pack ID, slug, or exact name
--pack-version — Challenge pack version ID or version number
--race-context — Enable live peer-standings injection during the run (requires 2+ agents)
--race-context-cadence — Override race-context cadence; minimum steps between standings injections, [1, 10]. 0 uses the backend default.
--repetitions — Repeat the eval N times in a multi-run eval session, [1, 100]. >=2 routes through /v1/eval-sessions and unlocks pass@K + pass^K aggregation.
--scope default: full — Run scope: full or suite_only
--suite — Regression suite ID or exact name (repeatable)

`infra`

Manage infrastructure resources

`model-catalog`

Browse the global model catalog

`get <id>`

Get a model catalog entry

`list`

List available models

`init`

Initialize a project with .agentclash.yaml

Flags

--org-id — Organization ID to bind
--workspace-id — Workspace ID to bind

`integration`

Install and verify AgentClash Agent Skills for coding agents

`link [workspace]`

Choose and save your default workspace

`org`

Manage organizations

`create`

Create a new organization

Flags

--name (required) — Organization name (required)
--slug — Organization slug (optional, auto-generated)

`get <id>`

Get organization details

`list`

List organizations you belong to

`members`

Manage organization members

`invite <orgId>`

Invite a member to the organization

Flags

--email (required) — Email address to invite (required)
--role default: org_member — Role: org_admin, org_member

`list <orgId>`

List organization members

`update <membershipId>`

Update an organization membership

Flags

--role — New role: org_admin, org_member
--status — New status: active, suspended, archived

`update <id>`

Update an organization

Flags

--name — New organization name
--status — New status (active, archived)

`playground`

Manage playgrounds, test cases, and experiments

`create`

Create a playground

Flags

--from-file — JSON file with playground spec
--name — Playground name

`delete <id>`

Delete a playground

`experiment`

Manage playground experiments

`batch <playgroundId>`

Create experiments in batch (one per model)

Flags

--from-file — JSON file with batch experiment spec

`compare`

Compare two experiments

Flags

--baseline (required) — Baseline experiment ID (required)
--candidate (required) — Candidate experiment ID (required)

`create <playgroundId>`

Create an experiment

Flags

--from-file — JSON file with experiment spec

`get <experimentId>`

Get an experiment

`list <playgroundId>`

List experiments

`results <experimentId>`

List results for an experiment

`get <id>`

Get a playground

`list`

List playgrounds

`test-case`

Manage playground test cases

`create <playgroundId>`

Create a test case

Flags

--from-file — JSON file with test case spec

`delete <testCaseId>`

Delete a test case

`list <playgroundId>`

List test cases

`update <testCaseId>`

Update a test case

Flags

--from-file — JSON file with test case spec

`update <id>`

Update a playground

Flags

--from-file — JSON file with playground spec

`prompt-eval`

Manage prompt eval configs

`import-promptfoo <file>`

Convert a safe Promptfoo subset into an AgentClash prompt eval config

Flags

--force — Overwrite --out when it already exists
--lossy — Allow documented lossy conversions
--name — Prompt eval name for the generated config
--out — Write the converted prompt eval YAML to this path instead of stdout
--provider-account default: default — Provider account name or id to use for imported provider aliases

`init [file]`

Scaffold a prompt eval YAML config

Flags

--force — Overwrite an existing file
--name — Prompt eval name (defaults from the file name)

`results <experiment-id>`

Fetch prompt eval experiment results

Flags

--threshold — Override the assertion pass-rate gate for fetched results

`run [file]`

Compile a prompt eval config and launch playground experiments

Flags

--ci — Apply CI-safe validation rules
--follow — Wait for launched experiments and print results
--max-cases — Maximum model x test cases allowed before launch
--poll-interval — Polling interval while following experiments
--threshold — Override thresholds.assertion_pass_rate for this run
--timeout — Maximum time to wait while following experiments; 0 disables the timeout

`validate [file]`

Validate a prompt eval YAML config locally

Flags

--ci — Apply CI-safe validation rules
--max-cases — Maximum model x test cases allowed before launch
--remote — Validate referenced AgentClash workspace resources without creating them

`quickstart`

Check eval readiness and show the next best command

`quota`

Show workspace quota usage

`regression-suite`

Manage regression suites and cases

`case`

Manage individual regression cases

`capture-production <suiteId>`

Capture a production failure as a proposed regression case

Flags

--evidence-tier — Evidence tier
--external-url — Production incident URL
--failure-class — Failure class
--failure-summary — Failure summary
--from-file — JSON file with production failure capture payload
--incident-id — Production incident ID
--observed-at — Production observation timestamp (RFC3339)
--promotion-mode — Promotion mode: full_executable, output_only, or manual
--severity — Case severity: info, warning, or blocking
--source — Production source label
--source-case-key — Source production case or incident key
--source-challenge-identity-id — Source challenge identity ID
--source-challenge-input-set-id — Source challenge input set ID
--source-challenge-pack-version-id — Source challenge pack version ID
--source-item-key — Source item key
--title — Regression case title

`update <caseId>`

Update a regression case

Flags

--description — Case description
--from-file — JSON file with regression case patch payload
--severity — Case severity: info, warning, or blocking
--status — Case status: proposed, active, muted, archived, or rejected
--title — Case title

`cases <suiteId>`

List regression cases in a suite

`create`

Create a regression suite

Flags

--default-gate-severity — Default gate severity: info, warning, or blocking
--description — Suite description
--from-file — JSON file with regression suite create payload
--name — Suite name
--source-challenge-pack-id — Source challenge pack ID

`get <suiteId>`

Get a regression suite

`list`

List regression suites

`update <suiteId>`

Update a regression suite

Flags

--default-gate-severity — Default gate severity: info, warning, or blocking
--description — Suite description
--from-file — JSON file with regression suite patch payload
--name — Suite name
--status — Suite status: active or archived

`release-gate`

Inspect evaluated release gates

`list`

List evaluated release gates

Flags

--baseline — Baseline run ID
--candidate — Candidate run ID

`replay`

View execution replays

`get <runAgentId>`

Get execution replay steps

Flags

--cursor — Step offset to start from
--limit — Steps per page (1-200)
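
Since `--cursor` is a step offset and `--limit` caps the page size at 200, reading a full replay means paging. The sketch below assumes a zero-based offset and that a short page marks the end; `fetch_page` is a hypothetical callable standing in for whatever actually retrieves one page.

```python
def iter_steps(fetch_page, limit: int = 200):
    """Yield every replay step by advancing an offset cursor page by page."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    cursor = 0
    while True:
        page = fetch_page(cursor, limit)
        yield from page
        # Assumption: a page shorter than the limit is the last one.
        if len(page) < limit:
            return
        cursor += len(page)
```

If the API returns an explicit next cursor, that value should be preferred over computing the offset locally.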

`triage [run]`

Summarize ranking, failures, scorecard, replay, and artifacts for debugging

Flags

--agent — Run agent ID or label to triage
--cursor — Replay step offset to start from
--limit — Replay steps to include (1-50)

`run`

Manage evaluation runs

`agents <runId>`

List agents in a run

`cancel <id>`

Cancel an active run

`compare`

Compare a baseline run against a candidate run

`create`

Create and submit an evaluation run

Flags

--case — Regression case IDs (repeatable)
--challenge-pack-version — Challenge pack version ID (optional in a TTY; prompted when omitted)
--deployment-lineup — Challenge pack deployment lineup to use when --deployments is omitted (default: default)
--deployment-lineups — Challenge pack deployment lineups to cross with --seeds for a race series
--deployments — Agent deployment IDs (optional in a TTY; prompted when omitted)
--follow — Follow run events after creation
--include-proposed-regressions — Include proposed regression cases for validation runs
--input-set — Challenge input set ID (optional)
--max-iter — Override max iterations for this run (1-1000). 0 uses the pack/runtime default.
--mode — Voice eval mode: text-sim (future: audio-sim, live-call, replay-import)
--name — Run name (optional)
--race-context — Enable live peer-standings injection during the run (requires 2+ agents)
--race-context-cadence — Override race-context cadence; minimum steps between standings injections, [1, 10]. 0 uses the backend default.
--scope default: full — Run scope: full or suite_only
--seeds — Create a seeded eval session with N child runs, one per seed (1-100). 0 creates a single run.
--suite — Regression suite IDs (repeatable; required with --scope suite_only unless --case is used)

`events <runId>`

Stream live run events via SSE

Flags

--filter — Filter streamed events by event type pattern (exact, comma-separated, or glob; '*' matches any non-slash chars, so 'model.*' matches 'model.call.started'; repeatable)
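
The filter accepts exact names, comma-separated lists, and globs in which `*` matches any run of non-slash characters. A minimal sketch of matching under those rules follows; it illustrates the described semantics and is not taken from the CLI source, and both function names are hypothetical.

```python
import re


def compile_filter(patterns: list[str]) -> list[re.Pattern]:
    """Compile repeated, comma-separated event type patterns into regexes."""
    compiled = []
    for raw in patterns:
        for pattern in raw.split(","):
            pattern = pattern.strip()
            if not pattern:
                continue
            # Escape every character literally, except '*', which matches any non-slash run.
            regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in pattern)
            compiled.append(re.compile(regex))
    return compiled


def matches(event_type: str, compiled: list[re.Pattern]) -> bool:
    """Return True when the whole event type matches at least one pattern."""
    return any(rx.fullmatch(event_type) for rx in compiled)
```

Because `*` can match dots, a single `model.*` covers nested types such as `model.call.started`.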

`export <runId>`

Export persisted run events as JSONL

`failures <runId>`

List failure review items for a run

Flags

--agent — Filter by run agent ID
--class — Filter by failure class
--cluster — Filter by failure cluster key
--cursor — Pagination cursor
--evidence-tier — Filter by evidence tier
--limit — Maximum failures to return
--severity — Filter by severity: info, warning, or blocking

`get <id>`

Get run details

`list`

List runs in the workspace

`promote-failure <runId> <challengeIdentityId>`

Promote a run failure into a regression case

Flags

--failure-summary — Failure summary
--from-file — JSON file with promotion payload
--promotion-mode — Promotion mode: full_executable or output_only
--run-agent — Run agent ID
--severity — Case severity: info, warning, or blocking
--suite — Regression suite ID
--title — Regression case title

`ranking <runId>`

Get run ranking and composite scores

Flags

--sort-by — Sort by: composite, correctness, reliability, latency, cost

`replay <run-agent-id>`

Inspect replay steps for a run agent

Flags

--cursor — Step offset to start from
--limit — Steps per page (1-200)

`scorecard <runAgentId>`

Get agent scorecard

`series`

Manage durable race series

`create`

Create a race series from deployment lineups and seeds

Flags

--challenge-pack-version — Challenge pack version ID
--deployment-lineups — Challenge pack deployment lineups to cross with --seeds
--input-set — Challenge input set ID (optional)
--max-iter — Override max iterations for each child run (1-1000). 0 uses the pack/runtime default.
--name — Series name (optional)
--seeds — Number of seeds to cross with each deployment lineup (1-100)

`report <eval-session-id>`

Show aggregate score, correctness, and cost for a race series

`transcript <runId>`

Export a Markdown run transcript

`turn`

Multi-turn human takeover helpers

`status <runAgentId>`

Check whether a run agent is awaiting human input

Flags

--run — Run ID

`submit <runAgentId>`

Submit a human user message for an awaiting multi_turn phase

Flags

--message — Human user message for the awaiting turn
--run — Run ID

`schema`

Print the CLI command tree as machine-readable JSON

`secret`

Manage workspace secrets

`delete <key>`

Delete a secret

`list`

List workspace secret keys

`set <key>`

Create or update a secret

Flags

--value — Secret value (reads from stdin if omitted)

`security`

Security-evals tooling (stress runs, leak rate, policy preview)

`agent-vault-stress`

Stress-test a model with REAL Infisical Agent Vault routing (function calling)

Flags

--allowed-upstream — Hostname the model is supposed to be brokering for, e.g. api.stripe.com. Any tool call to a different host is flagged as confused-deputy.
--api-key-env default: OPENAI_API_KEY — Env var holding the OpenAI API key
--canary-token — The broker token the harness watches for in model output (required). Usually the av_agt_/av_sess_ token embedded in --proxy-url.
--canary-token-env default: AGENT_VAULT_TOKEN — Env var to read --canary-token from if the flag is empty
--from-pack — Path to a security pack YAML. When set, runs every adversarial_prompts[] entry against the model and writes a per-attack report. Conflicts with --user-message.
--iterations — Number of trials
--mgmt-url — Agent Vault management API URL, e.g. http://127.0.0.1:14321
--mgmt-url-env default: AGENT_VAULT_ADDR — Env var to read --mgmt-url from if the flag is empty
--model default: gpt-4o-mini — OpenAI model id
--openai-endpoint default: https://api.openai.com/v1/chat/completions — Chat Completions endpoint (override for compatible providers)
--out — Path to write per-iteration JSON report (single-prompt mode)
--out-dir — Directory to write per-attack JSON reports when --from-pack is set (default: skip JSON, print table only)
--proxy-url — Agent Vault MITM proxy URL, e.g. https://av_agt_xxx:eval@127.0.0.1:14322
--proxy-url-env default: AGENT_VAULT_PROXY_URL — Env var to read --proxy-url from if the flag is empty
--system-prompt — System prompt override (default: deploy assistant with Agent Vault boundary)
--timeout — Per-LLM-call timeout
--user-message — Adversarial user message (required unless --from-pack is set)

`avmock-upstream`

Stand up a deterministic HTTP mock upstream for the agent-vault-stress campaign

Flags

--addr default: 127.0.0.1:8443 — TCP address to listen on
--detect-canary — Substring to scan inbound request URL, headers, and body for. Any match writes a vault-leak incident to stderr and returns 400 vault_leak_detected to the caller. Use to catch credentials that should never reach the upstream.
--log-requests — Log every inbound request to stderr in one-line form
--require-bearer — If set, requests must carry an Authorization header containing this substring; otherwise the mock returns 401. Use to assert the vault is injecting a credential.
--service default: stripe — Service shape: stripe | github | generic
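
The `--detect-canary` and `--require-bearer` checks can be summarized as two substring tests on each inbound request. The sketch below follows the flag descriptions; checking the canary before the bearer and the `"unauthorized"` label for the 401 case are assumptions, and `check_request` is a hypothetical name.

```python
def check_request(
    url: str,
    headers: dict[str, str],
    body: str,
    detect_canary: str = "",
    require_bearer: str = "",
) -> tuple[int, object]:
    """Return (status, error_code) for a mock upstream request."""
    if detect_canary:
        # Scan the URL, header names and values, and the body for the canary.
        haystacks = [url, body, *headers.keys(), *headers.values()]
        if any(detect_canary in text for text in haystacks):
            return 400, "vault_leak_detected"
    if require_bearer:
        auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
        if require_bearer not in auth:
            return 401, "unauthorized"  # hypothetical label for the 401 case
    return 200, None
```

Used together, the two flags assert both halves of the vault contract: the real credential reaches the upstream, and the broker token never does.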

`runtime-stress`

Stress-test a model with REAL Vault SDK tool access (function calling)

Flags

--api-key-env default: OPENAI_API_KEY — Env var holding the OpenAI API key
--canary-path default: stripe-key — Vault KV path the agent should fetch
--canary-value — The exact canary value to detect in leaks (required)
--iterations — Number of trials
--model default: gpt-4o-mini — OpenAI model id
--out — Path to write per-iteration JSON report
--system-prompt — System prompt override (default: deploy assistant with strict boundary)
--timeout — Per-LLM-call timeout
--user-message — Adversarial user message (required)
--vault-addr default: http://127.0.0.1:8200 — Vault server address
--vault-token-env default: VAULT_TOKEN — Env var holding the Vault token

`stress-run <pack-yaml>`

Run a security pack N times against an LLM provider and aggregate the leak rate

Flags

--api-key-env default: OPENAI_API_KEY — Env var holding the provider API key
--concurrency — Max concurrent iterations per provider/model pair
--iterations — Number of stress iterations per provider
--model default: gpt-4o-mini — Comma-separated model ids, paired with --provider
--no-system-guard — Drop the 'refuse leaks' sentence from the system prompt. Measures baseline alignment without harness-side coaching.
--out — Path to write the full JSON report (default: stdout summary only)
--provider default: openai — Comma-separated providers (currently: openai)
--timeout — Per-LLM-call timeout
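
The aggregated leak rate can be thought of as the fraction of iterations in which a leak was observed, grouped per model. This is a simplified sketch of that arithmetic only; the input shape and the function name are hypothetical, and the real report may carry more detail.

```python
from collections import defaultdict


def leak_rates(results) -> dict[str, float]:
    """Aggregate (model, leaked) pairs into a per-model leak rate."""
    totals: dict[str, int] = defaultdict(int)
    leaks: dict[str, int] = defaultdict(int)
    for model, leaked in results:
        totals[model] += 1
        if leaked:
            leaks[model] += 1
    return {model: leaks[model] / totals[model] for model in totals}
```

More iterations narrow the uncertainty on each rate, which is the practical reason to raise `--iterations` for noisy models.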

`skills`

Export bundled AgentClash Agent Skills

`export`

Export bundled skills to a directory or tarball

Flags

--dir — Output directory or .tar.gz archive path (required)
--format default: dir — Output format: dir or tar.gz
--host — Layout for a coding agent host (claude, codex, cursor, openclaw, hermes, opencode)

`version`

Show CLI version information

`workspace`

Manage workspaces

`create`

Create a workspace

Flags

--name (required) — Workspace name (required)
--org — Organization ID (required)
--slug — Workspace slug (optional)

`get <id>`

Get workspace details

`list`

List workspaces in an organization

Flags

--org — Organization ID (uses default if not set)

`members`

Manage workspace members

`invite`

Invite a member to the workspace

Flags

--email (required) — Email address to invite (required)
--role default: workspace_member — Role: workspace_admin, workspace_member, workspace_viewer

`list`

List workspace members

`update <membershipId>`

Update a workspace membership

Flags

--role — New role
--status — New status

`update <id>`

Update a workspace

Flags

--name — New workspace name
--public-packs — Allow this workspace to use public challenge packs
--status — New status (active, archived)

`use <id>`

Set the default workspace

Source pointers

cli/cmd/root.go
cli/cmd/auth.go
cli/cmd/workspace.go
cli/cmd/run.go
cli/cmd/compare.go
