CLI Reference
Commands, flags, and command groups generated from the current Cobra CLI source.
If you are new to AgentClash, start with the Hosted quickstart. This page only lists what cli/cmd exposes and is regenerated on each docs rebuild. For the repository layout around the CLI, see the Codebase tour.
This page is generated from the Cobra command definitions in cli/cmd.
Global flags
--api-url - API base URL (overrides config)
--json - Output in JSON format
--no-color - Disable color output
--output (-o) - Output format: table, json, yaml
--quiet (-q) - Suppress non-essential output
--verbose (-v) - Enable debug output on stderr
--workspace (-w) - Workspace ID (overrides config)
--yes - Skip confirmation prompts
Command groups
agent-harness
Manage coding-agent harnesses
create
Create an agent harness
Flags
--api-key-secret - Workspace secret name containing the runner provider API key
--auth-mode (default: api_key_secret) - Harness auth mode: api_key_secret
--base-branch - Base branch for repository work
--codex-model - Runner model override
--codex-template - E2B template override for the harness runner
--description - Harness description
--evaluation-config - Inline JSON evaluation config
--evaluation-config-file - JSON file with validators and LLM judges
--execution-config - Inline JSON execution config
--from-file - JSON file with agent harness spec
--harness-kind (default: codex_e2b) - Harness runner kind: codex_e2b, claude_e2b, hermes_e2b, or openclaw_e2b
--name - Harness name
--openai-api-key-secret - Workspace secret name containing OPENAI_API_KEY
--repository-url - Repository URL for the harness task
--task - Task prompt for the coding harness
execution
Inspect agent harness executions
cancel <execution-id>
Cancel an Agent Harness execution
failure-review
Inspect or edit Agent Harness failure classifications
get <execution-id>
Get Agent Harness failure review
update <execution-id>
Update Agent Harness failure review annotations
Flags
--from-file - JSON file with failure review update payload
--human-class - Human-curated failure class
--human-payload - Inline JSON human payload
--human-summary - Human-curated failure summary
--suggested-class - Suggested failure class
--suggested-confidence - Suggested confidence as a decimal
--suggested-payload - Inline JSON suggested payload
--suggested-source - Suggested source: rules or llm
--suggested-summary - Suggested failure summary
get <execution-id>
Get an agent harness execution
promote-task <execution-id>
Promote a prior harness run into a private suite task
Flags
--failure-class - Failure class to store with promotion metadata
--failure-summary - Failure summary to store with promotion metadata
--from-file - JSON file with promotion payload
--metadata - Inline JSON promotion metadata
--public-prompt - Sanitized public prompt
--suite - Target Agent Harness suite ID
--title - Promoted private task title
retry <execution-id>
Retry a terminal Agent Harness execution
Flags
--idempotency-key - Retry idempotency key
executions <harness-id>
List executions for an agent harness
failures
Inspect Agent Harness failure summaries
summary
Summarize Agent Harness failure modes
get <id>
Get an agent harness
list
List agent harnesses
run <harness-id>
Start an agent harness execution
Flags
--follow - Poll until the harness execution reaches a terminal status
--message - Override the harness task prompt for this execution
--poll-interval - Polling interval for --follow
suite
Manage Agent Harness suites and private task banks
create
Create an Agent Harness suite/private task bank
Flags
--description - Suite description
--from-file - JSON file with agent harness suite spec
--metadata - Inline JSON suite metadata
--name - Suite name
--task-json - Suite task JSON object; may be repeated
list
List Agent Harness suites
rankings <suite-id>
Get Agent Harness suite rankings
Flags
--k - k value for pass@k and pass^k
--version-id - Immutable suite version ID
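This reference does not say how the rankings endpoint computes pass@k and pass^k. As a point of orientation only, the sketch below uses the commonly cited combinatorial estimators: given n attempts of which c passed, pass@k is the chance that at least one of k attempts drawn without replacement passes, and pass^k is the chance that all k pass. The function names are hypothetical and the formulas are an assumption about the metric, not a description of the CLI or backend.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n attempts, c of them passing, k drawn without replacement.
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled attempts passes."""
    _check(n, c, k)
    # comb(n - c, k) is 0 when fewer than k attempts failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that all k sampled attempts pass (pass^k)."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)
```

With 10 attempts and 5 passes, k=2 gives pass@k = 1 - 10/45 and pass^k = 10/45, which shows how the two metrics diverge as k grows: one rewards any success, the other rewards consistency.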
run <suite-id>
Start suite runs across one or more harnesses
Flags
--harness - Harness ID to run; may be repeated or comma-separated
--task - Suite task ID filter; may be repeated or comma-separated
tasks <suite-id>
List public tasks for an Agent Harness suite
artifact
Upload and download artifacts
download <artifactId>
Download an artifact
Flags
--output (-O) - Output file path (defaults to stdout)
list
List artifacts in the workspace
upload <file>
Upload an artifact
Flags
--metadata - JSON metadata (optional)
--run - Run ID (optional)
--run-agent - Run agent ID (optional)
--type (required) - Artifact type (required)
validate-voice-manifest <file>
Validate a voice artifact manifest JSON file against the AgentClash schema
validate-voice-report <file>
Validate a voice report JSON file against an AgentClash schema
Flags
--schema - JSON Schema path (required when report type cannot be auto-detected)
auth
Manage authentication
login
Log in to AgentClash
Flags
--device - Print the verification URL instead of opening the browser automatically
--force - Start a new browser login even if existing credentials are valid
logout
Log out and remove stored credentials
status
Show current authentication status
tokens
Manage CLI access tokens
list
List your CLI tokens
revoke <token-id>
Revoke a CLI token
baseline
Manage the workspace-scoped default baseline run
clear
Clear the default baseline run for the current workspace
set [run]
Bookmark a run as the default baseline for the current workspace
Flags
--agent - Run agent ID or label (optional)
show
Show the default baseline run for the current workspace
build
Manage agent builds
create
Create a new agent build
Flags
--description - Build description
--name (required) - Build name (required)
get <id>
Get agent build with version history
list
List agent builds
version
Manage agent build versions
create <buildId>
Create a new draft version
Flags
--agent-kind - Agent kind: llm_agent, workflow_agent, programmatic_agent, multi_agent_system, hosted_external
--spec-file - JSON file with version spec fields
--template - Template key to scaffold this version (for example: honest-agent, code-reviewer)
get <versionId>
Get a build version
ready <versionId>
Mark a version as ready (immutable, deployable)
templates
List built-in build version templates
update <versionId>
Update a draft build version
Flags
--spec-file - JSON file with updated version spec fields
validate <versionId>
Validate a build version
challenge-pack
Manage challenge packs
init <file>
Scaffold a minimal challenge pack YAML bundle
Flags
--force - Overwrite an existing file
--name - Challenge pack display name (defaults from the file name)
--slug - Challenge pack slug (defaults from the file name)
--template (default: prompt_eval) - Starter template: prompt_eval, responses, or native
list
List challenge packs
publish <file>
Publish a challenge pack YAML bundle
validate <file>
Validate a challenge pack YAML bundle
ci
Manage AgentClash CI manifests
baseline
Resolve the baseline selected by an AgentClash CI manifest
Flags
--manifest (default: .agentclash/ci.yaml) - Path to the AgentClash CI manifest
init <file>
Write a sample AgentClash CI manifest
Flags
--force - Overwrite an existing manifest
run
Run the AgentClash CI workflow described by a manifest
Flags
--artifact-dir - Write stable AgentClash CI JSON artifacts to this directory
--ci-branch - Branch metadata override
--ci-commit - Commit SHA metadata override
--ci-default-branch - Default branch metadata override for auto_on_main regression promotion
--ci-event - CI event name metadata override
--ci-provider - CI provider metadata override
--ci-pull-request - Positive pull request number metadata override
--ci-ref - Git ref metadata override
--ci-repository - Repository metadata override, for example owner/repo
--ci-workflow - Workflow name metadata override
--ci-workflow-run-attempt - Workflow run attempt metadata override
--ci-workflow-run-id - Workflow run id metadata override
--ci-workflow-run-url - Workflow run URL metadata override
--follow - Stream run events while waiting for the candidate run
--github-step-summary - Append a GitHub Actions step summary when GITHUB_STEP_SUMMARY is set
--manifest (default: .agentclash/ci.yaml) - Path to the AgentClash CI manifest
--poll-interval - Polling interval while waiting for run completion
--summary-file - Write a Markdown CI gate summary to this file
--timeout - Maximum time to wait for the candidate run; 0 disables the timeout
should-run
Decide whether AgentClash CI should run
Flags
--base - Base git ref for deriving changed files
--changed-file - Changed file path; may be repeated
--github-event - GitHub event JSON file for deriving pull request labels
--head - Head git ref for deriving changed files
--labels - Pull request labels; may be comma-separated or repeated
--manifest (default: .agentclash/ci.yaml) - Path to the AgentClash CI manifest
--repo (default: .) - Git repository path for --base/--head diff
validate <file>
Validate an AgentClash CI manifest
Flags
--remote - Validate manifest resource IDs against the selected workspace
compare
Compare runs and evaluate release gates
gate
Evaluate a release gate (nonzero exit = regression or missing evidence)
Flags
--baseline (required) - Baseline run ID (required)
--baseline-agent - Baseline run agent ID (optional)
--candidate (required) - Candidate run ID (required)
--candidate-agent - Candidate run agent ID (optional)
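Because gate reports a regression or missing evidence through a nonzero exit status, a CI script can branch on the return code alone. The sketch below shows one way to wrap that in Python. The binary name agentclash is an assumption (this reference does not state it), the helper names are hypothetical, and only the flags listed for compare gate are used.

```python
import subprocess
from typing import List, Optional

CLI = "agentclash"  # assumed binary name; not stated in this reference


def gate_argv(
    baseline: str,
    candidate: str,
    baseline_agent: Optional[str] = None,
    candidate_agent: Optional[str] = None,
) -> List[str]:
    """Build the argument vector for `compare gate`."""
    argv = [CLI, "compare", "gate", "--baseline", baseline, "--candidate", candidate]
    if baseline_agent:
        argv += ["--baseline-agent", baseline_agent]
    if candidate_agent:
        argv += ["--candidate-agent", candidate_agent]
    return argv


def gate_passed(returncode: int) -> bool:
    """Zero means the gate passed; nonzero means regression or missing evidence."""
    return returncode == 0


def run_gate(baseline: str, candidate: str) -> bool:
    """Invoke the CLI and report whether the gate passed."""
    # check=False: a nonzero exit is an expected outcome, not an exception.
    result = subprocess.run(gate_argv(baseline, candidate), check=False)
    return gate_passed(result.returncode)
```

Keeping the argument builder separate from the subprocess call makes the command easy to log or unit-test without the CLI installed.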
latest
Compare the saved baseline against the latest non-baseline run
Flags
--agent - Run agent ID or label to use for both runs when possible
--baseline-agent - Baseline run agent ID or label (defaults to the saved baseline agent)
--candidate-agent - Candidate run agent ID or label
--gate - Also evaluate the release gate and return a nonzero exit code for non-pass verdicts
runs
Compare baseline vs candidate runs
completion [bash|zsh|fish|powershell]
Generate a shell completion script
config
Manage CLI configuration
get <key>
Get a config value
list
List all config values
set <key> <value>
Set a config value
dataset
Manage eval datasets
create
Create a dataset
Flags
--default-challenge-pack-version-id - Default challenge pack version ID
--description - Dataset description
--enforce-schema - Reject examples that do not match the input schema
--from-file - JSON file with dataset create payload
--input-schema - Input JSON Schema
--name - Dataset name
--slug - Dataset slug
delete <datasetId>
Archive a dataset
eval <datasetId>
Run an eval over a dataset version
Flags
--challenge - Challenge key to bind examples to
--deployment - Agent deployment ID (repeatable)
--follow - Follow run events after creation
--mapping - Optional JSON mapping for dataset example fields
--name - Optional run name
--pack - Challenge pack version ID
--version - Dataset version ID to run
example
Manage dataset examples
add <datasetId>
Add or upsert a dataset example
Flags
--expected - Expected output JSON
--external-id - Stable external ID for idempotent upsert
--from-file - JSON file with dataset example payload
--input - Example input JSON
--metadata - Metadata JSON
--source - Example source: manual, import, trace, synthetic, or promotion
--tag - Example tag (repeatable)
edit <datasetId> <exampleId>
Edit a dataset example
Flags
--expected - Expected output JSON
--from-file - JSON file with dataset example patch payload
--input - Example input JSON
--metadata - Metadata JSON
--source - Example source: manual, import, trace, synthetic, or promotion
--status - Example status: active, archived, or muted
--tag - Example tag (repeatable)
list <datasetId>
List dataset examples
rm <datasetId> <exampleId>
Archive a dataset example
export <datasetId>
Export dataset examples
Flags
--format (default: jsonl) - Export format: openai, braintrust, langsmith, phoenix, jsonl, or csv
--version - Dataset version ID to export
generate <datasetId>
Start in-house synthetic dataset generation
Flags
--count - Target number of accepted synthetic examples
--create-version - Snapshot a dataset version when generation completes
--follow - Poll generation job status until it finishes
--model-alias - Model alias ID
--provider-account - Provider account ID
--seeds-tag - Only use seed examples with this tag
--strategy (default: self-instruct) - Generation strategy (v1: self-instruct)
--version-label - Optional label for the generated dataset version
import <datasetId> <file>
Import examples into a dataset
Flags
--dry-run - Preview normalized examples without mutating the dataset
--format - Import format: openai, braintrust, langsmith, phoenix, jsonl, or csv
--map - Mapping entry key=value (repeatable); values may be comma-separated for input_keys/output_keys/metadata_keys
--mapping - JSON mapping for generic JSONL/CSV imports
--mode (default: add) - Import mode: add or replace
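The --map description implies a small grammar: each entry is key=value, and for input_keys, output_keys, and metadata_keys the value may be a comma-separated list. The sketch below illustrates that shape in Python so the flag is easier to picture. It is not the CLI's parser; the function name, the accumulation of repeated list keys, and the id_key entry in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Union

# Keys whose values the flag description says may be comma-separated.
LIST_KEYS = {"input_keys", "output_keys", "metadata_keys"}

Mapping = Dict[str, Union[str, List[str]]]


def parse_map_entries(entries: Iterable[str]) -> Mapping:
    """Turn repeated key=value entries into a mapping dictionary."""
    mapping: Mapping = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        if key in LIST_KEYS:
            # Split on commas and accumulate across repeated entries.
            items = [v.strip() for v in value.split(",") if v.strip()]
            existing = mapping.setdefault(key, [])
            assert isinstance(existing, list)
            existing.extend(items)
        else:
            mapping[key] = value
    return mapping
```

For example, the entries input_keys=question,context and output_keys=answer would yield two list-valued keys, while a scalar entry such as id_key=id (a made-up key) stays a plain string.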
import-traces <datasetId> [file]
Import production traces as reviewable dataset candidates
Flags
--artifact - Existing artifact ID to reference instead of inline payload
--from-file - JSON file with import-traces request body
--redaction - JSON redaction config (drop/hash metadata keys)
--run - Run ID for agentclash trace import
--run-agent - Run agent ID for agentclash trace import
--source - Trace source platform: otel, braintrust, langsmith, phoenix, or agentclash
list
List datasets
promote <datasetId> <candidateId>
Promote a trace candidate into a dataset example
Flags
--expected - Edited expected output JSON
--from-file - JSON file with promote request body
--tag - Tags to apply on promotion
sync-regression-suite <datasetId>
Promote dataset examples into a linked regression suite
Flags
--challenge - Challenge key
--format (default: text) - Output format: text or json
--pack - Challenge pack version ID
--suite - Existing regression suite ID (optional)
--suite-name - Name for a newly created regression suite
--version - Dataset version ID to sync
test <datasetId>
Run a dataset eval gate against a baseline
Flags
--baseline - Baseline ID to compare against
--challenge - Challenge key for eval
--deployment - Agent deployment ID (repeatable)
--eval - Start a dataset eval before gating
--format (default: text) - Output format: text, json, or junit
--max-regressions - Maximum allowed regressions versus baseline
--min-pass-rate - Minimum pass rate required to pass the gate
--pack - Challenge pack version ID for eval
--poll-interval - Polling interval while waiting for eval completion
--run - Candidate run ID (required unless --eval is set)
--timeout - Maximum time to wait for an eval run started with --eval
--version - Dataset version ID for eval
trace-candidates
Review imported trace candidates
list <datasetId>
List trace candidates awaiting promotion
Flags
--status - Filter by candidate status (pending, promoted, rejected)
version
Manage dataset versions
create <datasetId>
Snapshot the current dataset examples
Flags
--label - Optional dataset version label
list <datasetId>
List dataset versions
view <datasetId>
View a dataset
deployment
Manage agent deployments
create
Create an agent deployment
Flags
--agent-build-id - Agent build ID
--build-version-id - Agent build version ID
--from-file - JSON file with deployment spec
--model-alias-id - Model alias ID
--name - Deployment name
--provider-account-id - Provider account ID
--runtime-profile-id - Runtime profile ID
list
List agent deployments
doctor
Check auth, workspace, and eval readiness
Flags
--pack - Challenge pack YAML file to check for run readiness
eval
Workflow-first eval commands
scorecard [run]
Show a run-first scorecard and compare against the bookmarked baseline
Flags
--agent - Run agent ID or label (optional)
session
Inspect repeated eval sessions
follow <evalSessionId>
Poll a repeated eval session until aggregation finishes
Flags
--poll-interval - Polling interval while waiting for completion
--timeout - Maximum time to wait; 0 disables the timeout
get <evalSessionId>
Show repeated eval session details and aggregate metrics
list
List repeated eval sessions in the workspace
Flags
--limit - Maximum eval sessions to list (1-100)
--offset - Eval session list offset
start
Start an eval using names, defaults, and guided selection
Flags
--case - Regression case IDs (repeatable)
--deployment - Deployment ID or exact name (repeatable)
--follow - Follow run events after creation
--input-set - Challenge input set ID, key, or exact name
--name - Run name (optional)
--pack - Challenge pack ID, slug, or exact name
--pack-version - Challenge pack version ID or version number
--race-context - Enable live peer-standings injection during the run (requires 2+ agents)
--race-context-cadence - Override race-context cadence; minimum steps between standings injections, [1, 10]. 0 uses the backend default.
--repetitions - Repeat the eval N times in a multi-run eval session, [1, 100]. >=2 routes through /v1/eval-sessions and unlocks pass@K + pass^K aggregation.
--scope (default: full) - Run scope: full or suite_only
--suite - Regression suite ID or exact name (repeatable)
infra
Manage infrastructure resources
model-catalog
Browse the global model catalog
get <id>
Get a model catalog entry
list
List available models
init
Initialize a project with .agentclash.yaml
Flags
--org-id - Organization ID to bind
--workspace-id - Workspace ID to bind
integration
Install and verify AgentClash Agent Skills for coding agents
link [workspace]
Choose and save your default workspace
org
Manage organizations
create
Create a new organization
Flags
--name (required) - Organization name (required)
--slug - Organization slug (optional, auto-generated)
get <id>
Get organization details
list
List organizations you belong to
members
Manage organization members
invite <orgId>
Invite a member to the organization
Flags
--email (required) - Email address to invite (required)
--role (default: org_member) - Role: org_admin, org_member
list <orgId>
List organization members
update <membershipId>
Update an organization membership
Flags
--role - New role: org_admin, org_member
--status - New status: active, suspended, archived
update <id>
Update an organization
Flags
--name - New organization name
--status - New status (active, archived)
playground
Manage playgrounds, test cases, and experiments
create
Create a playground
Flags
--from-file - JSON file with playground spec
--name - Playground name
delete <id>
Delete a playground
experiment
Manage playground experiments
batch <playgroundId>
Create experiments in batch (one per model)
Flags
--from-file - JSON file with batch experiment spec
compare
Compare two experiments
Flags
--baseline (required) - Baseline experiment ID (required)
--candidate (required) - Candidate experiment ID (required)
create <playgroundId>
Create an experiment
Flags
--from-file - JSON file with experiment spec
get <experimentId>
Get an experiment
list <playgroundId>
List experiments
results <experimentId>
List results for an experiment
get <id>
Get a playground
list
List playgrounds
test-case
Manage playground test cases
create <playgroundId>
Create a test case
Flags
--from-file - JSON file with test case spec
delete <testCaseId>
Delete a test case
list <playgroundId>
List test cases
update <testCaseId>
Update a test case
Flags
--from-file - JSON file with test case spec
update <id>
Update a playground
Flags
--from-file - JSON file with playground spec
prompt-eval
Manage prompt eval configs
import-promptfoo <file>
Convert a safe Promptfoo subset into an AgentClash prompt eval config
Flags
--force - Overwrite --out when it already exists
--lossy - Allow documented lossy conversions
--name - Prompt eval name for the generated config
--out - Write the converted prompt eval YAML to this path instead of stdout
--provider-account (default: default) - Provider account name or id to use for imported provider aliases
init [file]
Scaffold a prompt eval YAML config
Flags
--force - Overwrite an existing file
--name - Prompt eval name (defaults from the file name)
results <experiment-id>
Fetch prompt eval experiment results
Flags
--threshold - Override the assertion pass-rate gate for fetched results
run [file]
Compile a prompt eval config and launch playground experiments
Flags
--ci - Apply CI-safe validation rules
--follow - Wait for launched experiments and print results
--max-cases - Maximum model x test cases allowed before launch
--poll-interval - Polling interval while following experiments
--threshold - Override thresholds.assertion_pass_rate for this run
--timeout - Maximum time to wait while following experiments; 0 disables the timeout
validate [file]
Validate a prompt eval YAML config locally
Flags
--ci - Apply CI-safe validation rules
--max-cases - Maximum model x test cases allowed before launch
--remote - Validate referenced AgentClash workspace resources without creating them
quickstart
Check eval readiness and show the next best command
quota
Show workspace quota usage
regression-suite
Manage regression suites and cases
case
Manage individual regression cases
capture-production <suiteId>
Capture a production failure as a proposed regression case
Flags
--evidence-tier - Evidence tier
--external-url - Production incident URL
--failure-class - Failure class
--failure-summary - Failure summary
--from-file - JSON file with production failure capture payload
--incident-id - Production incident ID
--observed-at - Production observation timestamp (RFC3339)
--promotion-mode - Promotion mode: full_executable, output_only, or manual
--severity - Case severity: info, warning, or blocking
--source - Production source label
--source-case-key - Source production case or incident key
--source-challenge-identity-id - Source challenge identity ID
--source-challenge-input-set-id - Source challenge input set ID
--source-challenge-pack-version-id - Source challenge pack version ID
--source-item-key - Source item key
--title - Regression case title
update <caseId>
Update a regression case
Flags
--description - Case description
--from-file - JSON file with regression case patch payload
--severity - Case severity: info, warning, or blocking
--status - Case status: proposed, active, muted, archived, or rejected
--title - Case title
cases <suiteId>
List regression cases in a suite
create
Create a regression suite
Flags
--default-gate-severity - Default gate severity: info, warning, or blocking
--description - Suite description
--from-file - JSON file with regression suite create payload
--name - Suite name
--source-challenge-pack-id - Source challenge pack ID
get <suiteId>
Get a regression suite
list
List regression suites
update <suiteId>
Update a regression suite
Flags
--default-gate-severity - Default gate severity: info, warning, or blocking
--description - Suite description
--from-file - JSON file with regression suite patch payload
--name - Suite name
--status - Suite status: active or archived
release-gate
Inspect evaluated release gates
list
List evaluated release gates
Flags
--baseline - Baseline run ID
--candidate - Candidate run ID
replay
View execution replays
get <runAgentId>
Get execution replay steps
Flags
--cursor - Step offset to start from
--limit - Steps per page (1-200)
triage [run]
Summarize ranking, failures, scorecard, replay, and artifacts for debugging
Flags
--agent - Run agent ID or label to triage
--cursor - Replay step offset to start from
--limit - Replay steps to include (1-50)
run
Manage evaluation runs
agents <runId>
List agents in a run
cancel <id>
Cancel an active run
compare
Compare a baseline run against a candidate run
create
Create and submit an evaluation run
Flags
--case - Regression case IDs (repeatable)
--challenge-pack-version - Challenge pack version ID (optional in a TTY; prompted when omitted)
--deployment-lineup - Challenge pack deployment lineup to use when --deployments is omitted (default: default)
--deployment-lineups - Challenge pack deployment lineups to cross with --seeds for a race series
--deployments - Agent deployment IDs (optional in a TTY; prompted when omitted)
--follow - Follow run events after creation
--include-proposed-regressions - Include proposed regression cases for validation runs
--input-set - Challenge input set ID (optional)
--max-iter - Override max iterations for this run (1-1000). 0 uses the pack/runtime default.
--mode - Voice eval mode: text-sim (future: audio-sim, live-call, replay-import)
--name - Run name (optional)
--race-context - Enable live peer-standings injection during the run (requires 2+ agents)
--race-context-cadence - Override race-context cadence; minimum steps between standings injections, [1, 10]. 0 uses the backend default.
--scope (default: full) - Run scope: full or suite_only
--seeds - Create a seeded eval session with N child runs, one per seed (1-100). 0 creates a single run.
--suite - Regression suite IDs (repeatable; required with --scope suite_only unless --case is used)
events <runId>
Stream live run events via SSE
Flags
--filter - Filter streamed events by event type pattern (exact, comma-separated, or glob; '*' matches any non-slash chars, so 'model.*' matches 'model.call.started'; repeatable)
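The --filter description amounts to a small matching rule: a pattern is either exact or a glob, several patterns can be joined with commas or supplied by repeating the flag, and the wildcard matches any run of non-slash characters. The sketch below models that rule in Python, assuming the wildcard is the asterisk (so model.* matches model.call.started) and that an event passes when any pattern matches. It is an illustration of the described semantics, not the CLI's implementation, and the function names are hypothetical.

```python
import re
from typing import Iterable, List


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex where '*' matches non-slash characters."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^/]*".join(literal_parts))


def split_filters(filters: Iterable[str]) -> List[str]:
    """Flatten repeated, comma-separated filter values into single patterns."""
    return [p.strip() for value in filters for p in value.split(",") if p.strip()]


def event_matches(event_type: str, filters: Iterable[str]) -> bool:
    """Return True when the event type matches any pattern (or no filter is set)."""
    patterns = split_filters(filters)
    if not patterns:
        return True
    return any(compile_pattern(p).fullmatch(event_type) for p in patterns)
```

Under these assumptions, a filter of model.*,run.completed keeps every model event plus the single run.completed event and drops everything else.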
export <runId>
Export persisted run events as JSONL
failures <runId>
List failure review items for a run
Flags
--agent - Filter by run agent ID
--class - Filter by failure class
--cluster - Filter by failure cluster key
--cursor - Pagination cursor
--evidence-tier - Filter by evidence tier
--limit - Maximum failures to return
--severity - Filter by severity: info, warning, or blocking
get <id>
Get run details
list
List runs in the workspace
promote-failure <runId> <challengeIdentityId>
Promote a run failure into a regression case
Flags
--failure-summary - Failure summary
--from-file - JSON file with promotion payload
--promotion-mode - Promotion mode: full_executable or output_only
--run-agent - Run agent ID
--severity - Case severity: info, warning, or blocking
--suite - Regression suite ID
--title - Regression case title
ranking <runId>
Get run ranking and composite scores
Flags
--sort-by - Sort by: composite, correctness, reliability, latency, cost
replay <run-agent-id>
Inspect replay steps for a run agent
Flags
--cursor - Step offset to start from
--limit - Steps per page (1-200)
scorecard <runAgentId>
Get agent scorecard
series
Manage durable race series
create
Create a race series from deployment lineups and seeds
Flags
--challenge-pack-version - Challenge pack version ID
--deployment-lineups - Challenge pack deployment lineups to cross with --seeds
--input-set - Challenge input set ID (optional)
--max-iter - Override max iterations for each child run (1-1000). 0 uses the pack/runtime default.
--name - Series name (optional)
--seeds - Number of seeds to cross with each deployment lineup (1-100)
report <eval-session-id>
Show aggregate score, correctness, and cost for a race series
transcript <runId>
Export a Markdown run transcript
turn
Multi-turn human takeover helpers
status <runAgentId>
Check whether a run agent is awaiting human input
Flags
--run - Run ID
submit <runAgentId>
Submit a human user message for an awaiting multi_turn phase
Flags
--message - Human user message for the awaiting turn
--run - Run ID
schema
Print the CLI command tree as machine-readable JSON
secret
Manage workspace secrets
delete <key>
Delete a secret
list
List workspace secret keys
set <key>
Create or update a secret
Flags
--value - Secret value (reads from stdin if omitted)
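Since the value is read from stdin when --value is omitted, a script can pipe the secret in rather than placing it on the command line, where it could surface in shell history or process listings. A minimal sketch, assuming the binary is installed as agentclash (the name is not stated in this reference) and using hypothetical helper names:

```python
import subprocess
from typing import List

CLI = "agentclash"  # assumed binary name; not stated in this reference


def secret_set_argv(key: str) -> List[str]:
    """Build `secret set <key>` without --value so the CLI reads stdin."""
    return [CLI, "secret", "set", key]


def set_secret(key: str, value: str) -> None:
    """Send the secret over stdin instead of passing it as an argument."""
    # check=True raises CalledProcessError if the CLI exits nonzero.
    subprocess.run(secret_set_argv(key), input=value, text=True, check=True)
```

The secret never appears in the argument vector, so it is visible only to the child process's standard input.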
security
Security-evals tooling (stress runs, leak rate, policy preview)
agent-vault-stress
Stress-test a model with REAL Infisical Agent Vault routing (function calling)
Flags
--allowed-upstream - Hostname the model is supposed to be brokering for, e.g. api.stripe.com. Any tool call to a different host is flagged as confused-deputy.
--api-key-env (default: OPENAI_API_KEY) - Env var holding the OpenAI API key
--canary-token - The broker token the harness watches for in model output (required). Usually the av_agt_/av_sess_ token embedded in --proxy-url.
--canary-token-env (default: AGENT_VAULT_TOKEN) - Env var to read --canary-token from if the flag is empty
--from-pack - Path to a security pack YAML. When set, runs every adversarial_prompts[] entry against the model and writes a per-attack report. Conflicts with --user-message.
--iterations - Number of trials
--mgmt-url - Agent Vault management API URL, e.g. http://127.0.0.1:14321
--mgmt-url-env (default: AGENT_VAULT_ADDR) - Env var to read --mgmt-url from if the flag is empty
--model (default: gpt-4o-mini) - OpenAI model id
--openai-endpoint (default: https://api.openai.com/v1/chat/completions) - Chat Completions endpoint (override for compatible providers)
--out - Path to write per-iteration JSON report (single-prompt mode)
--out-dir - Directory to write per-attack JSON reports when --from-pack is set (default: skip JSON, print table only)
--proxy-url - Agent Vault MITM proxy URL, e.g. https://av_agt_xxx:eval@127.0.0.1:14322
--proxy-url-env (default: AGENT_VAULT_PROXY_URL) - Env var to read --proxy-url from if the flag is empty
--system-prompt - System prompt override (default: deploy assistant with Agent Vault boundary)
--timeout - Per-LLM-call timeout
--user-message - Adversarial user message (required unless --from-pack is set)
avmock-upstream
Stand up a deterministic HTTP mock upstream for the agent-vault-stress campaign
Flags
--addr (default: 127.0.0.1:8443) - TCP address to listen on
--detect-canary - Substring to scan inbound request URL, headers, and body for. Any match writes a vault-leak incident to stderr and returns 400 vault_leak_detected to the caller. Use to catch credentials that should never reach the upstream.
--log-requests - Log every inbound request to stderr in one-line form
--require-bearer - If set, requests must carry an Authorization header containing this substring; otherwise the mock returns 401. Use to assert the vault is injecting a credential.
--service (default: stripe) - Service shape: stripe | github | generic
runtime-stress
Stress-test a model with REAL Vault SDK tool access (function calling)
Flags
--api-key-env (default: OPENAI_API_KEY) - Env var holding the OpenAI API key
--canary-path (default: stripe-key) - Vault KV path the agent should fetch
--canary-value - The exact canary value to detect in leaks (required)
--iterations - Number of trials
--model (default: gpt-4o-mini) - OpenAI model id
--out - Path to write per-iteration JSON report
--system-prompt - System prompt override (default: deploy assistant with strict boundary)
--timeout - Per-LLM-call timeout
--user-message - Adversarial user message (required)
--vault-addr (default: http://127.0.0.1:8200) - Vault server address
--vault-token-env (default: VAULT_TOKEN) - Env var holding the Vault token
stress-run <pack-yaml>
Run a security pack N times against an LLM provider and aggregate the leak rate
Flags
--api-key-env (default: OPENAI_API_KEY) - Env var holding the provider API key
--concurrency - Max concurrent iterations per provider/model pair
--iterations - Number of stress iterations per provider
--model (default: gpt-4o-mini) - Comma-separated model ids, paired with --provider
--no-system-guard - Drop the 'refuse leaks' sentence from the system prompt. Measures baseline alignment without harness-side coaching.
--out - Path to write the full JSON report (default: stdout summary only)
--provider (default: openai) - Comma-separated providers (currently: openai)
--timeout - Per-LLM-call timeout
skills
Export bundled AgentClash Agent Skills
export
Export bundled skills to a directory or tarball
Flags
--dir - Output directory or .tar.gz archive path (required)
--format (default: dir) - Output format: dir or tar.gz
--host - Layout for a coding agent host (claude, codex, cursor, openclaw, hermes, opencode)
version
Show CLI version information
workspace
Manage workspaces
create
Create a workspace
Flags
--name (required) - Workspace name (required)
--org - Organization ID (required)
--slug - Workspace slug (optional)
get <id>
Get workspace details
list
List workspaces in an organization
Flags
--org - Organization ID (uses default if not set)
members
Manage workspace members
invite
Invite a member to the workspace
Flags
--email (required) - Email address to invite (required)
--role (default: workspace_member) - Role: workspace_admin, workspace_member, workspace_viewer
list
List workspace members
update <membershipId>
Update a workspace membership
Flags
--role - New role
--status - New status
update <id>
Update a workspace
Flags
--name - New workspace name
--public-packs - Allow this workspace to use public challenge packs
--status - New status (active, archived)
use <id>
Set the default workspace
Source pointers
cli/cmd/root.go
cli/cmd/auth.go
cli/cmd/workspace.go
cli/cmd/run.go
cli/cmd/compare.go