AgentClash

Synthetic data for agents

High-signal synthetic data, not noisy bulk

DataSmith generates training and eval datasets with a weak-vs-strong agentic loop inspired by Meta FAIR Autodata. Use the open-source Python SDK locally, or run Agentic Self-Instruct inside AgentClash for replay, scoring, and CI gates.

agentic self-instruct
avg_gap 0.31

acceptance policy

Strong solver0.92

pass >= 0.85

Weak solver0.41

pass <= 0.55

Score gap0.51

gap >= 0.20

Judge verdictaccepted

grounded + rubric

export targets

1ShareGPT / ChatML for supervised fine-tuning
2DPO pairs when weak and strong outputs diverge
3AgentClash dataset eval and CI regression gates
pip install datasmith
datasmith run --seeds seeds.jsonl

Why weak vs strong

Examples in the useful difficulty zone

Prompt-only self-instruct often produces tasks that are too easy or too hard. The weak-vs-strong loop accepts rows only when a strong solver succeeds and a weak solver struggles: the zone where fine-tuning actually teaches something.

Web-grounded seed construction from domain briefs

Weak-vs-strong Agentic Self-Instruct judge loop

OpenTelemetry trace ingestion for seed material

Accepted and rejected JSONL with reason codes

ShareGPT, ChatML, DPO, and prompt-completion export

Provider-agnostic model interface (OpenAI-compatible or custom)

Hosted Agentic Self-Instruct inside AgentClash workspaces

Dataset baselines, eval runs, and CI regression gates

Workflow

From domain brief to trainer-ready export

Construct seeds

Start from a domain brief, web-grounded search, production traces, or source documents.

Run weak vs strong

Challenger proposes examples. Weak and strong solvers attempt them. Judge filters for the useful difficulty zone.

Export for training

Ship ShareGPT, ChatML, DPO pairs, or push accepted rows to Hugging Face Hub.

Eval and gate in AgentClash

Run hosted Agentic Self-Instruct generation, baseline the dataset, and wire CI regression gates.

FAQ

Questions teams ask about synthetic data generation

What is DataSmith?

DataSmith is an open-source Python SDK for synthetic dataset generation using a weak-vs-strong agentic loop. A challenger proposes examples, weak and strong solvers attempt them, and a judge accepts only high-signal rows ready for fine-tuning or evaluation.

How is DataSmith related to AgentClash?

DataSmith handles offline dataset creation and training export (SFT, DPO, Hugging Face). AgentClash runs the same Agentic Self-Instruct loop in hosted workspaces, then scores, replays, and gates regressions on the generated examples.

What models does DataSmith support?

Any provider that implements DataSmith's complete() protocol: OpenAI-compatible APIs, local inference, or your own wrapper. Weak and strong solvers can be different models, prompts, or compute budgets.

Can I turn production traces into training data?

Yes. DataSmith ingests OpenTelemetry JSON and span JSONL as seeds. AgentClash also supports trace import into workspace datasets for eval-oriented workflows.