DataSmith launch: weak-vs-strong synthetic data for agents
DataSmith ships as an open-source Python SDK for generating high-signal synthetic datasets, with web-grounded seed construction, OTLP trace ingestion, and SFT/DPO export. AgentClash adds platform marketing, docs, and SEO pages for the hosted Agentic Self-Instruct loop.
- DataSmith SDK
- Synthetic data generation
- Agentic Self-Instruct
- SEO and discoverability
What shipped
Added
- DataSmith — open-source Python SDK for weak-vs-strong Agentic Self-Instruct synthetic data generation, inspired by Meta FAIR Autodata.
- /platform/datasmith landing page covering the SDK and hosted generation, with an FAQ and JSON-LD product schema.
- Docs guide to synthetic dataset generation, covering Fast Self-Instruct, Agentic Self-Instruct, the CLI, and DataSmith export.
- Launch blog post: Introducing DataSmith, with a pipeline overview, trace ingestion, and an AgentClash integration table.
- SEO landing pages for synthetic data generation, Agentic Self-Instruct, and trace-to-dataset workflows, plus a glossary definition.
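
For readers unfamiliar with SFT/DPO export, the sketch below shows the record shapes such exports commonly use. It is not DataSmith's API: the helper names (`to_sft_record`, `to_dpo_record`, `to_jsonl`), the field names (`prompt`, `completion`, `chosen`, `rejected`), and the choice to treat the strong model's answer as "chosen" and the weak model's answer as "rejected" are all assumptions made for illustration.

```python
import json
from typing import Iterable


def to_sft_record(prompt: str, strong_answer: str) -> dict:
    """Hypothetical helper: one supervised fine-tuning example."""
    return {"prompt": prompt, "completion": strong_answer}


def to_dpo_record(prompt: str, strong_answer: str, weak_answer: str) -> dict:
    """Hypothetical helper: one preference pair.

    Assumption: the strong model's answer is preferred over the weak one.
    """
    return {"prompt": prompt, "chosen": strong_answer, "rejected": weak_answer}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Illustrative, made-up sample; not real generated data.
    prompt = "Which HTTP status code means 'Not Found'?"
    strong = "404"
    weak = "500"
    print(to_jsonl([to_sft_record(prompt, strong)]))
    print(to_jsonl([to_dpo_record(prompt, strong, weak)]))
```

JSON Lines keeps each example on its own line, so a file can be streamed or appended to without re-parsing the whole dataset. Check the DataSmith docs for the actual export schema before relying on these field names.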