Hugging Face CEO Kicks Off Open Dataset Push to Close the Gap Between Open and Closed Frontier Agents

6. Hugging Face CEO Kicks Off Open Dataset Push to Close the Gap Between Open and Closed Frontier Agents

Clément Delangue, CEO of Hugging Face, announced he is publicly releasing his personal agent traces collected from Hermes, Opencode, and Claude, following a similar move by Mario Zechner (known as @badlogicgames), creator of the Pi agent, who shared his own traces on Hugging Face. The releases are framed not as finished artifacts but as a deliberate call to action: a community-driven effort to build the training data needed to develop open-source frontier-grade agents. The move is notable precisely because agent trace data, the sequential records of how an AI reasons, selects tools, and completes multi-step tasks, is among the scarcest and most strategically hoarded assets in AI development right now.

This matters because the bottleneck for capable open-source agents has never been architecture alone. It has been high-quality behavioral data showing agents doing real, complex work. Closed-lab players like Anthropic, OpenAI, and Google accumulate this data passively at scale through product usage. Open-source efforts led by Hugging Face, Mistral, and the broader community have had no equivalent flywheel. By seeding a public trace dataset with contributions from credible builders, Delangue is attempting to bootstrap exactly that flywheel. The losers in this dynamic, if it gains momentum, are companies whose competitive moats depend on keeping agentic behavioral data proprietary. The winners are smaller labs and independent developers who currently cannot afford to generate or license this kind of data at volume.

This move connects to a broader structural shift in AI competition: the frontier is moving from model weights to post-training data and agentic scaffolding. Open-weight model releases from Meta and others have eroded the weights moat considerably. The new defensible layer is behavioral data, and Delangue is signaling that the open ecosystem intends to contest that layer too, collectively rather than through any single organization.

Source: https://twitter.com/ClementDelangue/status/2041189872556269697