Robots That Play First Solve Tasks Better: 20-Point Gains Without Extra Instructions
Self-directed robot play before task assignment builds a reusable skill library that lifts downstream performance by up to 20.6 percentage points, with no finetuning required.
Most agentic robot pipelines today share one assumption: skills get built when tasks arrive. The robot receives an instruction, writes a policy, observes feedback, and revises. That assumption turns out to be the bottleneck, not the foundation.
The RATs framework (Robotics Agent Teams) inserts a self-directed play stage before any downstream task arrives. During play, the agent proposes its own exploratory tasks, calibrated to be novel but learnable given its current capabilities. It then plans and executes code-as-policy programs, verifies intermediate progress at each step, diagnoses failures with dense step-level feedback, retries, and distills successful executions into a persistent code skill library. The library is frozen at test time, and no finetuning happens downstream. Think of the difference between a chef who cooks only when orders come in and one who spends mornings experimenting with techniques that later make every dish faster to execute.
The architecture separates play into distinct agent roles: a proposer that generates candidate exploratory tasks, an executor that writes and runs robot code, a verifier that checks intermediate states, and a distiller that packages successful executions into reusable skill primitives. This division of labor matters because failure diagnosis in robot code is noisy. A single monolithic agent collapses under that noise. Splitting roles keeps each agent's context clean and its feedback signal specific.
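The play stage and role split described above can be sketched as a small loop. This is a minimal illustration, not the framework's actual code: `Attempt`, `SkillLibrary`, `play`, and the four callables (`propose`, `execute`, `verify`, `distill`) are hypothetical names, and the retry budget and the choice to key skills by task name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Attempt:
    code: str        # code-as-policy program the executor wrote
    success: bool    # verifier's verdict on the run
    feedback: str    # step-level diagnosis fed into the next retry


@dataclass
class SkillLibrary:
    skills: Dict[str, str] = field(default_factory=dict)  # skill name -> code
    frozen: bool = False

    def add(self, name: str, code: str) -> None:
        if self.frozen:
            raise RuntimeError("library is frozen; no skills can be added")
        self.skills[name] = code

    def freeze(self) -> None:
        self.frozen = True


def play(
    propose: Callable[[SkillLibrary], str],    # proposer: picks the next task
    execute: Callable[[str, str], str],        # executor: (task, feedback) -> code
    verify: Callable[[str, str], Attempt],     # verifier: (task, code) -> Attempt
    distill: Callable[[str, Attempt], str],    # distiller: packages a reusable skill
    library: SkillLibrary,
    rounds: int,
    max_retries: int = 3,                      # assumed retry budget
) -> SkillLibrary:
    """Run self-directed play, then freeze the library for downstream use."""
    for _ in range(rounds):
        task = propose(library)
        feedback = ""
        for _ in range(max_retries):
            code = execute(task, feedback)
            attempt = verify(task, code)
            if attempt.success:
                # Only verified executions are distilled into the library.
                library.add(task, distill(task, attempt))
                break
            # Failed attempt: carry the diagnosis into the next retry.
            feedback = attempt.feedback
    library.freeze()
    return library
```

Passing each role in as a separate callable mirrors the point about context hygiene: the executor sees only the task and the latest diagnosis, and the distiller sees only verified attempts.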
Play-learned skills improve held-out downstream tasks by 20.6 percentage points over CaP-Agent0 on LIBERO-PRO and 17.0 points on MolmoSpaces. The skills are not locked to the training agent: plugging the frozen library into other inference-time code-as-policy agents lifts RoboSuite performance by 8.9 points and real-world transfer by 8.8 points, without touching the underlying model. For teams building general-purpose robot pipelines, the takeaway is direct: a play stage before task deployment is now a concrete architectural option, not a research abstraction.
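One way a frozen library could be plugged into a different code-as-policy agent is to retrieve relevant skills into that agent's prompt. The sketch below assumes the library is a plain mapping from skill name to source code and uses naive keyword overlap for retrieval. Both are illustrative assumptions, as are the names `tokenize`, `retrieve_skills`, and `build_prompt`; the source does not specify the retrieval mechanism.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; underscores in skill names act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_skills(instruction: str, library: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k skill names ranked by token overlap with the instruction."""
    query = tokenize(instruction)
    scored = []
    for name in library:
        overlap = len(query & tokenize(name))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(instruction: str, library: Dict[str, str], k: int = 2) -> str:
    """Assemble a prompt that exposes retrieved skills to any downstream agent."""
    names = retrieve_skills(instruction, library, k)
    skill_block = "\n\n".join(library[name] for name in names)
    return f"# Available skills\n{skill_block}\n\n# Task\n{instruction}"
```

Because the library is only read, never written, the downstream model stays untouched. A production system would likely swap the keyword overlap for embedding-based retrieval.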
We're thinking: The portability result is more significant than the headline gains. A frozen skill library, built by one agent during self-directed play, can be dropped into a completely different inference-time agent via context retrieval and immediately improve performance. That suggests the skill representation is genuinely compositional, not overfit to a single planner's quirks. It also challenges a quiet assumption in the field: that agentic robot skills are tightly coupled to the model that generated them. If skills transfer across agents without finetuning, the logical next question is whether a shared, community-maintained, play-derived skill library could serve as infrastructure, the way vector databases serve RAG pipelines today. That possibility is worth taking seriously now, before the field locks in per-agent, per-deployment skill silos.
Key takeaways:
- RATs separates play into proposer, executor, verifier, and distiller roles, building a frozen code skill library through self-directed exploration before any task instruction arrives.
- Play-learned skills deliver gains of 20.6 and 17.0 percentage points on LIBERO-PRO and MolmoSpaces, respectively, over the no-play CaP-Agent0 baseline. The frozen library also transfers to other agents, adding 8.9 points on RoboSuite and 8.8 points on real-world tasks. Results are currently scoped to simulation-heavy benchmarks with structured feedback signals.
- Teams building code-as-policy robot systems should treat pre-task play as a first-class pipeline stage and evaluate whether a shared, frozen skill library can serve multiple downstream agents rather than rebuilding skills per deployment.
Source: Playful Agentic Robot Learning