RT-2 / RT-X
by Google DeepMind
Architecture
VLA (vision-language-action transformer)
Parameters
55B (RT-2-X)
Training Data
Open X-Embodiment dataset: 22 robot embodiments, 527 skills (160K+ tasks), 1M+ episodes
License
Proprietary (Open X-Embodiment dataset is open)
Status
RESEARCH
Open Source
No
Robots Supported
Apptronik Apollo (DeepMind partnership); multiple research platforms
About
RT-2 was Google DeepMind's foundational VLA model, demonstrating that a large vision-language model can be fine-tuned to output robot actions directly as text tokens. RT-X extended this with cross-embodiment training across 22 robot platforms. The work fed into the Apptronik partnership announced in 2025 to build production-grade robotics foundation models on Apollo.
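To make the core mechanism concrete, below is a minimal sketch of RT-2-style action tokenization: continuous robot commands are discretized into 256 bins and read or written by the fine-tuned VLM as plain text tokens. Only the 256-bin discretization and the text-token representation come from the RT-2 paper; the function names, normalized action range, and the example command are assumptions for illustration.

```python
import numpy as np

# Sketch of RT-2-style action tokenization (assumed normalized action range;
# the 256-bin discretization and text-token output follow the RT-2 paper).
NUM_BINS = 256
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated string of bin indices."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    bins = np.round(
        (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW) * (NUM_BINS - 1)
    ).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_action(token_str: str) -> np.ndarray:
    """Invert the discretization when decoding the model's text output."""
    bins = np.array([int(t) for t in token_str.split()], dtype=np.float32)
    return bins / (NUM_BINS - 1) * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

# Example: a 7-DoF delta-pose + gripper command round-trips through text.
cmd = np.array([0.1, -0.05, 0.2, 0.0, 0.0, 0.3, 1.0])
tokens = action_to_tokens(cmd)        # a short string of integer bin indices
print(tokens)
print(tokens_to_action(tokens))       # approximately recovers cmd
```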
Key Differentiator
Cross-embodiment generalization — single model controls 22 different robots
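A hypothetical sketch of what that pooling entails: each embodiment's native action format is mapped onto a shared, coarsely aligned 7-D end-effector command (xyz delta, rpy delta, gripper) before training, the common action space Open X-Embodiment uses. The embodiment names, field names, and adapter functions below are illustrative, not the actual pipeline code.

```python
import numpy as np

# Hypothetical per-embodiment adapters that pool data from different robots
# into one shared 7-D end-effector action (xyz delta, rpy delta, gripper).
def franka_to_shared(step: dict) -> np.ndarray:
    # Franka-style step: already exposes Cartesian deltas and a gripper value.
    return np.concatenate([step["ee_delta_xyz"], step["ee_delta_rpy"],
                           [step["gripper"]]])

def widowx_to_shared(step: dict) -> np.ndarray:
    # WidowX-style step: gripper is an open/close boolean, remapped to [0, 1].
    grip = 1.0 if step["gripper_open"] else 0.0
    return np.concatenate([step["delta_xyz"], step["delta_rpy"], [grip]])

ADAPTERS = {"franka": franka_to_shared, "widowx": widowx_to_shared}

def to_shared_action(embodiment: str, step: dict) -> np.ndarray:
    """Route a raw dataset step through its embodiment-specific adapter."""
    return ADAPTERS[embodiment](step)

# Example: two different robots end up in the same action space.
step = {"delta_xyz": [0.01, 0.0, -0.02], "delta_rpy": [0.0, 0.0, 0.1],
        "gripper_open": True}
print(to_shared_action("widowx", step))   # 7-D shared action vector
```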
Funding Context
Internal Alphabet investment; Apptronik partnership announced 2025
Milestones
2023-07: RT-2 published (first large VLA model)
2023-10: Open X-Embodiment dataset released
2025-03: Apptronik + Google DeepMind partnership announced