
Research Hub

Key academic papers shaping the development of humanoid robots — locomotion, manipulation, sim-to-real transfer, VLA models, and tactile sensing.

VLA Models · Mar 12, 2026

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Songlin Wei, Hongyi Jing, Boqian Li et al. · USC / Stanford / Tsinghua University

A staged training approach that sidesteps the pitfalls of directly mixing human and robot data. Ψ₀ first pre-trains a VLM backbone on 800 hours of egocentric human manipulation video, then post-trains a flow-based action expert on just 30 hours of high-quality humanoid robot data. The complete ecosystem — training pipelines, model weights, and inference engines — is fully open-sourced.

Key Finding: Outperforms baselines trained on 10× more data by >40% in task success rate. Staged human-to-robot transfer is dramatically more data-efficient than joint training.
Read paper on arXiv →
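
To make Ψ₀'s staged recipe concrete, here is a minimal sketch assuming a frozen video-pretrained backbone and a flow-matching action expert; the module names, sizes, and loss details below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the staged human-to-robot recipe; all module names, sizes,
# and loss details are assumptions for illustration.
import torch
import torch.nn as nn

class VLMBackbone(nn.Module):               # stand-in for the video-pretrained VLM
    def __init__(self, dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(1024, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, obs):                 # obs: flattened vision/language features
        return self.encoder(obs)

class FlowActionExpert(nn.Module):          # predicts a velocity field over actions
    def __init__(self, dim=512, act_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + act_dim + 1, dim), nn.GELU(), nn.Linear(dim, act_dim))
    def forward(self, ctx, noisy_action, t):
        return self.net(torch.cat([ctx, noisy_action, t], dim=-1))

# Stage 1 (not shown): pre-train VLMBackbone on egocentric human video.
# Stage 2: post-train only the small action expert on scarce robot data via flow matching.
backbone, expert = VLMBackbone(), FlowActionExpert()
opt = torch.optim.AdamW(expert.parameters(), lr=1e-4)
for obs, action in [(torch.randn(8, 1024), torch.randn(8, 32))]:   # dummy robot batch
    with torch.no_grad():
        ctx = backbone(obs)                 # frozen features learned from human video
    t = torch.rand(8, 1)
    noise = torch.randn_like(action)
    x_t = (1 - t) * noise + t * action      # linear interpolation path
    loss = ((expert(ctx, x_t, t) - (action - noise)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The staging is visible in the loop: the backbone is reused frozen, so only the small action expert ever consumes the scarce robot data.
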
Locomotion · Feb 5, 2026

Scalable and General Whole-Body Control for Cross-Humanoid Locomotion (XHugWBC)

Yufei Xue, Yunfeng Lin, Wentao Dong et al. · Shanghai AI Lab / Shanghai Jiao Tong University

XHugWBC trains a single policy that generalizes whole-body locomotion and manipulation across diverse humanoid hardware — without robot-specific retraining. Key innovations include physics-consistent morphological randomization and semantically aligned observation/action spaces across architectures. Validated across 12 simulated and 7 real-world humanoid platforms.

Key Finding: 100% zero-shot success rate across 7 real humanoid platforms despite large hardware variation. Accepted to ICML.
Read paper on arXiv →
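
The two ingredients the XHugWBC summary highlights, morphological randomization and a shared observation layout, can be sketched in a few lines; the scaling rule, ranges, and field names below are assumptions, not the paper's implementation.

```python
# Hedged sketch of physics-consistent morphological randomization and a
# semantically aligned observation layout; values and names are illustrative.
import numpy as np

def randomize_morphology(base, rng):
    """Scale limb geometry and keep mass/inertia physically consistent with it."""
    scale = rng.uniform(0.8, 1.2)                    # overall size factor
    return {
        "leg_length": base["leg_length"] * scale,
        "mass":       base["mass"] * scale**3,       # mass grows with volume
        "inertia":    base["inertia"] * scale**5,    # inertia ~ mass * length^2
    }

def shared_observation(joint_pos, joint_vel, max_joints=32):
    """Map robots with different joint counts into one semantically aligned vector."""
    obs = np.zeros(2 * max_joints)
    n = len(joint_pos)
    obs[:n] = joint_pos                              # slot i always means "joint i position"
    obs[max_joints:max_joints + n] = joint_vel
    return obs

rng = np.random.default_rng(0)
nominal = {"leg_length": 0.42, "mass": 35.0, "inertia": 1.9}
print(randomize_morphology(nominal, rng))
print(shared_observation(np.zeros(23), np.zeros(23)).shape)   # e.g. a 23-DoF humanoid
```
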
VLA Models · Dec 11, 2025

WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control

Haoran Jiang, Jin Chen, Qingwen Bu et al. · Fudan University / OpenDriveLab & MMLab @ HKU / AgiBot

A unified latent VLA framework for simultaneous locomotion and manipulation. The model learns from large quantities of action-free egocentric video paired with a loco-manipulation RL policy — dramatically reducing the cost of training data collection. Validated on the AgiBot X2 humanoid.

Key Finding: Outperforms prior baselines by 21.3% with strong generalization across a broad range of loco-manipulation tasks.
Read paper on arXiv →
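
A rough sketch of the latent-action idea attributed to WholeBodyVLA, under heavy simplification: a latent is inferred from consecutive action-free video frames, and a small head trained on robot data maps latents to whole-body commands executed by the RL policy. Module names and sizes are assumptions.

```python
# Illustrative latent-action sketch: learn latents from action-free video,
# then decode them into commands with a small head; not the paper's code.
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    """Infers a compact latent 'action' explaining the change between two frames."""
    def __init__(self, feat=256, latent=16):
        super().__init__()
        self.infer = nn.Sequential(nn.Linear(2 * feat, 128), nn.GELU(), nn.Linear(128, latent))
        self.predict = nn.Sequential(nn.Linear(feat + latent, 128), nn.GELU(), nn.Linear(128, feat))
    def forward(self, frame_t, frame_t1):
        z = self.infer(torch.cat([frame_t, frame_t1], -1))
        recon = self.predict(torch.cat([frame_t, z], -1))      # predict next frame from latent
        return z, ((recon - frame_t1) ** 2).mean()

class LatentToCommand(nn.Module):
    """Small head, trained on robot data, mapping latents to loco-manipulation commands."""
    def __init__(self, latent=16, cmd=29):
        super().__init__()
        self.head = nn.Linear(latent, cmd)
    def forward(self, z):
        return self.head(z)

lam, head = LatentActionModel(), LatentToCommand()
z, video_loss = lam(torch.randn(4, 256), torch.randn(4, 256))  # action-free video pairs
command = head(z)                                              # executed by the RL WBC policy
print(video_loss.item(), command.shape)
```
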
VLA Models · Mar 18, 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Johan Bjorck, Linxi Fan, Yuke Zhu et al. · NVIDIA Research

GR00T N1 is a 2.2B-parameter open foundation model built on a dual-system architecture — an Eagle-2 VLM for environmental understanding and a diffusion transformer for real-time motor generation. Trained on a heterogeneous mix of real-robot trajectories, human videos, and synthetic data. Fully open-sourced on GitHub and HuggingFace.

Key Finding: Outperforms SoTA imitation learning baselines and transfers zero-shot to real Fourier GR-1 for language-conditioned bimanual manipulation.
Read paper on arXiv →
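
The dual-system split described for GR00T N1 (a slow VLM for scene understanding and a fast diffusion-style head for motor output) can be caricatured as below; shapes, step counts, and the refinement rule are assumptions, not NVIDIA's implementation.

```python
# Illustrative-only dual-system sketch; all dimensions and the denoising schedule
# are assumptions.
import torch
import torch.nn as nn

class SlowVLM(nn.Module):                      # System 2: vision-language understanding
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(768, dim)
    def forward(self, vision_language_tokens):
        return self.proj(vision_language_tokens).mean(dim=1)   # one context vector

class FastActionDenoiser(nn.Module):           # System 1: diffusion-style action head
    def __init__(self, dim=256, horizon=16, act=24):
        super().__init__()
        self.horizon, self.act = horizon, act
        self.net = nn.Sequential(nn.Linear(dim + horizon * act + 1, 512), nn.GELU(),
                                 nn.Linear(512, horizon * act))
    def forward(self, ctx, noisy_chunk, t):
        out = self.net(torch.cat([ctx, noisy_chunk.flatten(1), t], -1))
        return out.view(-1, self.horizon, self.act)

vlm, denoiser = SlowVLM(), FastActionDenoiser()
ctx = vlm(torch.randn(1, 48, 768))             # tokens from camera frames + instruction
chunk = torch.randn(1, 16, 24)                 # start from noise
for step in range(4):                          # a few refinement steps per control cycle
    t = torch.full((1, 1), 1.0 - step / 4)
    chunk = chunk - 0.25 * denoiser(ctx, chunk, t)   # crude iterative refinement
print(chunk.shape)                             # (1, horizon, action_dim) motor chunk
```

The split mirrors the description above: the expensive vision-language pass runs once per observation, while the lightweight action head can iterate at control rate.
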
Manipulation · May 5, 2025

TWIST: Teleoperated Whole-Body Imitation System

Yanjie Ze, Zixuan Chen, João Pedro Araújo et al. · Stanford University / Simon Fraser University

TWIST retargets human motion capture data to a humanoid robot to generate reference clips, then trains a single unified whole-body controller combining RL and behavior cloning. The controller handles whole-body manipulation, legged manipulation, locomotion, and expressive movement with one network. Fully open-sourced including datasets, training code, and checkpoints.

Key Finding: A single unified controller achieves unprecedented coordinated whole-body motor skills spanning both locomotion and manipulation without task-specific controllers.
Read paper on arXiv →
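
A compressed sketch of the TWIST pipeline as summarized above: retarget mocap onto the robot's joints, then train one controller whose loss mixes a policy-gradient term with behavior cloning toward a teacher. Joint counts, loss weights, and the advantage estimate are placeholders, not the released code.

```python
# Hedged sketch of mocap retargeting plus a single RL + BC training update.
import numpy as np
import torch
import torch.nn as nn

def retarget(human_joint_angles, joint_map, limits):
    """Map a human mocap frame onto the robot's joint ordering and limits."""
    robot = np.array([human_joint_angles[h] for h in joint_map])
    return np.clip(robot, limits[:, 0], limits[:, 1])

print(retarget({"hip": 0.3, "knee": -0.5}, ["hip", "knee"], np.array([[-1, 1], [-1, 1]])))

policy = nn.Sequential(nn.Linear(64, 256), nn.ELU(), nn.Linear(256, 29))   # one whole-body network
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(128, 64)                   # proprioception + reference-motion targets
teacher_action = torch.randn(128, 29)        # e.g. from a privileged teacher policy
advantage = torch.randn(128, 1)              # placeholder advantage estimate from rollouts

mean = policy(obs)
dist = torch.distributions.Normal(mean, 1.0)
rl_loss = -(advantage * dist.log_prob(dist.sample()).sum(-1, keepdim=True)).mean()
bc_loss = ((mean - teacher_action) ** 2).mean()
(rl_loss + bc_loss).backward()               # RL and behavior cloning in a single update
opt.step()
```
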
Sim-to-Real · Feb 27, 2025

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

Toru Lin, Kartik Sachdev, Linxi Fan et al. · UC Berkeley / NVIDIA / UT Austin

A practical sim-to-real RL recipe for training vision-based dexterous manipulation on humanoids with multi-fingered hands — without relying on human demonstrations. Components include automated real-to-sim tuning, contact-based reward formulation, divide-and-conquer policy distillation, and modality-specific augmentation to close the perceptual sim-to-real gap.

Key Finding: First successful sim-to-real RL transfer of vision-based dexterous manipulation to a humanoid with multi-fingered hands, achieving high success on unseen objects. Published at CoRL 2025.
Read paper on arXiv →
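
Two ingredients of the sim-to-real recipe above, sketched with made-up signatures: a contact-based reward that pays for fingertip-object contact ahead of sparse task success, and a modality-specific depth augmentation applied before the policy sees the image. Weights and noise levels are assumptions.

```python
# Hedged sketch of a contact-based reward and a modality-specific augmentation.
import numpy as np

def contact_reward(fingertip_contacts, object_lifted, w_contact=0.1, w_task=1.0):
    """Dense contact term guides exploration; sparse task term defines success."""
    return w_contact * float(np.sum(fingertip_contacts)) + w_task * float(object_lifted)

def augment_depth(depth, rng, noise_std=0.01, dropout_p=0.02):
    """Perceptual-gap augmentation: sensor noise plus randomly missing pixels."""
    noisy = depth + rng.normal(0.0, noise_std, size=depth.shape)
    mask = rng.random(depth.shape) > dropout_p
    return np.where(mask, noisy, 0.0)

rng = np.random.default_rng(0)
print(contact_reward(np.array([1, 1, 0, 0, 1]), object_lifted=False))
print(augment_depth(rng.random((4, 4)), rng).round(2))
```
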
Manipulation · Jun 20, 2024

HumanPlus: Humanoid Shadowing and Imitation from Humans

Zipeng Fu, Qingqing Zhao, Qi Wu et al. · Stanford University

A system in which a humanoid robot shadows human whole-body motions in real time using pose estimates of the human operator, then learns autonomous skills by imitation from the egocentric data it collects while shadowing.

Key Finding: Humanoids can learn complex manipulation and locomotion skills by shadowing humans in real time with <100ms latency.
Read paper on arXiv →
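
A minimal shadowing-loop sketch in the spirit of the HumanPlus summary; the pose estimator, retargeting function, camera, and controller here are hypothetical stand-ins, not the project's APIs.

```python
# Hypothetical real-time shadowing loop; every function and class is a placeholder.
import time
import numpy as np

def estimate_human_pose(frame):
    """Stand-in for a real-time human body/hand pose estimator."""
    return np.zeros(33)

def retarget_to_robot(human_pose):
    """Stand-in for mapping estimated human joints onto robot joint targets."""
    return human_pose[:19]                     # e.g. a 19-DoF humanoid

class DummyCamera:                             # placeholders so the sketch runs standalone
    def read(self):
        return np.zeros((480, 640, 3))

class DummyController:
    def track(self, target):
        pass                                   # a real controller would command joint positions

def shadow(camera, controller, hz=50, steps=100):
    period = 1.0 / hz
    for _ in range(steps):                     # a real system would loop until stopped
        t0 = time.time()
        target = retarget_to_robot(estimate_human_pose(camera.read()))
        controller.track(target)
        time.sleep(max(0.0, period - (time.time() - t0)))

shadow(DummyCamera(), DummyController(), steps=5)
```
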
VLA Models · Jun 13, 2024

OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti et al. · Stanford University

OpenVLA is a 7B-parameter open-source VLA model trained on 970k robot demonstrations from the Open X-Embodiment dataset, achieving state-of-the-art performance on manipulation benchmarks.

Key Finding: 7B VLA models generalize to novel objects and environments, with a 16.5% absolute improvement in task success rate over the prior SoTA.
Read paper on arXiv →
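
Because OpenVLA is released on HuggingFace, it can be loaded roughly as below, following the usage pattern published with the release; verify the model ID, prompt format, and predict_action arguments against the current model card, and note that the image path and instruction here are placeholders.

```python
# Usage sketch based on the published OpenVLA release; check identifiers against
# the current model card before relying on them.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("camera_frame.png")         # placeholder for a live camera observation
prompt = "In: What action should the robot take to pick up the red cup?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# Returns an end-effector action, de-normalized with the chosen dataset statistics.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
```

Here unnorm_key selects which training dataset's action statistics are used to de-normalize the predicted action, so it should match the deployment setup.
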
VLA Models · Oct 5, 2024

GR-2: Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

Wenjie Zhao, Yicheng Liu, Hao Liu et al. · ByteDance Research

GR-2 leverages internet-scale video pretraining to build a generalist manipulation policy that generalizes across robot morphologies and task types.

Key Finding: Web-scale video pretraining enables 3× improvement in zero-shot task generalization across robot morphologies.
Read paper on arXiv →
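
A speculative sketch of the pretrain-then-adapt pattern the GR-2 summary points at: a sequence model first learns to predict future video tokens from web video, then an action head is attached for robot data. The architecture, token vocabulary, and dimensions below are assumptions, not ByteDance's design.

```python
# Speculative sketch: next-video-token pretraining followed by an action head.
import torch
import torch.nn as nn

class VideoBackbone(nn.Module):
    def __init__(self, vocab=8192, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.TransformerEncoder(nn.TransformerEncoderLayer(dim, 4, batch_first=True), 2)
        self.next_token = nn.Linear(dim, vocab)        # used during web-video pretraining
    def forward(self, video_tokens):
        h = self.blocks(self.embed(video_tokens))
        return h, self.next_token(h)

backbone = VideoBackbone()
action_head = nn.Linear(256, 14)                       # added for robot finetuning (dimension assumed)

tokens = torch.randint(0, 8192, (2, 32))               # discretized video frames
h, video_logits = backbone(tokens)
video_loss = nn.functional.cross_entropy(video_logits[:, :-1].reshape(-1, 8192),
                                          tokens[:, 1:].reshape(-1))      # next-token pretraining
actions = action_head(h[:, -1])                        # finetuning: predict actions from the same features
print(video_loss.item(), actions.shape)
```
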
Locomotion · Jan 22, 2024

Extreme Parkour with Legged Robots

Ziwen Zhuang, Zipeng Fu, Jianren Wang et al. · Carnegie Mellon University

Training legged robots to perform parkour maneuvers, including wall-running, gap jumping, and flipping, using a hierarchical RL framework in Isaac Gym.

Key Finding: Hierarchical RL enables bipeds to learn parkour behaviors 40× faster than flat RL baselines.
Read paper on arXiv →
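
A generic hierarchical-control sketch matching the framework the parkour summary describes (a high-level policy selects a skill, a low-level policy produces joint targets); it is not the paper's implementation, and the skill names and dimensions are assumptions.

```python
# Generic two-level skill-selection sketch; skills, sizes, and wiring are assumed.
import torch
import torch.nn as nn

SKILLS = ["run", "jump_gap", "wall_run", "flip"]

high_level = nn.Sequential(nn.Linear(48, 128), nn.ELU(), nn.Linear(128, len(SKILLS)))
low_level = nn.Sequential(nn.Linear(48 + len(SKILLS), 256), nn.ELU(), nn.Linear(256, 12))

obs = torch.randn(1, 48)                       # proprioception + terrain scan features
skill_logits = high_level(obs)
skill = torch.zeros(1, len(SKILLS))
skill[0, skill_logits.argmax(dim=-1)] = 1.0    # one-hot skill command (sampled during RL training)
joint_targets = low_level(torch.cat([obs, skill], dim=-1))   # 12 joint targets (dimension assumed)
print(SKILLS[int(skill.argmax())], joint_targets.shape)
```
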