Biography

Benjin Zhu is a Research Scientist at Li Auto and a postdoctoral researcher at Tsinghua University with Prof. Jifeng DAI. He received his Ph.D. from the Department of Electronic Engineering, CUHK in 2025, advised by Prof. Hongsheng LI and Prof. Xiaogang WANG in the MMLab, and his B.Eng. from SCUT in 2018. Before his Ph.D., he was at MEGVII Research with Dr. Gang Yu, Dr. Xiangyu Zhang, and Dr. Jian Sun.

His current research focuses on cross-embodiment VLA and World Models with RL. He led the team that won the inaugural nuScenes 3D Object Detection Challenge at WAD, CVPR 2019, and authored the widely used open-source frameworks Det3D, CVPods, and EFG.

Interests
  • Embodied AI
  • Physical Intelligence
Education
  • Ph.D. in Electronic Engineering, 2021 ~ 2025

    The Chinese University of Hong Kong (CUHK)

  • B.Eng. in Software Engineering, 2014 ~ 2018

    South China University of Technology (SCUT)

News

  • 2026-05 Released the Mind-Omni series — our L4 autonomous driving stack covering VLA, World Models, and RL. ✨
  • 2025-06 Graduated from CUHK MMLab and joined Li Auto as a Research Scientist on L4 autonomous driving.

Work Experience

Li Auto
Senior Research Engineer
May 2025 – Present Beijing, China
World Models, Vision-Language-Action Models, Reinforcement Learning.
MEGVII Research
Researcher
January 2019 – May 2021 Beijing, China
End-to-end Object Detection, Unsupervised/Self-supervised Learning, Research Infrastructure.

More Publications

(2026). Action Emergence from Streaming Intent. arXiv.


(2026). Driving Intents Amplify Planning-Oriented Reinforcement Learning. arXiv.


(2020). EqCo: Equivalent Rules for Self-supervised Contrastive Learning. arXiv.


(2019). Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection. arXiv.


Projects

For all projects, see here.
EFG: An Efficient, Flexible, and General deep learning framework that retains minimal.
An easy-to-use research codebase. Users can explore any research topic with EFG by following its project templates.
CVPods: All-in-one Toolbox for Computer Vision Research.
Welcome to cvpods, a versatile and efficient codebase for many computer vision tasks. The aim of cvpods is to enable efficient experiment management and smooth task switching.
Det3D: World’s First General Purpose 3D Object Detection Codebase.
Winning solution of the nuScenes 3D Detection Challenge at WAD, CVPR 2019, and more.