Biography

📢 Hiring highly self-motivated full-time candidates and interns interested in VLA, World Models, and RL (all year round).

Benjin ZHU is a research scientist at Li Auto. He received his Ph.D. from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2025, where he was affiliated with the MultiMedia Lab and supervised by Prof. Hongsheng LI and Prof. Xiaogang WANG. He earned his Bachelor’s degree in Software Engineering from South China University of Technology in 2018.

Benjin’s current research interests include cross-embodiment VLA and World Models with RL. He has won multiple championships in top international competitions, such as the first nuScenes 3D Object Detection Challenge at WAD, CVPR 2019. Benjin has also made significant contributions to popular open-source computer vision frameworks, including Det3D, CVPods, and EFG. Prior to his doctoral studies, Benjin worked at MEGVII Research, a world-leading AI company, where he was fortunate to collaborate with Dr. Gang Yu, Dr. Xiangyu Zhang, and Dr. Jian Sun on topics such as object detection and representation learning.

Interests
  • Vision-Language-Action Models
  • Diffusion Models
  • World Models
  • AI Infrastructure
Education
  • Ph.D in Electronic Engineering, 2021 ~ 2025

    The Chinese University of Hong Kong (CUHK)

  • B.Eng in Software Engineering, 2014 ~ 2018

    South China University of Technology (SCUT)

News

  • 2025-06 ConsistentCity for temporally consistent 3D scene synthesis is accepted by ICCV 2025. ✨
  • 2024-07 The high-res nuCraft 3D Occupancy Dataset is accepted by ECCV 2024. ✨
  • 2023-03 EFG, an Efficient, Flexible, and General deep learning framework, is publicly available!
  • 2022-12 ConQueR is accepted by CVPR 2023, and selected as a Highlight (Top 2.5%). ✨

Work Experience

Li Auto
Senior Research Engineer
May 2025 – Present Beijing, China
World Models, Vision-Language-Action Models, Reinforcement Learning.

MEGVII Research
Researcher
January 2019 – May 2021 Beijing, China
End-to-end Object Detection, Unsupervised/Self-supervised Learning, Research Infrastructure.

More Publications

(2020). EqCo: Equivalent Rules for Self-supervised Contrastive Learning. arXiv.

(2020). AutoAssign: Differentiable Label Assignment for Dense Object Detection. arXiv.

(2019). Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection. arXiv.

Projects

For all projects, see here.
EFG: An Efficient, Flexible, and General deep learning framework that retains minimal.
Easy-to-use research codebase. Users can use EFG to explore any research topics following project templates.
CVPods: All-in-one Toolbox for Computer Vision Research.
Welcome to cvpods, a versatile and efficient codebase for many computer vision tasks. The aim of cvpods is to achieve efficient experiment management and smooth task switching.
Det3D: World’s First General Purpose 3D Object Detection Codebase.
Winning solution of the nuScenes 3D Detection Challenge at WAD, CVPR 2019, and more.