Benjin ZHU

Benjin ZHU 本金 朱

Ph.D. Candidate

Chinese University of Hong Kong

Biography

Benjin ZHU is a final-year Ph.D. candidate in the Department of Electronic Engineering at The Chinese University of Hong Kong, where he has studied since 2021. He is affiliated with the MultiMedia Lab and supervised by Prof. Hongsheng LI and Prof. Xiaogang WANG. He earned his Bachelor’s degree in Software Engineering from South China University of Technology in 2018.

Benjin’s current research interests include Vision-Language-Action (VLA) models and World Models. His recent work covers 3D driving scene understanding, reconstruction, and generation, and has been published at top conferences such as CVPR, ICCV, and ECCV. He has also published influential work on Object Detection and Self-Supervised Pretraining. He has won multiple top international competitions, including the first nuScenes 3D Object Detection Challenge at WAD, CVPR 2019, where he proposed CBGS, a method widely adopted by both academia and industry. Benjin has also made significant contributions to popular open-source computer vision frameworks, including Det3D, CVPods, and EFG.

Interests
  • Vision-Language-Action (VLA) Models
  • Diffusion Models
  • World Models / Data-driven Driving Simulators
  • AI Infrastructure
Education
  • Ph.D. in Electronic Engineering, 2021 ~ 2025

    The Chinese University of Hong Kong (CUHK)

  • B.Eng. in Software Engineering, 2014 ~ 2018

    South China University of Technology (SCUT)

News

  • 2025-06 ConsistentCity for temporally consistent 3D scene synthesis is accepted by ICCV 2025. ✨
  • 2025-05 MoviiGen-1.1, a WAN2.1-based T2V model with high cinematic aesthetics, is made public.
  • 2024-07 The high-res nuCraft 3D Occupancy Dataset is accepted by ECCV 2024. ✨
  • 2023-03 EFG, an Efficient, Flexible, and General deep learning framework, is publicly available!
  • 2022-12 ConQueR is accepted by CVPR 2023, and selected as a Highlight (Top 2.5%). ✨

Work Experience

MEGVII Research
Researcher
January 2019 – May 2021 Beijing, China
End-to-end Object Detection, Unsupervised/Self-supervised Learning, Research Infrastructure.

Horizon Robotics
Perception Algorithm Engineer
April 2018 – January 2019 Beijing, China
Full-stack Point Cloud 3D Object Detection Research, Development, and Deployment.

More Publications

(2020). EqCo: Equivalent Rules for Self-supervised Contrastive Learning. arXiv.

(2020). AutoAssign: Differentiable Label Assignment for Dense Object Detection. arXiv.

(2019). Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection. arXiv.

Projects

For all projects, see here.
EFG: An Efficient, Flexible, and General deep learning framework that retains minimal.
Easy-to-use research codebase. Users can use EFG to explore any research topics following project templates.
CVPods: All-in-one Toolbox for Computer Vision Research.
Welcome to cvpods, a versatile and efficient codebase for many computer vision tasks. The aim of cvpods is to enable efficient experiment management and smooth task switching.
Det3D: World’s First General Purpose 3D Object Detection Codebase.
Winning solution of the nuScenes 3D Detection Challenge at WAD, CVPR 2019, and more.