Benjin ZHU 本金朱

Ph.D Candidate

Chinese University of Hong Kong

Biography

Benjin ZHU is a final-year Ph.D candidate in the Department of Electronic Engineering at The Chinese University of Hong Kong, which he joined in 2021. He is affiliated with the MultiMedia Lab and supervised by Prof. Hongsheng LI and Prof. Xiaogang WANG. He earned his Bachelor’s degree in Software Engineering from South China University of Technology in 2018.

Benjin’s current research interests include Vision-Language-Action (VLA) models and World Models. His recent work covers 3D driving scene understanding, reconstruction, and generation, and has been recognized at top conferences such as CVPR, ICCV, and ECCV. He has also published influential work on Object Detection and Self-Supervised Pretraining. His achievements include winning multiple top international competitions, such as the first nuScenes 3D Object Detection Challenge at WAD, CVPR 2019, where he proposed CBGS, which has been widely adopted by both academia and industry. Benjin has also made significant contributions to open-source computer vision frameworks, including Det3D, CVPods, and EFG, which have gained substantial popularity.

Interests

Vision-Language-Action Models
Diffusion Models
World Models
AI Infrastructure

Education

Ph.D in Electronic Engineering, 2021 ~ 2025
The Chinese University of Hong Kong (CUHK)
B.Eng in Software Engineering, 2014 ~ 2018
South China University of Technology (SCUT)

News

2025-06 ConsistentCity for temporally consistent 3D scene synthesis is accepted by ICCV 2025. ✨
2025-05 MoviiGen-1.1, a WAN2.1-based T2V model with high cinematic aesthetics, is made public.
2024-07 The high-res nuCraft 3D Occupancy Dataset is accepted by ECCV 2024. ✨
2023-03 EFG, an Efficient, Flexible, and General deep learning framework, is publicly available!
2022-12 ConQueR is accepted by CVPR 2023, and selected as a Highlight (Top 2.5%). ✨

Work Experience

Senior Research Engineer

Li Auto

May 2025 – Present Beijing, China

World Models, Vision-Language-Action Models, Reinforcement Learning.

Researcher

MEGVII Research

January 2019 – May 2021 Beijing, China

End-to-end Object Detection, Unsupervised/Self-supervised Learning, Research Infrastructure.

Featured Publications

The full publication list can be found HERE, or visit my Google Scholar.

Benjin ZHU, Zhe Wang, Hongsheng Li

July, 2024 In ECCV

nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding

ECCV 2024, Dataset

Benjin ZHU, Zhe Wang, Shaoshuai Shi, Hang Xu, Lanqing Hong, Hongsheng Li

December, 2022 In CVPR

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

CVPR 2023 Highlight Presentation (top 2.5%)

More Publications

Benjin ZHU, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun (2020). EqCo: Equivalent Rules for Self-supervised Contrastive Learning. arXiv.

Benjin ZHU, Jianfeng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, Jian Sun (2020). AutoAssign: Differentiable Label Assignment for Dense Object Detection. arXiv.

Benjin ZHU, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu (2019). Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection. arXiv.

Projects

For all projects, see here.

EFG: An Efficient, Flexible, and General deep learning framework that retains minimal.

Easy-to-use research codebase. Users can use EFG to explore any research topic by following the project templates.

CVPods: All-in-one Toolbox for Computer Vision Research.

Welcome to cvpods, a versatile and efficient codebase for many computer vision tasks. The aim of cvpods is to achieve efficient experiment management and smooth task switching.