Zhijian Huang (黄志坚)

I am currently a researcher at Xiaomi. I received my M.Sc. from the HCP-Lab at Sun Yat-sen University, advised by Prof. Xiaodan Liang, and obtained my B.Eng. degree from Sun Yat-sen University as well.

My research interests center on Large Multimodal Models and their downstream applications, including VLA and World Models, with a focus on Autonomous Driving and Embodied Intelligence. I am always open to discussions and collaborations; feel free to reach out.


Publications

OneVL
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, et al.

We present OneVL (One-step latent reasoning and planning with Vision-Language explanations), a unified VLA and World Model framework that routes reasoning through compact latent tokens supervised by dual auxiliary decoders.

MiMo-Embodied
MiMo-Embodied: X-Embodied Foundation Model Technical Report

Xiaoshuai Hao, Lei Zhou, Zhijian Huang, Zhiwen Hou, Yingbo Tang, Lingfeng Zhang, Guang Li, Zheng Lu, Shuhuai Ren, et al.

We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI.

VGGDrive
VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving

Jie Wang, Guang Li, Zhijian Huang, Chenxu Dang, Hangjun Ye, Yahong Han, Long Chen

We propose a novel architecture, VGGDrive, which empowers Vision-language models with cross-view Geometric Grounding for autonomous Driving.

X-SAM
X-SAM: From Segment Anything to Any Segmentation

Hao Wang, Limeng Qiao, Zequn Jie, Zhijian Huang, Chengjian Feng, Qingfang Zheng, Lin Ma, Xiangyuan Lan, Xiaodan Liang

We present X-SAM, a streamlined Multimodal Large Language Model framework that extends the segmentation paradigm from Segment Anything to Any Segmentation.

RoboTron-Drive
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang, Lin Ma

We present RoboTron-Drive, an all-in-one large multimodal model with general capabilities and strong generalization across autonomous driving tasks.

RoboTron-Sim
RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Baihui Xiao, Chengjian Feng, Zhijian Huang, Feng Yan, Yujie Zhong, Lin Ma

We propose RoboTron-Sim, which improves real-world driving in critical situations by using simulated hard cases.

RDA-Driver
Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang

We introduce RDA-Driver, a multimodal LLM decision-making model with reasoning-decision alignment for stronger autonomous driving planning.

FULLER
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang

We introduce FULLER, a novel yet simple multi-level gradient calibration learning framework across tasks and modalities during optimization.

Arch-Graph
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang

We introduce Arch-Graph, a transferable NAS method that predicts task-specific optimal architectures with respect to given task embeddings.

DSBench
Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks

Xianhui Meng, Yuchen Zhang, Zhijian Huang, Zheng Lu, Ziling Ji, Yaoyao Yin, Hongyuan Zhang, Guangfeng Jiang, Yandan Lin, Long Chen, Hangjun Ye, Li Zhang, Jun Liu, Xiaoshuai Hao, Xiaodan Liang

We introduce DSBench, the first comprehensive Driving Safety Benchmark designed to assess a VLM's awareness of various safety risks in a unified manner.

Experience

Researcher
Xiaomi
Mentor: Long Chen
2025.07 – Present

Research Intern
Meituan
2023.04 – 2025.07

Research Intern
Huawei Noah's Ark Lab
Mentor: Hang Xu
2022.03 – 2023.01

Academic Services

Conference Reviewer · CVPR, ECCV, ICCV