Wenhao Wu (吴文灏)

Short Biography

Wenhao Wu is an Applied Scientist at Amazon AGI, working on Amazon's foundation model, Nova. Before joining Amazon, he spent nearly seven years (2018-2025) at Baidu VIS working with Chief Scientist Dr. Jingdong Wang (IEEE Fellow). There, he progressed from research intern to Senior/Staff Researcher and contributed to multiple large-scale computer vision and multimodal projects. He earned his Ph.D. from MMLab@USYD, The University of Sydney, supervised by Prof. Wanli Ouyang. Previously, he obtained his M.S.E. degree from the University of Chinese Academy of Sciences (UCAS), under the supervision of Prof. Shifeng Chen and Prof. Yu Qiao.

Since 2016, he has been engaged in AI research and development across both academia and industry, gaining extensive industry experience at Amazon AGI, Baidu AIG, Snap Research, SenseTime Research, Samsung Research, and iQIYI AI. He has also been affiliated with academic institutions including The University of Sydney (MMLab@USYD), The Chinese University of Hong Kong (MMLab@CUHK), and the Chinese Academy of Sciences (MMLab@SIAT-CAS).

He is honored to have received the Baidu PhD Fellowship (2023) and the DAAD AInet Fellowship (2025).

If you are interested in collaboration or discussion, please feel free to email me.

Research Interest

My research interests lie broadly in Computer Vision and Deep Learning. In recent years, my work has been driven by the paradigm shift toward Multi-modal LLMs, with a dedicated focus on Video Understanding. My current and prior research themes include:

Large Multi-modal/Cross-modal Models (2023-Present)
- Multi-modal LLMs / Reasoning / Omni (2023-Present): Nar-KFC (ICLR'26), Mulberry (NeurIPS'25 Spotlight), R1-ShareVL (NeurIPS'25), MMReason (ICCV'25), FreeVA, Dense Connector (NeurIPS'24), AMP (NeurIPS'24), DistinctAD (CVPR'25 Highlight)
- Driving LLM/MLLM for Zero-shot Vision Tasks (2023-2024): GPT4Vis, GPT4Ego (TMM'24), GPT4Det (ECCV'24)
- Video-Language Learning (2022-2024): Text4Vis (AAAI'23 & IJCV), BIKE (CVPR'23), Cap4Video (CVPR'23 Highlight & TPAMI), UATVR (ICCV'23)
Video Foundation Models (2018-2023)
- Video Backbone (2018-2023): Side4Video, ATM (ICCV'23), MVFNet (AAAI'21)
- Video Sampler (2018-2022): MARL (ICCV'19, Oral), NSNet (ECCV'22), TSQNet (ECCV'22)
- Video Self-Supervised Learning (2020-2022): ASCNet (ICCV'21), MaMiCo (ACMMM'22, Oral)
- Video Event Localization (2021-2022): BCNet (AAAI'22), WSSTAD (IJCAI'21)
Image Processing/Editing (2021-2022)
- Arbitrary Image Rescaling: BAIRNet (CVPR'22)
- Style Transfer: MSPC (CVPR'22), AdaCM (AAAI'23)

Updates

New 01/2026: Our paper Nar-KFC has been accepted by ICLR 2026. Congratulations to Bo.
New 12/2025: Excited to share that our Nova 2.0 was launched at AWS re:Invent 2025! Proud to serve as a core contributor, driving key work across video SFT/CoT/RL to power the system’s video understanding and reasoning capabilities. Looking forward to what comes next! See the Technical Report.
New 09/2025: Mulberry (Spotlight) and R1-ShareVL are accepted by NeurIPS 2025!
New 06/2025: Our paper MMReason has been accepted by ICCV 2025. Congratulations to Huanjin.
04/2025: I received the DAAD AInet Fellowship.
02/2025: Our paper DistinctAD has been accepted by CVPR 2025 (Highlight). Congratulations to Bo.
09/2024: [2/2] Dense Connector and AMP are accepted by NeurIPS 2024! Dense Connector was cited by Apple MM1.5. Congratulations to Huanjin and Mengxi!
07/2024: Received the Chinese Government Award for Outstanding Self-financed Students Abroad (CGAOSSA), the highest award granted by the Chinese government to Chinese students overseas (650 recipients per year).
05/2024: We explore visual signals, RLHF, and zero-shot image-to-video extension for MLLMs: (1) We introduce the Dense Connector, a simple, effective, plug-and-play vision-language connector that enhances existing MLLMs by leveraging multi-layer visual features with minimal computational overhead. (2) We present an Automated Multi-level Preference (AMP) framework for RLHF that replaces binary preference learning. It generates high-quality multi-level preference datasets without human/AI annotators and employs the multi-level DPO (MDPO) algorithm. (3) I release FreeVA, a plug-and-play, simple yet effective study exploring the use of existing image MLLMs as video conversational models in a training-free manner. ⚡The core code can be just one line!
05/2024: The extension of Cap4Video has been accepted by TPAMI.
01/2024: 🎖 I'm honored to be one of the 10 PhD students worldwide awarded the 11th Baidu Scholarship, a prestigious fellowship in Artificial Intelligence that provides 200,000 RMB (about $30,000) to awardees selected from thousands of applicants. [11th Baidu Scholarship Announced: All 10 Awardees Worldwide Born After 1995]
11/2023: We release GPT4Vis, which provides a quantitative evaluation of GPT-4 (😭Running once roughly costs 💰$4000💰) for visual understanding across images, videos and point clouds, spanning 16 datasets.
11/2023: We release Side4Video, a Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning, which significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
08/2023: The extension of Text4Vis has been accepted by IJCV.
07/2023: Two first-author papers (Temporal Modeling: ATM, Cross-Modal Retrieval: UATVR) are accepted by ICCV 2023.
02/2023: Two first-author papers on video understanding (BIKE, Cap4Video) are accepted by CVPR 2023. Cap4Video, which uses GPT to enhance text-video learning, is selected as a Highlight paper (Top 2.5%).

11/2022: Two papers (Video Recognition: Text4Vis, Style Transfer: AdaCM) are accepted by AAAI 2023.

07/2022: Three papers (Video Sampling: NSNet, TSQNet, Cross-Modal Learning: CODER) are accepted by ECCV 2022.

06/2022: Our MaMiCo, a new video self-supervised learning work, is accepted by ACMMM 2022 (Oral Presentation).

03/2022: Two low-level vision papers (MSPC, BAIRNet) are accepted by CVPR 2022.

12/2021: Our BCNet, a general temporal localization framework, is accepted by AAAI 2022.

07/2021: Our ASCNet, a self-supervised video representation learning framework, is accepted by ICCV 2021.

07/2021: Two papers (Video Recognition, Crowd Counting) are accepted by ACMMM 2021.

04/2021: Our paper introducing a novel task, Weakly-Supervised Spatio-Temporal Anomaly Detection, is accepted by IJCAI 2021.

04/2021: Winner in the Traffic Anomaly Detection Track of the CVPR 2021 AI CITY CHALLENGE.

12/2020: Our MVFNet, an efficient temporal module, is accepted by AAAI 2021.

07/2020: Our ADD-GCN for multi-label image recognition is accepted by ECCV 2020.

05/2020: One dynamic video inference paper is accepted for Oral Presentation at the CVPR 2020 EDLCV workshop.

07/2019: My first paper, MARL, a novel video sampler, is accepted as an Oral Presentation (Top 4%) at ICCV 2019.

09/2017: Recommended for admission to the master's program of the University of Chinese Academy of Sciences with entrance-exam exemption.

06/2017: Graduated from Central South University with the Outstanding Graduate honor.

10/2016: Joined MMLab@SIAT as a research intern and started working on Computer Vision.

Working Experience

Amazon AGI

Applied Scientist on Amazon Foundation Model (Nova)
working with Dr. Xinyu Li and Dr. Davide Modolo
June 2025 - Present. Bellevue, WA, USA

Baidu VIS

Intern/Senior/Staff Researcher on Video Understanding
worked with Dr. Jingdong Wang (IEEE Fellow), Dr. Dongliang He and Dr. Errui Ding
Sep 2018 - June 2025. Shenzhen / Beijing / Remote, China

Product / Model Release

Nova 2 Family: Multimodal reasoning and generation models
Core Contributor
[Technical Report] [Website] [Release News]

Selected Publications [ Full List ]

(*Co-first Author, ^*Corresponding Author)

Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
Bo Fang, Wenhao Wu, Qiangqiang Wu, Yuxin Song, Antoni B. Chan
International Conference on Learning Representations (ICLR), 2026
[ PDF ]

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao
Conference on Neural Information Processing Systems (NeurIPS), 2025
[Spotlight, Top 3.1% of all submissions] [ PDF ] [ Code ]
We introduce CoMCTS, a method leveraging collective knowledge for efficient, step-by-step reasoning, and release Mulberry-260k, a rich multimodal reasoning dataset.

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang
Conference on Neural Information Processing Systems (NeurIPS), 2025
[ PDF ] [ Code ]

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao
IEEE International Conference on Computer Vision (ICCV), 2025
[ PDF ] [ Code ]

DistinctAD: Distinctive Audio Description Generation in Contexts
Bo Fang, Wenhao Wu, Qiangqiang Wu, Yuxin Song, Antoni B. Chan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Highlight, Top 2.9% of all submissions] [ PDF ]
We present DistinctAD, a novel method for generating textual narrations of movies.

Dense Connector for MLLMs
Huanjin Yao, Wenhao Wu*^*, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang
Conference on Neural Information Processing Systems (NeurIPS), 2024
[ PDF ] [ Code ] [ My Blog (Chinese) ]
A universal plug-and-play module to enhance Multimodal LLMs.

Automated Multi-level Preference for MLLMs
Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhang, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang
Conference on Neural Information Processing Systems (NeurIPS), 2024
[ PDF ] [ Code ] [ My Blog (Chinese) ]
We present AMP, an Automated Multi-level Preference framework for RLHF that replaces binary preference learning. It generates high-quality multi-level preference datasets without human/AI annotators.

FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
Technical Report, 2024
[ PDF ] [ Code ]
FreeVA is a plug-and-play, simple yet effective study exploring the use of existing image MLLMs as video conversational models in a training-free manner.

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
Technical Report, 2023
[ PDF ] [ Code ] [ My Blog (Chinese) ]
We provide a quantitative evaluation of GPT-4 for visual understanding across images, videos and point clouds, spanning 16 popular datasets.
(😭Running all tests once roughly costs 💰$4000+💰)

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao, Wenhao Wu*^*, Zhiheng Li
Technical Report, 2023
[ PDF ] [ Code ]
Side4Video significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).

What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Highlight, Top 2.5% of 9155 submissions] [ PDF ] [ Code ]

Cap4Video++: Enhancing Video Understanding with Auxiliary Captions
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Impact factor: 23.6
[ PDF ]
Cap4Video leverages auxiliary captions generated by GPT to enhance cross-modal learning.

UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang*, Wenhao Wu*, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[ PDF ] [ Code ]

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu, Zhun Sun, Wanli Ouyang
The AAAI Conference on Artificial Intelligence (AAAI), 2023
[ PDF ] [ Code ] [ Poster ] [ Slides ] [ Video ]

Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang
International Journal of Computer Vision (IJCV), 2023
Impact factor: 19.5
[ PDF ]
We revisit the classifier using textual embeddings and achieve SOTA performance on fully supervised, few-shot, and zero-shot recognition.

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia*, Wenhao Wu*^*, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
A sampler that runs 4x faster in practice than SOTA methods.

Temporal Saliency Query Network for Efficient Video Recognition
Boyang Xia*, Zhihao Wang*, Wenhao Wu^*, Haoran Wang, Jungong Han
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
TSQNet, the first work to model temporal sampling as a query-response task.

MaMiCo: Macro-to-Micro Semantic Correspondence for Self-supervised Video Representation Learning
Bo Fang*, Wenhao Wu*, Chang Liu*, Yu Zhou, Dongliang He, Weiping Wang
ACM International Conference on Multimedia (ACMMM), 2022
[Oral, 5.0% acceptance rate] [ PDF ] [ Project ]
MaMiCo, a self-supervised Macro-to-Micro Semantic Correspondence learning framework for video representation learning.

Temporal Action Proposal Generation with Background Constraint
Haosen Yang*, Wenhao Wu*, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang
The AAAI Conference on Artificial Intelligence (AAAI), 2022
[15% acceptance rate] [ PDF ] [ Code ]
BCNet, a general framework for effective Temporal Action Proposal Generation.

ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang*, Wenhao Wu*, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
IEEE International Conference on Computer Vision (ICCV), 2021
[ PDF ] [ Poster ] [ Slides ] [ Video ] [ Code ]
An effective self-supervised video representation learning framework.

DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu*, Yuxiang Zhao*, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi
ACM International Conference on Multimedia (ACMMM), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ]
An efficient plug-and-play module for effective video-level representation learning.

MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding
The AAAI Conference on Artificial Intelligence (AAAI), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ] [ Bibtex ]

@inproceedings{wu2021mvfnet,
    title={{MVFNet}: Multi-View Fusion Network for Efficient Video Recognition},
    author={Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={35},
    number={4},
    pages={2943--2951},
    year={2021}
}

An efficient architecture for video recognition based on 2D CNNs.

Good Practices and A Strong Baseline for Traffic Anomaly Detection
Yuxiang Zhao*, Wenhao Wu*, Yue He, Yingying Li, Xiao Tan, Shifeng Chen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - 5th AI City Challenge (AICity), 2021
[ PDF ] [ Code ] [ Bibtex ]

@inproceedings{zhao2021good,
    title={Good Practices and A Strong Baseline for Traffic Anomaly Detection},
    author={Zhao, Yuxiang and Wu, Wenhao and He, Yue and Li, Yingying and Tan, Xiao and Chen, Shifeng},
    booktitle={Proceedings of CVPR Workshops},
    year={2021}
}

Winner of the CVPR 2021 AI City Challenge (Traffic Anomaly Detection track).

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - Joint Workshop on Efficient Deep Learning in Computer Vision (EDLCV), 2020
[Oral] [ PDF ] [ Slides ] [ Bibtex ]

@inproceedings{wu2020dynamic,
    title={Dynamic Inference: A New Approach Toward Efficient Video Action Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Yang, Yi and Wen, Shilei},
    booktitle={Proceedings of CVPR Workshops},
    pages={676--677},
    year={2020}
}

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen
IEEE International Conference on Computer Vision (ICCV), 2019
[Oral, 4.3% acceptance rate] [ PDF ] [ Poster ] [ Slides ] [ Bibtex ]

@inproceedings{wu2019multi,
    title={Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Wen, Shilei},
    booktitle={Proceedings of the IEEE International Conference on Computer Vision},
    pages={6222--6231},
    year={2019}
}

Education & Visiting

The University of Sydney, Australia

Doctor of Philosophy in Computer Science
Team: Multimedia Laboratory (MMLab@USYD)
Advisor: Prof. Wanli Ouyang
2022 - 2025

The Chinese University of Hong Kong, Hong Kong

Honorary Research Assistant in Multimedia Laboratory (MMLab@CUHK)
Advisor: Prof. Wanli Ouyang
2023 - 2024

University of Chinese Academy of Sciences, China

Master of Science in Engineering in Pattern Recognition & Intelligent Systems
Team: Multimedia Laboratory, SIAT, CAS (MMLab@SIAT-CAS)
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao
2017 - 2020, admitted with entrance-exam exemption

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China

Research Assistant in Multimedia Laboratory (MMLab@SIAT-CAS)
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao
Oct. 2016 - Aug. 2017

Central South University, China

Bachelor of Engineering in Automation
University Clubs: Electronic Design Association, Smart Vehicle Association, UAV Association
2013 - 2017

Short-term Internships

Snap Research

Research Intern in Creative Vision Group
worked with Dr. Yanyu Li
Mar. 2025 - Apr. 2025. Los Angeles, USA

Amazon AWS AI

Applied Scientist Intern
worked with Dr. Shuai Zhang, Dr. Taojiannan Yang, Dr. Bernie Wang
Jun. 2024 - Sep. 2024. Santa Clara, USA

SenseTime Research

Research Intern in OpenMMLab Team
worked with Dr. Kai Chen
Jan. 2020 - Feb. 2020. Shenzhen, China

iQIYI

R&D Intern in Video Analysis Group
hosted by Qiyue Liu
Jun. 2018 - Oct. 2018. Beijing, China

Samsung Research China

Research Intern in Samsung Advanced Institute of Technology (SAIT)
hosted by Zhenbo Luo
Mar. 2018 - Jun. 2018. Beijing, China

Contests

CVPR 2021 AI City Challenge: Traffic Anomaly Detection, Winner Award, 2021
CVPR 2021 NTIRE Challenge on Image Deblurring: Track 2 JPEG Artifacts, Runner-Up Award, 2021
Meritorious Winner (First Prize), American Mathematical Contest in Modeling (MCM), 2016
First-class Prize, National Undergraduate Mechanical Innovation Design Competition, 2016
Second-class Prize, China Freescale Cup Intelligent Car Competition (South China Region), 2015
Second-class Prize, Smart Car Racing Competition of Hunan Province, 2015

Honors and Awards

Academic (Undergraduate/Master’s/PhD) Honors

DAAD AInet Fellowship, 2025

CVPR Doctoral Consortium Award, 2025

OpenAI Researcher Access Program ($5000), 2024

Chinese Government Award for Outstanding Self-financed Students Abroad (The highest award granted by the Chinese government to Chinese students overseas, 650 recipients worldwide), 2024

Baidu PhD Fellowship (10 awardees worldwide, 200,000 RMB), 2023

Faculty of Engineering Research Scholarship (Full PhD Scholarship), 2022-2025

Outstanding Student of University of Chinese Academy of Sciences, 2020

Scholarship for Academic Excellence of SIAT, CAS, 2018

Postgraduate Scholarship of UCAS (Full Master Scholarship), 2017-2020

Outstanding Graduate of Central South University (Top 5%), 2017

National Inspirational Scholarship (Top 5%) awarded by the Ministry of Education, 2016

Excellent National College Students Innovation and Entrepreneurship Project (20,000 RMB), 2016

Outstanding Student Leader, Outstanding Student of Central South University, 2013-2017

Scholarship for Academic Excellence of Central South University, 2013-2017

Professional Honors

Australian Good Design Award, 2022

Baidu AIG-TPG Technical Innovation Team Award, 2021 H1

Baidu TPG-VIS Spotlight Project Award, 2021 Q1

Baidu AIG-TPG Best Innovation Team Award, 2020

Baidu Best Newcomer Award, 2020

Baidu Best Intern Award, 2019

Academic Activities

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

IEEE Transactions on Image Processing (TIP)

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

IEEE Transactions on Multimedia (TMM)

Computer Vision and Image Understanding (CVIU)

IEEE Transactions on Biomedical Engineering (TBME)

Knowledge-Based Systems (KBS)

International Journal of Multimedia Information Retrieval (IJMIR)

IEEE Transactions on Intelligent Transportation Systems (TITS)

Internet of Things (IOT)

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)

Conference PC Member/Reviewer

Reviewer, International Conference on Machine Learning (ICML), 2025

Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2024

Reviewer, The Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 2022, 2023, 2024, 2025, 2026

Reviewer, International Conference on Computer Vision (ICCV), 2023, 2025

Reviewer, European Conference on Computer Vision (ECCV), 2022, 2024

PC Member, International Joint Conference on Artificial Intelligence (IJCAI), 2021

PC Member, The AAAI Conference on Artificial Intelligence (AAAI), 2021, 2022, 2023

Reviewer, ACM International Conference on Multimedia (ACMMM), 2023, 2024

Reviewer, Winter Conference on Applications of Computer Vision (WACV), 2022

Reviewer, International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022

Member of IEEE, ACM, AAAI and CVF

Off-Campus Mentor of Tsinghua University, 2023-2025

Mentoring Experience

I have been fortunate to mentor talented students at Baidu and at university, including:

Mentored Interns at Baidu (→: Post-mentorship status)

Huanjin Yao (Master, Tsinghua University → PhD, HKUST), NeurIPS'24, NeurIPS'25 (Spotlight), 2023 ↝ 2025
Bo Fang (Master, CAS → PhD, CityU), ACMMM'23 Oral, ICCV'23, CVPR'25, ICLR'26, 2021 ↝ Now
Mengxi Zhang (Master, Tianjin University → ByteDance AI Lab), NeurIPS'24, 2024
Boyang Xia (Master, CAS → Kuaishou), 2x ECCV'22, 2021 ↝ 2022
Haosen Yang (Master, HIT → PhD, University of Surrey), AAAI'22, 2021
Deng Huang (Master, SCUT → AutoX), ICCV'21, 2021
Yuguo Wang (Bachelor, UESTC → Master, Duke University), 2022
Hengyuan Zhao (Bachelor, NUPT → PhD, NUS), 2021
Yuxiang Zhao (Master, CAS → PhD, Peking University), ACMMM'21, CVPR'21 AICity Winner🏆, 2021

University Mentees

Guangzhao Dai (PhD, NJUST), TMM'24, 2023
Haipeng Luo (Master, CAS → PhD, Tsinghua University), CVPR'23, 2022 ↝ 2023
Zhihao Wang (Master, CAS → MiniMax), ECCV'22, 2022

Collaborators & Friends

Xiaohan Wang (Stanford University), Xiao Tan (Baidu), Dongliang He (ByteDance), Tianwei Lin (Horizon), Yanwu Xu (Boston University), Jie Wu (ByteDance), Jin Ye (Monash University), Chuang Gan (MIT-IBM Watson Lab), Yihao Liu (Shanghai Lab), Zhun Sun (Tohoku University, Japan), Mingde Yao (CUHK), Min Yang (ByteDance)

Last Updated on 1st Jan, 2026

Published with GitHub Pages