Wenhao Wu

Wenhao Wu (first name pronounced “when-how”) · 吴文灏

Bellevue, WA, USA

Core contributor to Amazon Nova. I work on video understanding and multimodal LLMs — from foundation models to post-training.

About

I'm an Applied Scientist at Amazon AGI. Before Amazon, I spent nearly seven years (2018–2025) at Baidu VIS with Chief Scientist Dr. Jingdong Wang (IEEE Fellow), progressing from research intern to Senior/Staff Researcher across multiple large-scale computer-vision and multimodal projects.

I earned my Ph.D. from MMLab, The University of Sydney, advised by Prof. Wanli Ouyang, and my M.S.E. from the University of Chinese Academy of Sciences (UCAS) under Prof. Shifeng Chen and Prof. Yu Qiao. I've also worked at AWS AI Labs, SenseTime, Samsung Research and iQIYI AI, and was a visiting scholar at MMLab@CUHK and MMLab@SIAT-CAS.

I'm a recipient of the Baidu PhD Fellowship (2023, 10 worldwide) and the DAAD AInet Fellowship (2025).

In this era of rapidly advancing AGI, I've come to believe that incremental papers matter far less than what actually moves the needle — rigorous scaling validation and the unglamorous, careful engineering behind it, like getting data quality right. That's where I've focused my energy since 2025. If this resonates and you'd like to collaborate or chat, feel free to email me.

Research Interests

Multi-modal / Omni LLMs · Video Understanding · Multimodal Reasoning · Post-Training · RLHF / RL

Selected Oral & Highlight Papers

A few works honored by top venues as Oral / Highlight / Spotlight (top few %) — from the good old days of chasing papers. Feel free to skip; these days I'd rather build things that matter. Still, the full list is here if you're curious.

Open Source

Code released and used by the community · 3k+ GitHub stars

Product / Model Release

Amazon Nova

Nova 2 Family — Multimodal reasoning & generation models

Core Contributor

News

Earlier news
  • May 2024 · Released FreeVA: training-free video conversational models from image MLLMs.
  • May 2024 · The extension of Cap4Video accepted by TPAMI.
  • Jan 2024 · Among the 10 PhD students worldwide awarded the 11th Baidu Scholarship (200,000 RMB).
  • Nov 2023 · Released GPT4Vis: quantitative evaluation of GPT-4 for visual understanding across 16 datasets.
  • Nov 2023 · Released Side4Video: memory-efficient image-to-video transfer learning.
  • Aug 2023 · The extension of Text4Vis accepted by IJCV.
  • Jul 2023 · Two first-author papers (ATM, UATVR) accepted to ICCV 2023.
  • Feb 2023 · Two first-author papers (BIKE, Cap4Video) accepted to CVPR 2023; Cap4Video is a Highlight (Top 2.5%).
  • Nov 2022 · Two papers (Text4Vis, AdaCM) accepted to AAAI 2023.
  • Jul 2022 · Three papers (NSNet, TSQNet, CODER) accepted to ECCV 2022.
  • Jun 2022 · MaMiCo accepted to ACMMM 2022 (Oral).
  • Mar 2022 · Two low-level vision papers (MSPC, BAIRNet) accepted to CVPR 2022.
  • Dec 2021 · BCNet accepted to AAAI 2022.
  • Jul 2021 · ASCNet accepted to ICCV 2021; two papers accepted to ACMMM 2021.
  • Apr 2021 · WSSTAD accepted to IJCAI 2021; Winner of the CVPR 2021 AI City Challenge (Traffic Anomaly Detection).
  • Dec 2020 · MVFNet accepted to AAAI 2021.
  • Jul 2020 · ADD-GCN accepted to ECCV 2020.
  • Jul 2019 · First paper MARL accepted to ICCV 2019 (Oral, Top 4%).
  • Oct 2016 · Joined MMLab@SIAT as a research intern and started working on computer vision.

Experience

Amazon AGI · Bellevue, WA, USA
Applied Scientist on the Amazon Foundation Model (Nova) · with Dr. Davide Modolo
Jun 2025 – Present
Baidu VIS · Shenzhen / Beijing, China · Sydney (remote)
Intern → Senior/Staff Researcher on Video Understanding · with Dr. Jingdong Wang (IEEE Fellow), Dr. Dongliang He, Dr. Errui Ding
Sep 2018 – Jun 2025

Education & Visiting

Ph.D. in Computer Science · MMLab@USYD · Advisor: Prof. Wanli Ouyang
2022 – 2025
Honorary Research Assistant, MMLab@CUHK · Advisor: Prof. Wanli Ouyang
2023 – 2024
M.S.E. in Pattern Recognition & Intelligent Systems · MMLab@SIAT-CAS · Advisors: Prof. Shifeng Chen, Prof. Yu Qiao · exam-exempt admission (保送)
2017 – 2020
B.Eng. in Automation · Graduated with Outstanding Graduate Honor (Top 5%)
2013 – 2017

Short-term Internships

Amazon AWS AI · Santa Clara, USA
Applied Scientist Intern · with Dr. Shuai Zhang, Dr. Taojiannan Yang, Dr. Bernie Wang
Jun 2024 – Sep 2024
SenseTime Research · Shenzhen, China
Research Intern, OpenMMLab Team · with Dr. Kai Chen
Jan 2020 – Feb 2020
iQIYI · Beijing, China
R&D Intern, Video Analysis Group · hosted by Qiyue Liu
Jun 2018 – Oct 2018
Samsung Research China · Beijing, China
Research Intern, Samsung Advanced Institute of Technology (SAIT) · hosted by Zhenbo Luo
Mar 2018 – Jun 2018

Honors & Awards

Academic Honors

Professional Honors

Competitions

Academic Service

Conference Reviewer / PC Member: ICML (2025–2026), NeurIPS (2024–2026), CVPR (2021–2026), ICCV (2023, 2025), ECCV (2022, 2024, 2026), IJCAI (2021), AAAI (2021–2023), ACMMM (2023–2024), WACV (2022).

Journal Reviewer: TPAMI, IJCV, TNNLS, TIP, TCSVT, TMM, CVIU, TBME, KBS, IJMIR, TITS, TOMM.

Member of IEEE, ACM, AAAI and CVF · Off-Campus Mentor at Tsinghua University (2023–2025).

Mentoring

Fortunate to have mentored talented students at Baidu and universities (→ post-mentorship status):
Huanjin Yao (Tsinghua → PhD, HKUST) — NeurIPS'24, NeurIPS'25 (Spotlight)
Bo Fang (CAS → PhD, CityU) — ACMMM'23 Oral, ICCV'23, CVPR'25, ICLR'26
Mengxi Zhang (Tianjin Univ. → ByteDance AI Lab) — NeurIPS'24
Boyang Xia (CAS → Kuaishou) — 2× ECCV'22
Haosen Yang (HIT → PhD, Univ. of Surrey) — AAAI'22
Deng Huang (SCUT → AutoX) — ICCV'21
Yuxiang Zhao (CAS → PhD, PKU) — ACMMM'21, CVPR'21 AI City Winner 🏆
Haipeng Luo (CAS → PhD, Tsinghua) — CVPR'23
Guangzhao Dai (PhD, NJUST) — TMM'24

Collaborators & Friends

Xiaohan Wang (Stanford) · Xiao Tan (Baidu) · Dongliang He (ByteDance) · Tianwei Lin (Horizon) · Yanwu Xu (Boston Univ.) · Jie Wu (ByteDance) · Jin Ye (Monash) · Chuang Gan (MIT-IBM Watson) · Yihao Liu (Shanghai AI Lab) · Zhun Sun (Tohoku Univ.) · Mingde Yao (CUHK) · Min Yang (ByteDance)

Off the Clock

I'm honestly a pretty low-key person. Fittingly for someone who works on video understanding, most of my downtime goes to… watching videos 📺 — variety & stand-up shows, Chinese TV dramas (iQIYI / Youku / Tencent Video), movies, and anime, from the classics (Detective Conan, Dragon Ball) to Chinese ones like Biao Ren (镖人) and A Record of a Mortal's Journey to Immortality (凡人修仙传) — both of which update painfully slowly, so I'm perpetually waiting for the next episode. 😅 Plus an unhealthy amount of Douyin and Bilibili, and yes, I pay for memberships on basically every streaming platform.

Back in China I loved eating my way through the food 🍜 and traveled across most of its provinces. In the US, with good Chinese food harder to find, my evenings are mostly videos, and my mornings often start with the stock market 📈 (a firm believer in the LEAPS + sell-put strategy, with returns that swing more wildly than my training loss curves). My Switch and PS5 🎮 are gathering dust these days — mostly I just take after-dinner walks in the park with my wife. 🌳 By the standards of AI and academia, I married and started a family early — a baby before 30. 👶 I'm also pretty lazy and never work out, yet somehow stay slim thanks to good genes (my dad's the same). 🤷