Wenhao Wu (吴文灏)  

Zhihu GitHub Google Scholar LinkedIn X Semantic Scholar ORCID

 

Ph.D. Candidate, School of Computer Science

The University of Sydney, Darlington, Australia

Personal Email : whwu.ucas (at) gmail.com

About Me

I'm a Ph.D. student at MMLab@USYD, The University of Sydney, under the supervision of Prof. Wanli Ouyang and Prof. Chang Xu.
Prior to this, I was a senior researcher at Baidu VIS, where I worked closely with Dr. Jingdong Wang (IEEE Fellow).
Previously, I received my M.S.E. degree from MMLab@SIAT, University of Chinese Academy of Sciences (UCAS), supervised by Prof. Shifeng Chen and Prof. Yu Qiao.
I was also fortunate to spend time at MMLab@CUHK, Baidu, iQIYI, SenseTime, Samsung and the Chinese Academy of Sciences.
I was honored to receive the 11th Baidu Scholarship (2023).

If you are interested in collaboration or discussion, please email me.

Research Interest

My research interests lie broadly in Computer Vision and Deep Learning. I have dedicated myself to advanced AI research across these fields, leading to publications in top-tier conferences. As my technical expertise has deepened, I now focus less on paper quantity and more on rethinking problems and offering simple, effective solutions.

Updates

  • New 01/2024: 🎖 I'm honored to be among the 10 PhD students worldwide awarded the 11th Baidu Scholarship, a prestigious fellowship in Artificial Intelligence that provides 200,000 RMB (about $30,000) to recipients selected from thousands of applicants. [The 11th Baidu Scholarship announced: all 10 winners worldwide were born after 1995]
  • New 11/2023: We release GPT4Vis, which provides a Quantitative Evaluation of GPT-4 (😭Running once roughly costs 💰$4000💰) for Visual Understanding across images, videos and point clouds, spanning 16 datasets.
  • New 11/2023: We release Side4Video, a Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning, which significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
  • New 08/2023: The extension of Text4Vis has been accepted by IJCV.
  • 07/2023: Two first-author papers (Temporal Modeling: ATM, Cross-Modal Retrieval: UA) are accepted by ICCV 2023.
  • 02/2023: Two first-author papers for video understanding (BIKE, Cap4Video) are accepted by CVPR 2023. Cap4Video, which uses GPT to enhance text-video learning, is selected as a Highlight paper (Top 2.5%).
  • 11/2022: Two papers (Video Recognition: Text4Vis, Style Transfer: AdaCM) are accepted by AAAI 2023.
  • 07/2022: Three papers (Video Sampling: NSNet, TSQNet, Cross-Modal Learning: CODER) are accepted by ECCV 2022.
  • 06/2022: Our MaMiCo, a new video self-supervised learning work, is accepted by ACMMM 2022 (Oral Presentation).
  • 03/2022: Two low-level vision papers (MSPC, BAIRNet) are accepted by CVPR 2022.
  • 12/2021: Our BCNet, a general temporal localization framework, is accepted by AAAI 2022.
  • 07/2021: Our ASCNet, a self-supervised video representation learning framework, is accepted by ICCV 2021.
  • 07/2021: Two papers (Video Recognition, Crowd Counting) are accepted by ACMMM 2021.
  • 04/2021: Our paper presenting a novel task, Weakly-Supervised Spatio-Temporal Anomaly Detection, is accepted by IJCAI 2021.
  • 04/2021: Winner in the Traffic Anomaly Detection Track of the CVPR 2021 AI CITY CHALLENGE.
  • 12/2020: Our MVFNet, an efficient temporal module, is accepted by AAAI 2021.
  • 07/2020: Our ADD-GCN for multi-label image recognition is accepted by ECCV 2020.
  • 05/2020: One dynamic video inference paper is accepted for Oral Presentation at the CVPR 2020 EDLCV workshop.
  • 07/2019: My first paper MARL, a novel video sampler, is accepted as an Oral Presentation (Top 4%) at ICCV 2019.
  • 09/2017: Recommended for admission to the master's program at the University of Chinese Academy of Sciences with exam exemption.
  • 06/2017: Graduated from Central South University with the Outstanding Graduate Honor.
  • 10/2016: Joined MMLab@SIAT as a research intern. Started doing research on Computer Vision.

Industrial Experiences

Baidu Research Baidu VIS

Intern → Senior Researcher → Intern on Video Understanding & Editing
worked with Dr. Jingdong Wang (IEEE Fellow) and Dr. Errui Ding
Oct. 2018 - Present. Shenzhen / Beijing / Hybrid

SenseTime Research

Research Intern in BigVideo Team
worked with Dr. Kai Chen
Jan. 2020 - Feb. 2020. Shenzhen, China

iQIYI

Research Intern in Video Analysis Group
hosted by Qiyue Liu
Jun. 2018 - Oct. 2018. Beijing, China

Samsung Research China

Research Intern in Machine Learning Lab
hosted by Zhenbo Luo
Mar. 2018 - Jun. 2018. Beijing, China

Education & Visiting

The University of Sydney, Australia

Doctor of Philosophy (Ph.D.) Candidate in Computer Science
Team: Multimedia Laboratory, Sydney (MMLab@Sydney)
Advisor: Prof. Wanli Ouyang, Prof. Chang Xu
2022 - Present

The Chinese University of Hong Kong, Hong Kong

Honorary Research Assistant in Multimedia Laboratory, CUHK (MMLab@CUHK)
Advisor: Prof. Wanli Ouyang
2023 - 2024

University of Chinese Academy of Sciences, China

Master of Science in Engineering in Pattern Recognition & Intelligent System
Team: Multimedia Laboratory, SIAT, CAS (MMLab@SIAT)
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao
2017 - 2020

Central South University, China

Bachelor of Engineering in Automation
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao while visiting CAS (Oct. 2016 - Jun. 2017)
2013 - 2017

Selected Publications [ Full List ]

(*Co-first Author, *Correspondence)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
Technical Report, 2023
[ PDF ] [ Code ] [ My Blog (Chinese) ]
We provide a Quantitative Evaluation of GPT-4 for Visual Understanding across images, videos and point clouds, spanning 16 popular datasets.
(😭Running all tests once roughly costs 💰$4000+💰)
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao, Wenhao Wu**, Zhiheng Li
Technical Report, 2023
[ PDF ] [ Code ]
Side4Video significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Highlight, Top 2.5% of 9155 submissions] [ PDF ] [ Code ]
Cap4Video leverages auxiliary captions generated by GPT to enhance cross-modal learning.
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang*, Wenhao Wu*, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[ PDF ] [ Code ]
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu, Zhun Sun, Wanli Ouyang
The AAAI Conference on Artificial Intelligence (AAAI), 2023
[ PDF ] [ Code ] [ Poster ] [ Slides ] [ Video ]

Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang
International Journal of Computer Vision (IJCV), 2023 (Impact factor: 19.5)
[ PDF ]
We revisit the classifier with the textual embeddings, and achieve SOTA performance on Full-supervision/Few-shot/Zero-shot recognition.
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia*, Wenhao Wu**, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
A sampler with 4x faster practical speed than SOTA methods.
Temporal Saliency Query Network for Efficient Video Recognition
Boyang Xia*, Zhihao Wang*, Wenhao Wu*, Haoran Wang, Jungong Han
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
TSQNet, the first work to model temporal sampling as a query-response task.
MaMiCo: Macro-to-Micro Semantic Correspondence for Self-supervised Video Representation Learning
Bo Fang*, Wenhao Wu*, Chang Liu*, Yu Zhou, Dongliang He, Weiping Wang
ACM International Conference on Multimedia (ACMMM), 2022
[Oral, 5.0% acceptance rate] [ PDF ] [ Project ]
MaMiCo, a self-supervised Macro-to-Micro Semantic Correspondence learning framework for video representation learning.
Temporal Action Proposal Generation with Background Constraint
Haosen Yang*, Wenhao Wu*, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang
The AAAI Conference on Artificial Intelligence (AAAI), 2022
[15% acceptance rate] [ PDF ] [ Code ]
BCNet, a general framework for effective Temporal Action Proposal Generation.
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang*, Wenhao Wu*, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
IEEE International Conference on Computer Vision (ICCV), 2021
[ PDF ] [ Poster ] [ Slides ] [ Video ] [ Code ]
An effective self-supervised video representation learning framework.
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu*, Yuxiang Zhao*, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi
ACM International Conference on Multimedia (ACMMM), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ]
An efficient plug-and-play module for effective video-level representation learning.
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding
The AAAI Conference on Artificial Intelligence (AAAI), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ] [ Bibtex ]
@inproceedings{wu2021mvfnet,
    title={Mvfnet: Multi-view fusion network for efficient video recognition},
    author={Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={35},
    number={4},
    pages={2943--2951},
    year={2021}
}
An efficient architecture for video recognition based on 2D CNN.
Good Practices and A Strong Baseline for Traffic Anomaly Detection
Yuxiang Zhao*, Wenhao Wu*, Yue He, Yingying Li, Xiao Tan, Shifeng Chen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - 5th AI City Challenge (AICity), 2021
[ PDF ] [ Code ] [ Bibtex ]
@inproceedings{zhao2021good,
    title={Good Practices and A Strong Baseline for Traffic Anomaly Detection},
    author={Zhao, Yuxiang and Wu, Wenhao and He, Yue and Li, Yingying and Tan, Xiao and Chen, Shifeng},
    booktitle={Proceedings of CVPR Workshops},
    year={2021}
}
Winner of the AI City Challenge for traffic anomaly detection.
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - Joint Workshop on Efficient Deep Learning in Computer Vision (EDLCV), 2020
[Oral] [ PDF ] [ Slides ] [ Bibtex ]
@inproceedings{wu2020dynamic,
    title={Dynamic Inference: A New Approach Toward Efficient Video Action Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Yang, Yi and Wen, Shilei},
    booktitle={Proceedings of CVPR Workshops},
    pages={676--677},
    year={2020}
}
Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen
IEEE International Conference on Computer Vision (ICCV), 2019
[Oral, 4.3% acceptance rate] [ PDF ] [ Poster ] [ Slides ] [ Bibtex ]
@inproceedings{wu2019multi,
    title={Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Wen, Shilei},
    booktitle={Proceedings of the IEEE International Conference on Computer Vision},
    pages={6222--6231},
    year={2019}
}

Contests

  • CVPR2021 AI CITY Challenge: Traffic Anomaly Detection, Winner Award, 2021
  • CVPR2021 NTIRE Challenge on Image Deblurring: Track 2 JPEG Artifacts, Runner-Up Award, 2021
  • Meritorious Winner in the American Mathematical Contest in Modeling (MCM), 2016
  • The First-class Prize in National Undergraduate Mechanical Innovation Design Competition, 2016
  • The Second-class Prize in China Freescale Cup Intelligent Car Competition (South China Region), 2015
  • The Second-class Prize in Smart Car Racing Competition of Hunan Province, 2015

Awards

PhD:

  • 11th Baidu Scholarship (only 10 winners worldwide, 200,000 RMB), 2023
  • Australian Good Design Award, 2022
  • Faculty of Engineering Research Scholarship at USYD (Full Scholarship), 2022-2026

Master:

  • Outstanding Student of University of Chinese Academy of Sciences, 2020
  • Baidu Outstanding Intern, 2019
  • Scholarship for Academic Excellence of Shenzhen Institutes of Advanced Technology, CAS, 2018
  • University of Chinese Academy of Sciences (UCAS) Postgraduate Scholarship (Full Scholarship), 2017-2020

Undergraduate:

  • Graduate with Honour of Central South University (Top 5%), 2017
  • National Inspirational Scholarship (Top 5%) awarded by the Ministry of Education, 2016
  • Excellent National College Students Innovation and Entrepreneurship Project (20,000 RMB), 2016
  • Outstanding Student Leader, Outstanding Student of Central South University, 2013-2017
  • Scholarship for Academic Excellence of Central South University, 2013-2017

Academic Activities

Journal Reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • IEEE Transactions on Image Processing (TIP)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Transactions on Multimedia (TMM)
  • Computer Vision and Image Understanding (CVIU)
  • IEEE Transactions on Biomedical Engineering (TBME)
  • Knowledge-Based Systems
  • International Journal of Multimedia Information Retrieval (IJMIR)

Conference PC Member/Reviewer

  • Reviewer, The Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 2022, 2023, 2024
  • Reviewer, International Conference on Computer Vision (ICCV), 2023
  • Reviewer, European Conference on Computer Vision (ECCV), 2022, 2024
  • PC Member, International Joint Conference on Artificial Intelligence (IJCAI), 2021
  • PC Member, The AAAI Conference on Artificial Intelligence (AAAI), 2021, 2022
  • Reviewer, ACM International Conference on Multimedia (ACMMM), 2023, 2024
  • Reviewer, Winter Conference on Applications of Computer Vision (WACV), 2022
  • Reviewer, International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022
  • Member of IEEE, ACM, AAAI and CVF

Current/Past Mentoring

I'm glad and lucky to advise and collaborate with:

Huanjin Yao (Master, Tsinghua University), 2023 ↝ Now
Bo Fang (Master, CAS → PhD, CityU), 2021 ↝ 2023
Haipeng Luo (Master, CAS → PhD, Tsinghua University), 2022 ↝ 2023
Yuguo Wang (Master, Duke University), 2022
Zhihao Wang (Master, CAS), 2022
Boyang Xia (Master, CAS → Kuaishou), 2021 ↝ 2022
Haosen Yang (PhD, University of Surrey), 2021
Deng Huang (Master, SCUT → AutoX), 2020 ↝ 2021
Hengyuan Zhao (PhD, NUS), 2021
Yuxiang Zhao (Master, CAS → PhD, Peking University), 2020 ↝ 2021

Collaborators & Friends

Xiaohan Wang (Stanford University), Xiao Tan (Baidu), Dongliang He (ByteDance), Tianwei Lin (Horizon), Yanwu Xu (Boston University), Jie Wu (ByteDance), Jin Ye (Monash University), Chuang Gan (MIT-IBM Watson Lab), Yihao Liu (Shanghai Lab), Chang Liu (Tsinghua University), Zhun Sun (Tencent)

Last Updated on 14th January, 2024
