Wenhao Wu (吴文灏)  

知乎 github Google Scholar LinkedIn X Semantic Scholar ORCID

 

Ph.D. Student

School of Computer Science, Faculty of Engineering

The University of Sydney

Personal Email: whwu.ucas (at) gmail.com

About Me

I'm a second-year Ph.D. student at The University of Sydney, supervised by Prof. Wanli Ouyang and Prof. Chang Xu. Prior to this, I was a senior researcher at Baidu VIS, where I worked closely with Dr. Jingdong Wang (IEEE Fellow). Before that, I received my M.S.E. degree from the University of Chinese Academy of Sciences (UCAS), supervised by Prof. Shifeng Chen and Prof. Yu Qiao.
I am fortunate to have had industrial internship experience at Amazon AI, Baidu AI, iQIYI, SenseTime, and Samsung Research.
I am/was a member of academic institutions such as MMLab@USYD, MMLab@CUHK, and MMLab@SIAT-CAS.

If interested in collaboration or discussion, please email me.

Research Interest

My research interests broadly lie in the areas of Computer Vision and Deep Learning. I've dedicated myself to advanced AI research across these fields, leading to publications in top-tier conferences. As my technical expertise has deepened, I now focus less on paper quantity and more on rethinking problems and offering simple, effective solutions.

Updates

  • New 11/2023: We release GPT4Vis, which provides a Quantitative Evaluation of GPT-4 (😭Running once roughly costs 💰$4000💰) for Visual Understanding across images, videos and point clouds, spanning 16 datasets.
  • New 11/2023: We release Side4Video, a Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning, which significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
  • New 08/2023: The extension of Text4Vis has been accepted by IJCV.
  • 07/2023: Two first-author papers (Temporal Modeling: ATM, Cross-Modal Retrieval: UA) are accepted by ICCV 2023.
  • 02/2023: Two first-author papers for video understanding (BIKE, Cap4Video) are accepted by CVPR 2023. Cap4Video, which uses GPT to enhance text-video learning, is selected as a Highlight paper (Top 2.5%).
  • 11/2022: Two papers (Video Recognition: Text4Vis, Style Transfer: AdaCM) are accepted by AAAI 2023.
  • 07/2022: Three papers (Video Sampling: NSNet, TSQNet, Cross-Modal Learning: CODER) are accepted by ECCV 2022.
  • 06/2022: Our MaMiCo, a new video self-supervised learning work, is accepted by ACMMM 2022 (Oral Presentation).
  • 03/2022: Two low-level vision papers (MSPC, BAIRNet) are accepted by CVPR 2022.
  • 12/2021: Our BCNet, a general temporal localization framework, is accepted by AAAI 2022.
  • 07/2021: Our ASCNet, a self-supervised video representation learning framework, is accepted by ICCV 2021.
  • 07/2021: Two papers (Video Recognition, Crowd Counting) are accepted by ACMMM 2021.
  • 04/2021: Our paper presenting a novel task, Weakly-Supervised Spatio-Temporal Anomaly Detection, is accepted by IJCAI 2021.
  • 04/2021: Winner in the Traffic Anomaly Detection Track of the CVPR 2021 AI CITY CHALLENGE.
  • 12/2020: Our MVFNet, an efficient temporal module, is accepted by AAAI 2021.
  • 07/2020: Our ADD-GCN, for multi-label image recognition, is accepted by ECCV 2020.
  • 05/2020: One dynamic video inference paper is accepted for Oral Presentation at the CVPR 2020 EDLCV workshop.
  • 07/2019: My first paper MARL, a novel video sampler, is accepted as an Oral Presentation (Top 4%) at ICCV 2019.
  • 09/2017: Recommended for admission to the University of Chinese Academy of Sciences for a Master's degree with exam exemption (保送研究生).
  • 06/2017: Graduated from Central South University with Outstanding Graduate Honor.
  • 10/2016: Joined MMLab@SIAT as a research intern. Started working on Computer Vision.

Education & Visiting

The University of Sydney, Australia

Doctor of Philosophy (Ph.D.) Candidate in Computer Science
Team: Multimedia Laboratory (MMLab@USYD)
Advisor: Prof. Wanli Ouyang, Prof. Chang Xu
2022 - 2026

The Chinese University of Hong Kong, Hong Kong

Honorary Research Assistant in Multimedia Laboratory (MMLab@CUHK)
Advisor: Prof. Wanli Ouyang
2023 - 2024

University of Chinese Academy of Sciences, China

Master of Science in Engineering in Pattern Recognition & Intelligent System
Team: Multimedia Laboratory, SIAT, CAS (MMLab@SIAT-CAS)
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao
2017 - 2020 with exam exemption (保送研究生)

Central South University, China

Bachelor of Engineering in Automation
Advisor: Prof. Shifeng Chen, Prof. Yu Qiao while visiting CAS (Oct. 2016 - Jun. 2017)
2013 - 2017

Selected Publications [ Full List ]

( *Co-first Author, *Correspondence)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
Technical Report, 2023
[ PDF ] [ Code ] [ My Blog (Chinese) ]
We provide a Quantitative Evaluation of GPT-4 for Visual Understanding across images, videos and point clouds, spanning 16 popular datasets.
(😭Running all tests once roughly costs 💰$4000+💰)
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao, Wenhao Wu**, Zhiheng Li
Technical Report, 2023
[ PDF ] [ Code ]
Side4Video significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Highlight, Top 2.5% of 9155 submissions] [ PDF ] [ Code ]
Cap4Video leverages auxiliary captions generated by GPT to enhance cross-modal learning.
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang*, Wenhao Wu*, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang
IEEE International Conference on Computer Vision (ICCV), 2023
[ PDF ] [ Code ]
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[ PDF ] [ Code ]
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu, Zhun Sun, Wanli Ouyang
The AAAI Conference on Artificial Intelligence (AAAI), 2023
[ PDF ] [ Code ] [ Poster ] [ Slides ] [ Video ]

Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang
International Journal of Computer Vision (IJCV), 2023 (Impact factor: 19.5)
[ PDF ]
We revisit the classifier with the textual embeddings, and achieve SOTA performance on Full-supervision/Few-shot/Zero-shot recognition.
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia*, Wenhao Wu**, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
A sampler with 4x faster practical speed than SOTA methods.
Temporal Saliency Query Network for Efficient Video Recognition
Boyang Xia*, Zhihao Wang*, Wenhao Wu*, Haoran Wang, Jungong Han
European Conference on Computer Vision (ECCV), 2022
[ PDF ] [ Project ]
TSQNet, the first work to model temporal sampling as a query-response task.
MaMiCo: Macro-to-Micro Semantic Correspondence for Self-supervised Video Representation Learning
Bo Fang*, Wenhao Wu*, Chang Liu*, Yu Zhou, Dongliang He, Weiping Wang
ACM International Conference on Multimedia (ACMMM), 2022
[Oral, 5.0% acceptance rate] [ PDF ] [ Project ]
MaMiCo, a self-supervised Macro-to-Micro Semantic Correspondence learning framework for video representation learning.
Temporal Action Proposal Generation with Background Constraint
Haosen Yang*, Wenhao Wu*, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang
The AAAI Conference on Artificial Intelligence (AAAI), 2022
[15% acceptance rate] [ PDF ] [ Code ]
BCNet, a general framework for effective Temporal Action Proposal Generation.
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang*, Wenhao Wu*, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
IEEE International Conference on Computer Vision (ICCV), 2021
[ PDF ] [ Poster ] [ Slides ] [ Video ] [ Code ]
An effective self-supervised video representation learning framework.
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu*, Yuxiang Zhao*, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi
ACM International Conference on Multimedia (ACMMM), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ]
An efficient plug-and-play module for effective video-level representation learning.
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding
The AAAI Conference on Artificial Intelligence (AAAI), 2021
[ PDF ] [ Poster ] [ Slides ] [ Code ] [ Bibtex ]
@inproceedings{wu2021mvfnet,
    title={{MVFNet}: Multi-View Fusion Network for Efficient Video Recognition},
    author={Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={35},
    number={4},
    pages={2943--2951},
    year={2021}
}
An efficient architecture for video recognition based on 2D CNN.
Good Practices and A Strong Baseline for Traffic Anomaly Detection
Yuxiang Zhao*, Wenhao Wu*, Yue He, Yingying Li, Xiao Tan, Shifeng Chen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - 5th AI City Challenge (AICity), 2021
[ PDF ] [ Code ] [ Bibtex ]
@inproceedings{zhao2021good,
    title={Good Practices and A Strong Baseline for Traffic Anomaly Detection},
    author={Zhao, Yuxiang and Wu, Wenhao and He, Yue and Li, Yingying and Tan, Xiao and Chen, Shifeng},
    booktitle={Proceedings of CVPR Workshops},
    year={2021}
}
Winner of the AI City Challenge for traffic anomaly detection.
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - Joint Workshop on Efficient Deep Learning in Computer Vision (EDLCV), 2020
[Oral] [ PDF ] [ Slides ] [ Bibtex ]
@inproceedings{wu2020dynamic,
    title={Dynamic Inference: A New Approach Toward Efficient Video Action Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Yang, Yi and Wen, Shilei},
    booktitle={Proceedings of CVPR Workshops},
    pages={676--677},
    year={2020}
}
Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition
Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen
IEEE International Conference on Computer Vision (ICCV), 2019
[Oral, 4.3% acceptance rate] [ PDF ] [ Poster ] [ Slides ] [ Bibtex ]
@inproceedings{wu2019multi,
    title={Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition},
    author={Wu, Wenhao and He, Dongliang and Tan, Xiao and Chen, Shifeng and Wen, Shilei},
    booktitle={Proceedings of the IEEE International Conference on Computer Vision},
    pages={6222--6231},
    year={2019}
}

Contests

  • CVPR2021 AI CITY Challenge: Traffic Anomaly Detection, Winner Award, 2021
  • CVPR2021 NTIRE Challenge on Image Deblurring: Track 2 JPEG Artifacts, Runner-Up Award, 2021
  • Meritorious Winner in the American Mathematical Contest in Modeling (MCM), 2016
  • The First-class Prize in National Undergraduate Mechanical Innovation Design Competition, 2016
  • The Second-class Prize in China Freescale Cup Intelligent Car Competition (South China Region), 2015
  • The Second-class Prize in Smart Car Racing Competition of Hunan Province, 2015

Awards

PhD:

  • Australian Good Design Award, 2022
  • Faculty of Engineering Research Scholarship at USYD (Full Scholarship), 2022-2026
Master:

  • Outstanding Student of University of Chinese Academy of Sciences, 2020
  • Baidu Outstanding Intern, 2019
  • Scholarship for Academic Excellence of Shenzhen Institutes of Advanced Technology, CAS, 2018
  • University of Chinese Academy of Sciences (UCAS) Postgraduate Scholarship (Full Scholarship), 2017-2020
Undergraduate:

  • Graduate with Honour of Central South University (Top 5%), 2017
  • National Inspirational Scholarship (Top 5%) awarded by the Ministry of Education, 2016
  • Excellent National College Students Innovation and Entrepreneurship Project (20,000 RMB), 2016
  • Outstanding Student Leader, Outstanding Student of Central South University, 2013-2017
  • Scholarship for Academic Excellence of Central South University, 2013-2017
Academic Activities

Journal Reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • IEEE Transactions on Image Processing (TIP)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Transactions on Multimedia (TMM)
  • Computer Vision and Image Understanding (CVIU)
  • IEEE Transactions on Biomedical Engineering (TBME)
  • Knowledge-Based Systems
  • International Journal of Multimedia Information Retrieval (IJMIR)
  • IEEE Transactions on Intelligent Transportation Systems (TITS)

Conference PC Member/Reviewer

  • Reviewer, The Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 2022, 2023, 2024
  • Reviewer, International Conference on Computer Vision (ICCV), 2023
  • Reviewer, European Conference on Computer Vision (ECCV), 2022, 2024
  • PC Member, International Joint Conference on Artificial Intelligence (IJCAI), 2021
  • PC Member, The AAAI Conference on Artificial Intelligence (AAAI), 2021, 2022
  • Reviewer, ACM International Conference on Multimedia (ACMMM), 2023, 2024
  • Reviewer, Winter Conference on Applications of Computer Vision (WACV), 2022
  • Reviewer, International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022
  • Member of IEEE, ACM, AAAI and CVF

Off-Campus Mentor of Tsinghua University

Current/Past Mentoring

I'm glad and lucky to advise and collaborate with:

Huanjin Yao (Master, Tsinghua University), 2023 ↝ Now
Bo Fang (Master, CAS → PhD, CityU), 2021 ↝ 2023
Haipeng Luo (Master, CAS → PhD, Tsinghua University), 2022 ↝ 2023
Yuguo Wang (Master, Duke University), 2022
Zhihao Wang (Master, CAS), 2022
Boyang Xia (Master, CAS → Kuaishou), 2021 ↝ 2022
Haosen Yang (PhD, University of Surrey), 2021
Deng Huang (Master, SCUT → AutoX), 2020 ↝ 2021
Hengyuan Zhao (PhD, NUS), 2021
Yuxiang Zhao (Master, CAS → PhD, Peking University), 2020 ↝ 2021

Collaborators & Friends

Xiaohan Wang (Stanford University), Xiao Tan (Baidu), Dongliang He (ByteDance), Tianwei Lin (Horizon), Yanwu Xu (Boston University), Jie Wu (ByteDance), Jin Ye (Monash University), Chuang Gan (MIT-IBM Watson Lab), Yihao Liu (Shanghai Lab), Chang Liu (Tsinghua University), Zhun Sun (Tencent)

Last Updated on 14th January, 2024
