Publications

Conferences: ICCV (4), CVPR (4), ECCV (5), NeurIPS (2), AAAI (4), IJCAI (1), ACMMM (3) -- Oral x2, Highlight x1

Journals: TPAMI (1), IJCV (1), TMM (1)

(*Co-first Author, *Corresponding Author)


Preprints:


  1. Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
    Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao
    Technical Report, ArXiv:2412.18319   [ PDF ] [ Code ]
  2. FreeVA: Offline MLLM as Training-Free Video Assistant
    Wenhao Wu
    Technical Report, ArXiv:2405.07798   [ PDF ] [ Code ]
  3. GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
    Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
    Technical Report, ArXiv:2311.15732   [ PDF ] [ Code ]
  4. Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
    Huanjin Yao*, Wenhao Wu**, Zhiheng Li
    Technical Report, ArXiv:2311.15769   [ PDF ] [ Code ]
  5. It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
    Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang
    Technical Report, ArXiv:2210.05234   [ PDF ]
  6. Discovering “Semantics” in Super-Resolution Networks
    Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong
    Technical Report, ArXiv:2108.00406   [ PDF ] [ Code ]
  7. Color2Style: Real-Time Exemplar-Based Image Colorization with Self-Reference Learning and Deep Feature Modulation
    Hengyuan Zhao*, Wenhao Wu*, Yihao Liu, Dongliang He
    Technical Report, ArXiv:2106.08017   [ PDF ] [ Code ]
  8. Temporal Action Proposal Generation with Transformers
    Lining Wang*, Haosen Yang*, Wenhao Wu*, Hongxun Yao, Hujie Huang
    Technical Report, ArXiv:2105.12043   [ PDF ]

Journal Papers:


    1. Cap4Video++: Enhancing Video Understanding with Auxiliary Captions
      Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
      Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.   Impact factor: 23.6   [ PDF ] [ Code ]
    2. Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
      Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang
      International Journal of Computer Vision (IJCV), 2023.   Impact factor: 19.5   [ PDF ] [ Code ]
    3. GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
      Guangzhao Dai, Xiangbo Shu, Wenhao Wu
      Transactions on Multimedia (TMM), 2024   [ PDF ]
    4. Rethinking 3D cost aggregation in stereo matching
      Wanshui Gan, Wenhao Wu, Shifeng Chen, Yuxiang Zhao, Pak Kin Wong
      Pattern Recognition Letters, 2023   [ PDF ] [ Code ]

Conference Papers:


    1. Dense Connector for MLLMs
      Huanjin Yao*, Wenhao Wu**, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang
      NeurIPS 2024   [ PDF ] [ Code ]
    2. Automated Multi-level Preference for MLLMs
      Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhang, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang
      NeurIPS 2024   [ PDF ] [ Code ]
    3. DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
      Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr
      ECCV 2024   [ PDF ]
    4. What Can Simple Arithmetic Operations Do for Temporal Modeling?
      Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang
      ICCV 2023   [ PDF ] [ Code ]
    5. UATVR: Uncertainty-Adaptive Text-Video Retrieval
      Bo Fang*, Wenhao Wu*, Chang Liu*, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang
      ICCV 2023   [ PDF ] [ Code ]
    6. Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
      Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang
      CVPR 2023    Highlight, 2.5% acceptance rate    [ PDF ] [ Code ]
    7. Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
      Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
      CVPR 2023    [ PDF ] [ Code ]
    8. Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
      Wenhao Wu, Zhun Sun, Wanli Ouyang
      AAAI 2023    [ PDF ] [ Code ] [ Poster ] [ Slides ] [ Video ]
    9. AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer
      Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong Liu
      AAAI 2023    [ PDF ]
    10. Effective Invertible Arbitrary Image Rescaling
      Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding
      WACV 2023    [ PDF ] [ Code ]
    11. NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
      Boyang Xia*, Wenhao Wu**, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang
      ECCV 2022    [ PDF ] [ Project ]
    12. Temporal Saliency Query Network for Efficient Video Recognition
      Boyang Xia*, Zhihao Wang*, Wenhao Wu*, Haoran Wang, Jungong Han
      ECCV 2022    [ PDF ] [ Project ]
    13. CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
      Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang
      ECCV 2022    [ PDF ] [ Code ]
    14. MaMiCo: Macro-to-Micro Semantic Correspondence for Self-supervised Video Representation Learning
      Bo Fang*, Wenhao Wu*, Chang Liu*, Yu Zhou, Dongliang He, Weiping Wang
      ACMMM 2022   Oral, 5.0% acceptance rate   [ PDF ] [ Code ]
    15. Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
      Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich
      CVPR 2022    [ PDF ] [ Code ]
    16. Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
      Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding
      CVPR 2022    [ PDF ] [ Code ]
    17. Temporal Action Proposal Generation with Background Constraint
      Haosen Yang*, Wenhao Wu*, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang
      AAAI 2022    [ PDF ] [ Code ]
    18. ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
      Deng Huang*, Wenhao Wu*, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
      ICCV 2021   [ PDF ] [ Poster ] [ Slides ] [ Video ] [ Code ]
    19. DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
      Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi
      ACMMM 2021   [ PDF ] [ Poster ] [ Slides ] [ Code ]
    20. Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network
      Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye
      ACMMM 2021   [ PDF ]
    21. Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video
      Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, Yingying Li, Errui Ding, Liang Lin
      IJCAI 2021   [ PDF ]
    22. Good Practices and A Strong Baseline for Traffic Anomaly Detection
      Yuxiang Zhao*, Wenhao Wu*, Yue He, Yingying Li, Xiao Tan, Shifeng Chen
      CVPR 2021   Workshop on AICity Challenge   Winner [ PDF ]
    23. MVFNet: Multi-View Fusion Network for Efficient Video Recognition
      Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding
      AAAI 2021   [ PDF ] [ Poster ] [ Slides ] [ Code ]
    24. Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition
      Jin Ye, Junjun He, Xiaojiang Peng, Wenhao Wu, Yu Qiao
      ECCV 2020   [ PDF ] [ Code ]
    25. Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
      Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
      CVPR 2020   Workshop on Efficient Deep Learning in Computer Vision  Oral   [ PDF ] [ Slides ]
    26. Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition
      Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen
      ICCV 2019  Oral, 4.3% acceptance rate   [ PDF ] [ Poster ] [ Slides ]