About Me
I’m currently a Ph.D. student advised by Prof. Changsheng Xu at the Institute of Automation, Chinese Academy of Sciences (CASIA). My research interests mainly focus on multimodality and general intelligence.
I’m always open to possible cooperation or visiting opportunities. If you are interested, please contact me by email.
Education
- 2022.9 - Present : Ph.D. in Pattern Recognition and Intelligent Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), advised by Prof. Changsheng Xu.
- 2019.9 - 2022.6 : M.S. in Computer Science, Institute of Automation, Chinese Academy of Sciences (CASIA), advised by Prof. Weiming Dong.
- 2015.9 - 2019.6 : B.E. in Aerospace Engineering, Beijing Institute of Technology, advised by Prof. Zixuan Liang.
Experiences
- 2023.5 - Present : Research internship at Peng Cheng Laboratory
- 2022.5 - 2023.3 : Research internship at Tencent Youtu Lab, supported by Tencent Rhino-Bird Research Elite Program
- 2021.3 - 2021.9 : Research internship at Tencent Youtu Lab
Selected Publications
Conferences
Libra: Building Decoupled Vision System on Large Language Models
Yifan Xu, Xiaoshan Yang, Yaguang Song, Changsheng Xu
ICML2024: the Forty-first International Conference on Machine Learning
[arXiv] [Code]
Multi-modal Queried Object Detection in the Wild
Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu
NeurIPS2023: the Thirty-seventh Annual Conference on Neural Information Processing Systems
[arXiv] [Code] [Poster] [Slides]
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, et al.
AAAI2022: the 36th AAAI Conference on Artificial Intelligence
[arXiv] [Code] [Poster] [Slides] [Video]
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips
Man Yao, JiaKui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li
ICLR2024: the Twelfth International Conference on Learning Representations
[arXiv] [Code]
Journals
Transformers in Computational Visual Media: A Survey
Yifan Xu, Huapeng Wei, Mingxuan Lin, Yingying Deng, Kekai Sheng, Mengdan Zhang, et al.
CVMJ: Computational Visual Media
[Paper]
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu
TIP: IEEE Transactions on Image Processing
[arXiv]
Towards Corruption-Agnostic Robust Domain Adaptation
Yifan Xu, Kekai Sheng, Weiming Dong, Baoyuan Wu, Changsheng Xu, Bao-Gang Hu
TOMM: ACM Transactions on Multimedia Computing, Communications, and Applications
[arXiv]
Awards & Honors
- 2024 National Scholarship of China (博士研究生国家奖学金)
- 2023 First Prize of the Pandeng Scholarship (攀登一等奖学金)
- 2022 Second Prize of the Pandeng Scholarship (攀登二等奖学金)
- 2023, 2023 Tencent Rhino-Bird Research Elite (腾讯犀牛鸟精英人才)
- 2022, 2023 Merit Student of University of Chinese Academy of Sciences (中国科学院大学三好学生)
- 2021 - 2024 Academic Scholarship of University of Chinese Academy of Sciences (中国科学院大学学业奖学金)
- 2017 Space Application Scholarship (太空应用奖学金)
- 2017 Second Prize in China Undergraduate Mathematical Contest in Modeling
Invited Talks
- 2023.12.10 Invited talk “Multi-modal Queried Object Detection in the Wild” at NeurIPS2023
- 2023.3.24 Invited talk “Some Thoughts on Vision-Language Models” at Multimedia Computing Lab, CASIA
- 2022.11.24 Invited talk “The Mathematical Theorem of Diffusion” at Tencent Youtu Lab
- 2022.9.15 Invited talk “Advances in Weakly-supervised Object Detection” at Tencent Youtu Lab
- 2022.2.24 Invited talk “Slow-Fast Token Evolution for Dynamic Vision Transformer” at AAAI2022
Domain Service
Conference Reviewer
- CVPR 2024: the IEEE/CVF Conference on Computer Vision and Pattern Recognition
- ECCV 2024: the 18th European Conference on Computer Vision
- AAAI 2024: the 38th Annual AAAI Conference on Artificial Intelligence
- AAAI 2023: the 37th Annual AAAI Conference on Artificial Intelligence
Journal Reviewer
- TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
- TIP: IEEE Transactions on Image Processing
- CVMJ: Computational Visual Media
Maintained Projects
- Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
- Multi-modal Queried Object Detection in the Wild
- Libra: Building Decoupled Vision System on Large Language Models
- Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips
Contact
- Email: