Short Bio
I am currently a researcher at Shanghai AI Laboratory (shlab). I received my Ph.D. degree through a Joint PhD Program between Microsoft Research Asia (MSRA) and University of Science and Technology of China (USTC) in 2022. Prior to that, I received my Bachelor of Engineering degree from University of Science and Technology of China in 2017. I joined the Shanghai AI Laboratory in July 2022.
My research interests include Multimodal Large Language Models and Image/Video Generation and Editing.
We are seeking long-term internship candidates and are open to research collaboration. Please email me if you would like to join us.
🔥 News
- 2024.03: 🎉🎉 The InternLM-XComposer series has received 1,300+ GitHub stars. XComposer2 has been used commercially by ByteDance.
- 2024.02: 🎉🎉 The model and dataset of ShareGPT4V have been downloaded 100,000+ times in one month.
- 2024.02: 🎉🎉 Three papers accepted by CVPR 2024. Alpha-CLIP is Strongly Accepted by All the Reviewers.
- 2024.01: 🎉🎉 We release InternLM-XComposer2. The first 7B model matches or even surpasses GPT-4V and Gemini Pro in certain assessments.
- 2023.09: 🎉🎉 We release InternLM-XComposer, a vision-language large model for advanced text-image comprehension and composition.
- 2023.09: 🎉🎉 One paper accepted by SIGGRAPH Asia 2023.
- 2023.07: 🎉🎉 V3Det, the first ten-thousand-class object detection dataset, is accepted by ICCV 2023 as an Oral paper.
- 2023.03: 🎉🎉 Two papers accepted by CVPR 2023.
- 2022.07: 🎉🎉 One paper accepted by ECCV 2022.
- 2022.03: 🎉🎉 One paper accepted by TPAMI.
- 2021.06: 🎉🎉 CoCosNet v2 is selected as a CVPR 2021 Best Paper Candidate.
- 2021.02: 🎉🎉 CoCosNet v2 and ProDA are accepted by CVPR 2021. CoCosNet v2 is an Oral Paper.
- 2020.10: 🎉🎉 Bring-Old-Photos-Back-to-Life has received 14,000+ GitHub stars.
- 2020.03: 🎉🎉 CoCosNet and Bring-Old-Photos-Back-to-Life are accepted by CVPR 2020 as Oral Papers.
📝 Selected Publications
Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
The 7B model significantly outperforms existing multimodal models and matches or even surpasses GPT-4V and Gemini Pro in certain assessments.
ShareGPT4V: Improving large multi-modal models with better captions
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
Pan Zhang*, Xiaoyi Dong*, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
A vision-language large model that enables advanced image-text comprehension and composition
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
CVPR 2024, Strongly Accepted by All the Reviewers | Project | Github
Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu
CVPR 2024 | Github
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling*, Lin Chen*, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng
VIGC: Visual instruction generation and correction
Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He
HyperDreamer: Hyper-realistic 3D content generation and editing from a single image
Tong Wu*, Zhibing Li*, Shuai Yang*, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang*, Pan Zhang*, Tao Chu*, Yuhang Cao*, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
Tao Chu, Pan Zhang, Qiong Liu, Jiaqi Wang
CVPR 2023 | Github
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
Bowen Zhang*, Chenyang Qi*, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen
Real-time neural character rendering with pose-guided multiplane images
Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen
ECCV 2022 | Project | Github | Video | Dynamic MVS Data
Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, Fang Wen
CVPR 2021 | Github
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen
CVPR 2021 Oral, Best Paper Candidate | Github | Slides
Old Photo Restoration via Deep Latent Space Translation
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
TPAMI | 🔥Github | Colab demo | Replicate Demo
Cross-domain Correspondence Learning for Exemplar-based Image Translation
Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen
CVPR 2020 Oral | Project | Github | Supplementary | Slides | Video
Bringing Old Photos Back to Life
Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
CVPR 2020 Oral | Project | 🔥Github | Supplementary | Colab demo | Replicate Demo
🎖 Honors and Awards
- 2022.05, Award of Excellence, Stars of Tomorrow Internship Program, Microsoft Research Asia (MSRA).
- 2017.06, Honor Ranking of Talent Program in Information Science and Technology (awarded by USTC to the top 5% of students).
- 2015.06, National Scholarship (The highest scholarship awarded by the Ministry of Education, China).
- 2014.06, National Scholarship (The highest scholarship awarded by the Ministry of Education, China).
📖 Education
- 2017.06 - 2022.06, Ph.D., University of Science and Technology of China and Microsoft Research Asia.
- 2013.09 - 2017.06, Undergraduate, University of Science and Technology of China.