Short Bio

I am currently a researcher at Shanghai AI Laboratory (shlab). I received my Ph.D. degree through a Joint PhD Program between Microsoft Research Asia (MSRA) and University of Science and Technology of China (USTC) in 2022. Prior to that, I received my Bachelor of Engineering degree from University of Science and Technology of China in 2017. I joined the Shanghai AI Laboratory in July 2022.

My research interests include Multimodal Large Language Models and Image/Video Generation and Editing.

We are seeking long-term internship candidates and research collaborators. Please email me if you would like to join us.

🔥 News

  • 2024.03:  🎉🎉 The InternLM-XComposer series has received 1,300+ GitHub stars. XComposer2 has been used commercially by ByteDance.
  • 2024.02:  🎉🎉 The model and dataset of ShareGPT4V have been downloaded 100,000+ times in one month.
  • 2024.02:  🎉🎉 Three papers accepted by CVPR 2024. Alpha-CLIP received a Strong Accept from all reviewers.
  • 2024.01:  🎉🎉 We release InternLM-XComposer2. The first 7B model matches or even surpasses GPT-4V and Gemini Pro in certain assessments.
  • 2023.09:  🎉🎉 We release InternLM-XComposer, a vision-language large model for advanced text-image comprehension and composition.
  • 2023.09:  🎉🎉 One paper accepted by SIGGRAPH Asia 2023.
  • 2023.07:  🎉🎉 V3Det, the first ten-thousand-class object detection dataset, is accepted by ICCV 2023 as an Oral paper.
  • 2023.03:  🎉🎉 Two papers accepted by CVPR 2023.
  • 2022.07:  🎉🎉 One paper accepted by ECCV 2022.
  • 2022.03:  🎉🎉 One paper accepted by TPAMI.
  • 2021.06:  🎉🎉 CoCosNet v2 is selected as a CVPR 2021 Best Paper Candidate.
  • 2021.02:  🎉🎉 CoCosNet v2 and ProDA are accepted by CVPR 2021. CoCosNet v2 is an Oral Paper.
  • 2020.10:  🎉🎉 Bring-Old-Photos-Back-to-Life has received 14,000+ GitHub stars.
  • 2020.03:  🎉🎉 CoCosNet and Bring-Old-Photos-Back-to-Life are accepted by CVPR 2020 as Oral Papers.

📝 Selected Publications

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

The 7B model significantly outperforms existing multimodal models and matches or even surpasses GPT-4V and Gemini Pro in certain assessments.

Models | Github

ShareGPT4V: Improving large multi-modal models with better captions

Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin

Project | Dataset | Github

InternLM-XComposer: A vision-language large model for advanced text-image comprehension and composition

Pan Zhang*, Xiaoyi Dong*, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

A vision-language large model that enables advanced image-text comprehension and composition.

Models | Github

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

CVPR 2024, Strong Accept from all reviewers | Project | Github

OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

CVPR 2024 | Github

FreeDrag: Feature Dragging for Reliable Point-based Image Editing

Pengyang Ling*, Lin Chen*, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng

CVPR 2024 | Project | Github

VIGC: Visual instruction generation and correction

Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He

AAAI 2024 | Project | Dataset | Github

HyperDreamer: Hyper-realistic 3D content generation and editing from a single image

Tong Wu*, Zhibing Li*, Shuai Yang*, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu

SIGGRAPH Asia 2023 | Project | Github

V3Det: Vast Vocabulary Visual Detection Dataset

Jiaqi Wang*, Pan Zhang*, Tao Chu*, Yuhang Cao*, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin

ICCV 2023 Oral | Dataset | Github

MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

Bowen Zhang*, Chenyang Qi*, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen

CVPR 2023 | Project | Github

Real-time neural character rendering with pose-guided multiplane images

Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen

ECCV 2022 | Project | Github | Video | Dynamic MVS Data

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, Fang Wen

CVPR 2021 | Github

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation

Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen

CVPR 2021 Oral, Best Paper Candidate | Github | Slides

Old Photo Restoration via Deep Latent Space Translation

Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen

TPAMI | 🔥Github | Colab demo | Replicate Demo

Cross-domain Correspondence Learning for Exemplar-based Image Translation

Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen

CVPR 2020 Oral | Project | Github | Supplementary | Slides | Video

Bringing Old Photos Back to Life

Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen

CVPR 2020 Oral | Project | 🔥Github | Supplementary | Colab demo | Replicate Demo

🎖 Honors and Awards

  • 2022.05, Excellent award, Stars of Tomorrow Internship Program, Microsoft Research Asia (MSRA).
  • 2017.06, Honor Ranking of Talent Program in Information Science and Technology (awarded by USTC to the top 5% of students).
  • 2015.06, National Scholarship (The highest scholarship awarded by the Ministry of Education, China).
  • 2014.06, National Scholarship (The highest scholarship awarded by the Ministry of Education, China).

📖 Education

  • 2017.06 - 2022.06, Ph.D., University of Science and Technology of China and Microsoft Research Asia.
  • 2013.09 - 2017.06, Undergraduate, University of Science and Technology of China.