DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images Y Ge, R Zhang, X Wang, X Tang, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 464 | 2019 |
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension B Li, R Wang, G Wang, Y Ge, Y Ge, Y Shan arXiv preprint arXiv:2307.16125, 2023 | 288 | 2023 |
All in one: Exploring unified video-language pre-training J Wang, Y Ge, R Yan, Y Ge, KQ Lin, S Tsutsui, X Lin, G Cai, J Wu, Y Shan, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 200 | 2023 |
Parser-Free Virtual Try-on via Distilling Appearance Flows Y Ge, Y Song, R Zhang, C Ge, W Liu, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 195 | 2021 |
Bridging Video-Text Retrieval With Multiple Choice Questions Y Ge, Y Ge, X Liu, D Li, Y Shan, X Qie, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 158 | 2022 |
Disentangled Cycle Consistency for Highly-realistic Virtual Try-On C Ge, Y Song, Y Ge, H Yang, W Liu, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 111 | 2021 |
SCAN: Self-and-collaborative attention network for video person re-identification R Zhang, J Li, H Sun, Y Ge, P Luo, X Wang, L Lin IEEE Transactions on Image Processing 28 (10), 4870-4882, 2019 | 97 | 2019 |
GNFactor: Multi-task real robot learning with generalizable neural feature fields Y Ze, G Yan, YH Wu, A Macaluso, Y Ge, J Ye, N Hansen, LE Li, X Wang Conference on Robot Learning, 284-301, 2023 | 61 | 2023 |
Planting a SEED of Vision in Large Language Model Y Ge, Y Ge, Z Zeng, X Wang, Y Shan arXiv preprint arXiv:2307.08041, 2023 | 60 | 2023 |
SEED-Bench-2: Benchmarking Multimodal Large Language Models B Li, Y Ge, Y Ge, G Wang, R Wang, R Zhang, Y Shan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 57 | 2024 |
Making LLaMA see and draw with SEED tokenizer Y Ge, S Zhao, Z Zeng, Y Ge, C Li, X Wang, Y Shan International Conference on Learning Representations 2024, 2023 | 50 | 2023 |
JourneyDB: A benchmark for generative image understanding K Sun, J Pan, Y Ge, H Li, H Duan, X Wu, R Zhang, A Zhou, Z Qin, Y Wang, ... Advances in Neural Information Processing Systems 36, 2024 | 47 | 2024 |
MILES: Visual BERT pre-training with injected language semantics for video-text retrieval Y Ge, Y Ge, X Liu, J Wang, J Wu, Y Shan, X Qie, P Luo European Conference on Computer Vision, 691-708, 2022 | 44 | 2022 |
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Y Ge, S Zhao, J Zhu, Y Ge, K Yi, L Song, C Li, X Ding, Y Shan arXiv preprint arXiv:2404.14396, 2024 | 27 | 2024 |
Retrieving-to-answer: Zero-shot video question answering with frozen large language models J Pan, Z Lin, Y Ge, X Zhu, R Zhang, Y Wang, Y Qiao, H Li Proceedings of the IEEE/CVF International Conference on Computer Vision, 272-283, 2023 | 20 | 2023 |
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation J Zhu, X Ding, Y Ge, Y Ge, S Zhao, H Zhao, X Wang, Y Shan arXiv preprint arXiv:2312.09251, 2023 | 19 | 2023 |
ViT-Lens: Towards omni-modal representations W Lei, Y Ge, K Yi, J Zhang, D Gao, D Sun, Y Ge, Y Shan, MZ Shou Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 14* | 2024 |
EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models Y Chen, Y Ge, Y Ge, M Ding, B Li, R Wang, R Xu, Y Shan, X Liu arXiv preprint arXiv:2312.06722, 2023 | 14 | 2023 |
Policy Adaptation From Foundation Model Feedback Y Ge, A Macaluso, LE Li, P Luo, X Wang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 10* | 2023 |
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension B Li, Y Ge, Y Chen, Y Ge, R Zhang, Y Shan arXiv preprint arXiv:2404.16790, 2024 | 9 | 2024 |