MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models D Zhu, J Chen, X Shen, X Li, M Elhoseiny International Conference on Learning Representations 2024, 2023 | 1923 | 2023 |
MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-Task Learning J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ... 2nd MMFM Workshop in CVPR2024, 2023 | 400 | 2023 |
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny Transactions on Machine Learning Research (TMLR), 2023 | 91 | 2023 |
Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation A Mohamed, D Zhu, W Vu, M Elhoseiny, C Claudel European Conference on Computer Vision (ECCV) 2022, 2022 | 62 | 2022 |
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only J Chen, D Zhu, G Qian, B Ghanem, Z Yan, C Zhu, F Xiao, SC Culatana, ... Proceedings of the IEEE/CVF International Conference on Computer Vision, 699-710, 2023 | 37* | 2023 |
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions J Chen, D Zhu, K Haydarov, X Li, M Elhoseiny arXiv preprint arXiv:2304.04227, 2023 | 35 | 2023 |
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens K Ataallah, X Shen, E Abdelrahman, E Sleiman, D Zhu, J Ding, ... 2nd MMFM Workshop in CVPR2024, 2024 | 20 | 2024 |
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition J Chen, A Agarwal, S Abdelkarim, D Zhu, M Elhoseiny Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 18* | 2022 |
Motion Forecasting with Unlikelihood Training in Continuous Space D Zhu, M Zahran, LE Li, M Elhoseiny Conference on Robot Learning, 1003-1012, 2022 | 17 | 2022 |
Guiding Online Reinforcement Learning with Action-Free Offline Pretraining D Zhu, Y Wang, J Schmidhuber, M Elhoseiny arXiv preprint arXiv:2301.12876, 2023 | 9 | 2023 |
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents D Zhu, M Zahran, LE Li, M Elhoseiny International Conference on Learning Representations, 2021, 2021 | 7 | 2021 |
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis A Alkhaldi, R Alnajim, L Alabdullatef, R Alyahya, J Chen, D Zhu, A Alsinan, ... arXiv preprint arXiv:2407.04106, 2024 | 6 | 2024 |
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning D Zhu, LE Li, M Elhoseiny International Conference on Learning Representations 2023, 2022 | 6 | 2022 |
Learning to Disentangle Latent Physical Factors for Video Prediction D Zhu, M Munderloh, B Rosenhahn, J Stückler Pattern Recognition: 41st DAGM German Conference, DAGM GCPR 2019, Dortmund …, 2019 | 4 | 2019 |
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos K Ataallah, X Shen, E Abdelrahman, E Sleiman, M Zhuge, J Ding, D Zhu, ... European Conference on Computer Vision (ECCV) 2024, 2024 | 1 | 2024 |