Xiaoqian Shen
CS PhD @ KAUST
Verified email at kaust.edu.sa
Title
Cited by
Year
Minigpt-4: Enhancing vision-language understanding with advanced large language models
D Zhu, J Chen, X Shen, X Li, M Elhoseiny
ICLR'24, 2024
Cited by: 2183; Year: 2024
Minigpt-v2: large language model as a unified interface for vision-language multi-task learning
J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ...
arXiv preprint arXiv:2310.09478, 2023
Cited by: 443; Year: 2023
Chatgpt asks, blip-2 answers: Automatic questioning towards enriched visual descriptions
D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny
TMLR, 2024
Cited by: 91; Year: 2024
Hrs-bench: Holistic, reliable and scalable benchmark for text-to-image models
EM Bakr, X Shen, P Sun, FF Khan, LE Li, M Elhoseiny
ICCV'23, 2023
Cited by: 55; Year: 2023
Multi-ConDoS: Multimodal contrastive domain sharing generative adversarial networks for self-supervised medical image segmentation
J Zhang, S Zhang, X Shen, T Lukasiewicz, Z Xu
IEEE Transactions on Medical Imaging, 2023
Cited by: 31; Year: 2023
Mostgan-v: Video generation with temporal motion styles
X Shen, X Li, M Elhoseiny
CVPR'23, 2023
Cited by: 31; Year: 2023
Minigpt4-video: Advancing multimodal llms for video understanding with interleaved visual-textual tokens
K Ataallah, X Shen, E Abdelrahman, E Sleiman, D Zhu, J Ding, ...
CVPRW'24, 2024
Cited by: 30; Year: 2024
KeMRE: Knowledge-enhanced medical relation extraction for Chinese medicine instructions
T Qi, S Qiu, X Shen, H Chen, S Yang, H Wen, Y Zhang, Y Wu, Y Huang
Journal of Biomedical Informatics 120, 103834, 2021
Cited by: 20; Year: 2021
Exploring hierarchical graph representation for large-scale zero-shot image classification
K Yi, X Shen, Y Gou, M Elhoseiny
ECCV'22, 2022
Cited by: 19; Year: 2022
StoryGPT-V: Large Language Models as Consistent Story Visualizers
X Shen, M Elhoseiny
arXiv preprint arXiv:2312.02252, 2023
Cited by: 8; Year: 2023
Longvu: Spatiotemporal adaptive compression for long video-language understanding
X Shen, Y Xiong, C Zhao, L Wu, J Chen, C Zhu, Z Liu, F Xiao, ...
arXiv preprint arXiv:2410.17434, 2024
Cited by: 4; Year: 2024
Affective visual dialog: A large-scale benchmark for emotional reasoning based on visually grounded conversations
K Haydarov, X Shen, A Madasu, M Salem, J Li, G Elsayed, M Elhoseiny
ECCV'24, 2024
Cited by: 3; Year: 2024
Adversarial Text to Continuous Image Generation
K Haydarov, A Muhamed, X Shen, J Lazarevic, I Skorokhodov, ...
CVPR'24, 2024
Cited by: 3*; Year: 2024
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
K Ataallah, X Shen, E Abdelrahman, E Sleiman, M Zhuge, J Ding, D Zhu, ...
ECCV'24, 2024
Cited by: 2; Year: 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Z Ye, J Liu, R Peng, J Cao, Z Chen, Y Zhang, Z Xuan, M Zhou, X Shen, ...
arXiv preprint arXiv:2408.03695, 2024
Year: 2024
iMotion-LLM: Motion Prediction Instruction Tuning
A Felemban, EM Bakr, X Shen, J Ding, A Mohamed, M Elhoseiny
arXiv preprint arXiv:2406.06211, 2024
Year: 2024
EmoTalker: Audio Driven Emotion Aware Talking Head Generation
X Shen, FF Khan, M Elhoseiny
Proceedings of the Asian Conference on Computer Vision, 1900-1917, 2024
Year: 2024
ReferPix2Pix: Guiding Multi-Modal LLMs for Image Editing with Referential Pixel Grounding
X Shen, M Elhoseiny