Mu Cai
Title
Cited by
Year
VOS: Learning What You Don't Know by Virtual Outlier Synthesis
X Du, Z Wang, M Cai, Y Li
ICLR 2022, 2022
Cited by: 307, Year: 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
H Liu, M Cai, YJ Lee
ECCV 2022, 2022
Cited by: 148, Year: 2022
Investigating the catastrophic forgetting in multimodal large language models
Y Zhai, S Tong, X Li, M Cai, Q Qu, YJ Lee, Y Ma
Conference on Parsimony and Learning (CPAL) 2023, 2023
Cited by: 105*, Year: 2023
Frequency domain image translation: More photo-realistic, better identity-preserving
M Cai, H Zhang, H Huang, Q Geng, Y Li, G Huang
ICCV 2021, 2021
Cited by: 86, Year: 2021
ViP-LLaVA: Making large multimodal models understand arbitrary visual prompts
M Cai, H Liu, SK Mustikovela, GP Meyer, Y Chai, D Park, YJ Lee
CVPR 2024, 2024
Cited by: 57*, Year: 2024
Out-of-distribution Detection via Frequency-regularized Generative Models
M Cai, Y Li
WACV (Spotlight), 2023, 2023
Cited by: 41, Year: 2023
LLaVA-PruMerge: Adaptive token reduction for efficient large multimodal models
Y Shang*, M Cai*, B Xu, YJ Lee, Y Yan
arXiv preprint arXiv:2403.15388, 2024
Cited by: 35, Year: 2024
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
Z Huang, A Zhou, Z Lin, M Cai, H Wang, YJ Lee
ICCV 2023, 2023
Cited by: 20, Year: 2023
A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data
L Sun*, M Cai*, W Zhan, M Tomizuka
IROS 2020, 2020
Cited by: 16, Year: 2020
Matryoshka Multimodal Models
M Cai, J Yang, J Gao, YJ Lee
NeurIPS 2024 Workshop on Video-Language Models, 2024
Cited by: 12, Year: 2024
LLaRA: Supercharging robot learning data for vision-language policy
X Li, C Mata, J Park, K Kahatapitiya, YS Jang, J Shang, K Ranasinghe, ...
CoRL 2024 Workshop on Language and Robot Learning, 2024
Cited by: 11, Year: 2024
An Investigation on LLMs’ Visual Understanding Ability Using SVG for Image-Text Bridging
M Cai, Z Huang, Y Li, H Wang, YJ Lee
WACV 2025, 2025
Cited by: 10*, Year: 2025
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
J Zhang*, M Cai*, T Xie, YJ Lee
Findings of ACL 2024, 2024
Cited by: 5, Year: 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
M Cai, R Tan, J Zhang, B Zou, K Zhang, F Yao, F Zhu, J Gu, Y Zhong, ...
NeurIPS 2024 Workshop on Video-Language Models, 2024
Cited by: 2*, Year: 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Y Shang, B Xu, W Kang, M Cai, Y Li, Z Wen, Z Dong, K Keutzer, YJ Lee, ...
arXiv preprint arXiv:2409.12963, 2024
Cited by: 2, Year: 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
B Zou*, M Cai*, J Zhang, YJ Lee
EMNLP 2024, 2024
Cited by: 2, Year: 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
T Nguyen, H Liu, Y Li, M Cai, U Ojha, YJ Lee
NeurIPS 2024, 2024
Cited by: 1, Year: 2024
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
J Zhang*, M Cai*, YJ Lee
arXiv preprint arXiv:2410.02763, 2024
Year: 2024
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
Y Li, H Liu, M Cai, Y Li, E Shechtman, Z Lin, YJ Lee, KK Singh
ECCV 2024, 2024
Year: 2024
Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds
M Cai, C Luo, YJ Lee, X Yang
IROS 2024, 2024
Year: 2024