Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives K Grauman, A Westbury, L Torresani, K Kitani, J Malik, T Afouras, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 154 | 2024 |
UniVTG: Towards unified video-language temporal grounding KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 133 | 2023 |
AssistGPT: A general multi-modal assistant that can plan, execute, inspect, and learn D Gao, L Ji, L Zhou, KQ Lin, J Chen, Z Fan, MZ Shou arXiv preprint arXiv:2306.08640, 2023 | 77 | 2023 |
Foreground-background imbalance problem in deep object detectors: A review J Chen, Q Wu, D Liu, T Xu 2020 IEEE Conference on Multimedia Information Processing and Retrieval …, 2020 | 36 | 2020 |
VideoLLM-online: Online video large language model for streaming video J Chen, Z Lv, S Wu, KQ Lin, C Song, D Gao, JW Liu, Z Gao, D Mao, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 33 | 2024 |
AssistQ: Affordance-centric question-driven task completion for egocentric assistant B Wong*, J Chen*, Y Wu*, SW Lei, D Mao, D Gao, MZ Shou Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel …, 2022 | 31 | 2022 |
Affordance grounding from demonstration video to target image J Chen, D Gao, KQ Lin, MZ Shou Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 30 | 2023 |
Linking the characters: Video-oriented social graph generation via hierarchical-cumulative GCN S Wu, J Chen, T Xu, L Chen, L Wu, Y Hu, E Chen Proceedings of the 29th ACM International Conference on Multimedia, 4716-4724, 2021 | 29 | 2021 |
Is heuristic sampling necessary in training deep object detectors? J Chen, D Liu, T Xu, S Wu, Y Cheng, E Chen IEEE Transactions on Image Processing 30, 8454-8467, 2021 | 14 | 2021 |
Residual objectness for imbalance reduction J Chen, D Liu, B Luo, X Peng, T Xu, E Chen Pattern Recognition 130, 108781, 2022 | 11 | 2022 |
One token to seg them all: Language instructed reasoning segmentation in videos Z Bai, T He, H Mei, P Wang, Z Gao, J Chen, Z Zhang, MZ Shou Advances in Neural Information Processing Systems 37, 6833-6859, 2024 | 10 | 2024 |
Is sampling heuristics necessary in training deep object detectors? J Chen, D Liu, T Xu, S Zhang, S Wu, B Luo arXiv preprint arXiv:1909.04868, 2019 | 10 | 2019 |
Overlap sampler for region-based object detection J Chen, B Luo, Q Wu, J Chen, X Peng Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2020 | 9 | 2020 |
GazeVQA: A video question answering dataset for multiview eye-gaze task-oriented collaborations M Ilaslan, C Song, J Chen, D Gao, W Lei, Q Xu, J Lim, M Shou Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 8 | 2023 |
VideoLLM-MoD: Efficient video-language streaming with mixture-of-depths vision computation S Wu, J Chen, KQ Lin, Q Wang, Y Gao, Q Xu, T Xu, Y Hu, E Chen, ... Advances in Neural Information Processing Systems 37, 109922-109947, 2024 | 7 | 2024 |
DropIT: Dropping intermediate tensors for memory-efficient DNN training J Chen, K Xu, Y Wang, Y Cheng, A Yao arXiv preprint arXiv:2202.13808, 2022 | 6 | 2022 |
Learning video context as interleaved multimodal sequences KQ Lin, P Zhang, D Gao, X Xia, J Chen, Z Gao, J Xie, X Xiao, MZ Shou European Conference on Computer Vision, 375-396, 2024 | 3 | 2024 |
Communication-efficient federated learning with stagewise training strategy Y Cheng, S Shen, X Liang, J Liu, J Chen, T Zhang, E Chen Neural Networks 167, 460-472, 2023 | 3 | 2023 |
Capturing Implicit Spatial Cues for Monocular 3D Hand Reconstruction Q Wu*, J Chen*, X Zhou, Z Yao, X Yang 2021 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2021 | 3 | 2021 |
From a social cognitive perspective: Context-aware visual social relationship recognition S Wu, C Zhang, J Chen, T Xu, L Wu, Y Hu, E Chen arXiv preprint arXiv:2406.08358, 2024 | 2 | 2024 |