Benjamin Thérien
Verified email at umontreal.ca
Title
Cited by
Year
Continual Pre-Training of Large Language Models: How to (re)warm your model?
K Gupta, B Thérien, A Ibrahim, ML Richter, Q Anthony, E Belilovsky, I Rish, ...
arXiv preprint arXiv:2308.04014, 2023
Cited by 67, 2023
Simple and scalable strategies to continually pre-train large language models
A Ibrahim, B Thérien, K Gupta, ML Richter, Q Anthony, T Lesort, ...
arXiv preprint arXiv:2403.08763, 2024
Cited by 51, 2024
Parametric scattering networks
S Gauthier, B Thérien, L Alsene-Racicot, M Chaudhary, I Rish, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by 24, 2022
Out-of-distribution detection for lidar-based 3d object detection
C Huang, V Abdelzad, CG Mannes, L Rowe, B Thérien, R Salay, ...
2022 IEEE 25th International Conference on Intelligent Transportation …, 2022
Cited by 18, 2022
Comparison of radiologists and deep learning for US grading of hepatic steatosis
P Vianna, SI Calce, P Boustros, C Larocque-Rigney, L Patry-Beaudoin, ...
Radiology 309 (1), e230659, 2023
Cited by 9, 2023
GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch, 9 2023
A Andonian, Q Anthony, S Biderman, S Black, P Gali, L Gao, E Hallahan, ...
URL https://www.github.com/eleutherai/gpt-neox
Cited by 7
CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval
B Thérien, P Bagherzadeh, S Bergler
Proceedings of the 15th International Workshop on Semantic Evaluation …, 2021
Cited by 4, 2021
Object Re-Identification from Point Clouds
B Thérien, C Huang, A Chow, K Czarnecki
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
Cited by 2, 2024
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
B Thérien, CÉ Joseph, B Knyazev, E Oyallon, I Rish, E Belilovsky
arXiv preprint arXiv:2406.00153, 2024
Cited by 1, 2024
Learning Optimizers for Local SGD
CÉ Joseph, B Thérien, A Moudgil, B Knyazev, E Belilovsky
International Workshop on Federated Learning in the Age of Foundation Models …
Cited by 1
StructMoE: Structured Mixture of Experts Using Low Rank Experts
Z Sarwar, A Panda, B Thérien, S Rawls, A Das, K Balasubramaniam, ...
NeurIPS Efficient Natural Language and Speech Processing Workshop, 182-193, 2024
2024
Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts
A Panda, V Baherwani, Z Sarwar, B Thérien, S Rawls, S Sahu, ...
Workshop on Machine Learning and Compression, NeurIPS 2024, 2024
2024
Can We Learn Communication-Efficient Optimizers?
CÉ Joseph, B Thérien, A Moudgil, B Knyazev, E Belilovsky
arXiv preprint arXiv:2312.02204, 2023
2023
A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition
L Rowe, B Thérien, K Czarnecki, H Zhang
arXiv preprint arXiv:2210.02577, 2022
2022
StructMoE: Augmenting MoEs with Hierarchically Routed Low Rank Experts
Z Sarwar, A Panda, B Thérien, S Rawls, S Sahu, S Chakraborty