DiLoCo: Distributed Low-Communication Training of Language Models | A Douillard, Q Feng, AA Rusu, R Chhaparia, Y Donchev, A Kuncoro, ... | arXiv preprint arXiv:2311.08105, 2023 | 31 | 2023 |
Scaling instructable agents across many simulated worlds | MA Raad, A Ahuja, C Barros, F Besse, A Bolt, A Bolton, B Brownfield, ... | arXiv preprint arXiv:2404.10179, 2024 | 25 | 2024 |
DiPaCo: Distributed Path Composition | A Douillard, Q Feng, AA Rusu, A Kuncoro, Y Donchev, R Chhaparia, ... | arXiv preprint arXiv:2403.10616, 2024 | 6 | 2024 |
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | A Douillard, Y Donchev, K Rush, S Kale, Z Charles, Z Garrett, G Teston, ... | arXiv preprint arXiv:2501.18512, 2025 | 2 | 2025 |