MALib: A parallel framework for population-based multi-agent reinforcement learning M Zhou, Z Wan, H Wang, M Wen, R Wu, Y Wen, Y Yang, Y Yu, J Wang, ... Journal of Machine Learning Research 24 (150), 1-12, 2023 | 36 | 2023 |
Offline constrained multi-objective reinforcement learning via pessimistic dual value iteration R Wu, Y Zhang, Z Yang, Z Wang Advances in Neural Information Processing Systems 34, 25439-25451, 2021 | 13 | 2021 |
Distributional offline policy evaluation with predictive error guarantees R Wu, M Uehara, W Sun International Conference on Machine Learning, 37685-37712, 2023 | 10 | 2023 |
Making RL with preference-based feedback efficient via randomization R Wu, W Sun arXiv preprint arXiv:2310.14554, 2023 | 7 | 2023 |
Contextual Bandits and Imitation Learning with Preference-Based Active Queries A Sekhari, K Sridharan, W Sun, R Wu Advances in Neural Information Processing Systems 36, 2024 | 6 | 2024 |
The benefits of being distributional: Small-loss bounds for reinforcement learning K Wang, K Zhou, R Wu, N Kallus, W Sun Advances in Neural Information Processing Systems 36, 2023 | 6 | 2023 |
Selective sampling and imitation learning via online regression A Sekhari, K Sridharan, W Sun, R Wu Advances in Neural Information Processing Systems 36, 2024 | 4 | 2024 |