Multi-agent reinforcement learning: A selective overview of theories and algorithms K Zhang, Z Yang, T Başar Handbook of reinforcement learning and control, 321-384, 2021 | 837 | 2021 |
Fully decentralized multi-agent reinforcement learning with networked agents K Zhang, Z Yang, H Liu, T Zhang, T Başar International Conference on Machine Learning, 5872-5881, 2018 | 496 | 2018 |
Provably efficient reinforcement learning with linear function approximation C Jin, Z Yang, Z Wang, MI Jordan Conference on Learning Theory, 2137-2143, 2020 | 495 | 2020 |
A theoretical analysis of deep Q-learning J Fan, Z Wang, Y Xie, Z Yang arXiv preprint arXiv:1901.00137, 2019 | 475* | 2019 |
Provably efficient exploration in policy optimization Q Cai, Z Yang, C Jin, Z Wang International Conference on Machine Learning, 1283-1294, 2020 | 213 | 2020 |
Is pessimism provably efficient for offline RL? Y Jin, Z Yang, Z Wang International Conference on Machine Learning, 5084-5096, 2021 | 198 | 2021 |
Neural policy gradient methods: Global optimality and rates of convergence L Wang, Q Cai, Z Yang, Z Wang arXiv preprint arXiv:1909.01150, 2019 | 187 | 2019 |
Multi-agent reinforcement learning via double averaging primal-dual optimization HT Wai, Z Yang, Z Wang, M Hong Advances in Neural Information Processing Systems 31, 2018 | 167 | 2018 |
A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic M Hong, HT Wai, Z Wang, Z Yang arXiv preprint arXiv:2007.05170, 2020 | 144 | 2020 |
Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost Z Yang, Y Chen, M Hong, Z Wang Advances in Neural Information Processing Systems 32, 2019 | 103 | 2019 |
Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games K Zhang, Z Yang, T Başar Advances in Neural Information Processing Systems 32, 2019 | 102 | 2019 |
Provably efficient safe exploration via primal-dual policy optimization D Ding, X Wei, Z Yang, Z Wang, M Jovanovic International Conference on Artificial Intelligence and Statistics, 3304-3312, 2021 | 101 | 2021 |
Neural proximal/trust region policy optimization attains globally optimal policy B Liu, Q Cai, Z Yang, Z Wang arXiv preprint arXiv:1906.10306, 2019 | 100 | 2019 |
Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium Q Xie, Y Chen, Z Wang, Z Yang Conference on Learning Theory, 3674-3682, 2020 | 96 | 2020 |
Neural temporal-difference learning converges to global optima Q Cai, Z Yang, JD Lee, Z Wang Advances in Neural Information Processing Systems 32, 2019 | 89 | 2019 |
Networked multi-agent reinforcement learning in continuous spaces K Zhang, Z Yang, T Başar 2018 IEEE Conference on Decision and Control (CDC), 2771-2776, 2018 | 87 | 2018 |
Convergent policy optimization for safe reinforcement learning M Yu, Z Yang, M Kolar, Z Wang Advances in Neural Information Processing Systems 32, 2019 | 80 | 2019 |
Sparse nonlinear regression: Parameter estimation and asymptotic inference Z Yang, Z Wang, H Liu, YC Eldar, T Zhang arXiv preprint arXiv:1511.04514, 2015 | 75* | 2015 |
A near-optimal algorithm for stochastic bilevel optimization via double-momentum P Khanduri, S Zeng, M Hong, HT Wai, Z Wang, Z Yang Advances in Neural Information Processing Systems 34, 30271-30283, 2021 | 70 | 2021 |
Neural trust region/proximal policy optimization attains globally optimal policy B Liu, Q Cai, Z Yang, Z Wang Advances in Neural Information Processing Systems 32, 2019 | 69 | 2019 |