| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Finite-sample analysis for SARSA with linear function approximation | S Zou, T Xu, Y Liang | Advances in Neural Information Processing Systems 32 | 117 | 2019 |
| Two time-scale off-policy TD learning: Non-asymptotic analysis over Markovian samples | T Xu, S Zou, Y Liang | Advances in Neural Information Processing Systems 32 | 64 | 2019 |
| Improving sample complexity bounds for actor-critic algorithms | T Xu, Z Wang, Y Liang | arXiv preprint arXiv:2004.12956 | 55* | 2020 |
| Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms | T Xu, Z Wang, Y Liang | arXiv preprint arXiv:2005.03557 | 37 | 2020 |
| A primal approach to constrained policy optimization: Global optimality and finite-time analysis | T Xu, Y Liang, G Lan | | 34* | 2020 |
| Algorithms for the estimation of transient surface heat flux during ultra-fast surface cooling | ZF Zhou, TY Xu, B Chen | International Journal of Heat and Mass Transfer 100, 1-10 | 34 | 2016 |
| Reanalysis of variance reduced temporal difference learning | T Xu, Z Wang, Y Zhou, Y Liang | arXiv preprint arXiv:2001.01898 | 30 | 2020 |
| Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling | H Xiong, T Xu, Y Liang, W Zhang | Proceedings of the AAAI Conference on Artificial Intelligence 35 (12), 10460… | 22 | 2021 |
| Enhanced first and zeroth order variance reduced algorithms for min-max optimization | T Xu, Z Wang, Y Liang, HV Poor | | 19* | 2020 |
| When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models? | T Xu, Y Zhou, K Ji, Y Liang | arXiv preprint arXiv:1806.04339 | 17* | 2018 |
| Sample complexity bounds for two timescale value-based reinforcement learning algorithms | T Xu, Y Liang | International Conference on Artificial Intelligence and Statistics, 811-819 | 13 | 2021 |
| Doubly robust off-policy actor-critic: Convergence and optimality | T Xu, Z Yang, Z Wang, Y Liang | International Conference on Machine Learning, 11581-11591 | 12 | 2021 |
| Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry | Z Chen, Y Zhou, T Xu, Y Liang | arXiv preprint arXiv:2102.04653 | 8 | 2021 |
| Faster algorithm and sharper analysis for constrained Markov decision process | T Li, Z Guan, S Zou, T Xu, Y Liang, G Lan | arXiv preprint arXiv:2110.10351 | 5 | 2021 |
| When will generative adversarial imitation learning algorithms attain global convergence | Z Guan, T Xu, Y Liang | International Conference on Artificial Intelligence and Statistics, 1117-1125 | 5 | 2021 |
| A unified off-policy evaluation approach for general value function | T Xu, Z Yang, Z Wang, Y Liang | arXiv preprint arXiv:2107.02711 | 1 | 2021 |
| Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward | T Xu, Y Liang | arXiv preprint arXiv:2206.06426 | | 2022 |
| Deterministic Policy Gradient: Convergence Analysis | H Xiong, T Xu, L Zhao, Y Liang, W Zhang | The 38th Conference on Uncertainty in Artificial Intelligence | | 2022 |
| Model-Based Offline Meta-Reinforcement Learning with Regularization | S Lin, J Wan, T Xu, Y Liang, J Zhang | arXiv preprint arXiv:2202.02929 | | 2022 |
| PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method | Z Guan, T Xu, Y Liang | arXiv preprint arXiv:2110.06906 | | 2021 |