Xuehai Pan
Title
Cited by
Year
Baichuan 2: Open large-scale language models
A Yang, B Xiao, B Wang, B Zhang, C Bian, C Yin, C Lv, D Pan, D Wang, ...
arXiv preprint arXiv:2309.10305, 2023
Cited by: 270*; Year: 2023
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset
J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 133; Year: 2024
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by: 99; Year: 2023
Safe RLHF: Safe reinforcement learning from human feedback
J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang, Y Yang
arXiv preprint arXiv:2310.12773, 2023
Cited by: 87; Year: 2023
Safety gymnasium: A unified safe reinforcement learning benchmark
J Ji, B Zhang, J Zhou, X Pan, W Huang, R Sun, Y Geng, Y Zhong, J Dai, ...
Advances in Neural Information Processing Systems 36, 2023
Cited by: 23; Year: 2023
OmniSafe: An infrastructure for accelerating safe reinforcement learning research
J Ji, J Zhou, B Zhang, J Dai, X Pan, R Sun, W Huang, Y Geng, M Liu, ...
arXiv preprint arXiv:2305.09304, 2023
Cited by: 22; Year: 2023
MATE: Benchmarking multi-agent reinforcement learning in distributed target coverage control
X Pan, M Liu, F Zhong, Y Yang, SC Zhu, Y Wang
Advances in Neural Information Processing Systems 35, 27862-27879, 2022
Cited by: 22; Year: 2022
Aligner: Achieving efficient alignment through weak-to-strong correction
J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, J Dai, Y Yang
arXiv preprint arXiv:2402.02416, 2024
Cited by: 21; Year: 2024
Safety-gymnasium
J Ji, B Zhang, X Pan, J Zhou, J Dai, Y Yang
GitHub repository, 2023
Cited by: 15; Year: 2023
PKU-Beaver: Constrained value-aligned LLM via safe RLHF
J Dai, X Pan, J Ji, R Sun, Y Wang, Y Yang
Cited by: 12; Year: 2023
Proactive multi-camera collaboration for 3D human pose estimation
H Ci, M Liu, X Pan, F Zhong, Y Wang
arXiv preprint arXiv:2303.03767, 2023
Cited by: 9; Year: 2023
Red teaming game: A game-theoretic framework for red teaming language models
C Ma, Z Yang, M Gao, H Ci, J Gao, X Pan, Y Yang
arXiv preprint arXiv:2310.00322, 2023
Cited by: 6; Year: 2023
TorchOpt: An efficient library for differentiable optimization
J Ren, X Feng, B Liu, X Pan, Y Fu, L Mai, Y Yang
Journal of Machine Learning Research 24 (367), 1-14, 2023
Cited by: 6; Year: 2023
Rethinking information structures in RLHF: Reward generalization from a graph theory perspective
T Qiu, F Zeng, J Ji, D Yan, K Wang, J Zhou, H Yang, J Dai, X Pan, Y Yang
arXiv preprint arXiv:2402.10184, 2024
Cited by: 4; Year: 2024