Title | Authors | Venue | Cited by | Year |
Robustness may be at odds with accuracy | D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry | arXiv preprint arXiv:1805.12152, 2018 | 1582 | 2018 |
Label-consistent backdoor attacks | A Turner, D Tsipras, A Madry | arXiv preprint arXiv:1912.02771, 2019 | 387* | 2019 |
There is no free lunch in adversarial robustness (but there are unexpected benefits) | D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry | arXiv preprint arXiv:1805.12152 2 (3), 2018 | 88 | 2018 |
Optimal policies tend to seek power | AM Turner, L Smith, R Shah, A Critch, P Tadepalli | arXiv preprint arXiv:1912.01683, 2019 | 36 | 2019 |
Robustness may be at odds with accuracy. arXiv | D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry | Machine Learning 6, 2019 | 21 | 2019 |
Parametrically Retargetable Decision-Makers Tend To Seek Power | A Turner, P Tadepalli | Advances in Neural Information Processing Systems 35, 31391-31401, 2022 | 8 | 2022 |
On Avoiding Power-Seeking by Artificial Intelligence | AM Turner | arXiv preprint arXiv:2206.11831, 2022 | 2 | 2022 |
Understanding and Controlling a Maze-Solving Policy Network | U Mini, P Grietzer, M Sharma, A Meek, M MacDiarmid, AM Turner | arXiv preprint arXiv:2310.08043, 2023 | | 2023 |
Formalizing the Problem of Side Effect Regularization | AM Turner, A Saxena, P Tadepalli | NeurIPS ML Safety Workshop, 2022 | | 2022 |
Formalizing the Problem of Side-Effect Avoidance | AM Turner, A Saxena, P Tadepalli | arXiv preprint arXiv:2206.11812, 2022 | | 2022 |