Title | Authors | Venue | Cited by | Year
Provable defenses against adversarial examples via the convex outer adversarial polytope | E Wong, JZ Kolter | arXiv preprint arXiv:1711.00851, 2017 | 1773 | 2017
Fast is better than free: Revisiting adversarial training | E Wong, L Rice, JZ Kolter | arXiv preprint arXiv:2001.03994, 2020 | 1454 | 2020
Overfitting in adversarially robust deep learning | L Rice, E Wong, Z Kolter | International Conference on Machine Learning, 8093-8104, 2020 | 1026 | 2020
Jailbreaking black box large language models in twenty queries | P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong | arXiv preprint arXiv:2310.08419, 2023 | 545 | 2023
Scaling provable adversarial defenses | E Wong, F Schmidt, JH Metzen, JZ Kolter | Advances in Neural Information Processing Systems, 8400-8409, 2018 | 496 | 2018
Wasserstein adversarial examples via projected Sinkhorn iterations | E Wong, F Schmidt, Z Kolter | International Conference on Machine Learning, 6808-6817, 2019 | 270 | 2019
SmoothLLM: Defending large language models against jailbreaking attacks | A Robey, E Wong, H Hassani, GJ Pappas | arXiv preprint arXiv:2310.03684, 2023 | 267 | 2023
Faithful chain-of-thought reasoning | Q Lyu, S Havaldar, A Stein, L Zhang, D Rao, E Wong, M Apidianaki, ... | The 13th International Joint Conference on Natural Language Processing and …, 2023 | 241 | 2023
Adversarial robustness against the union of multiple perturbation models | P Maini, E Wong, Z Kolter | International Conference on Machine Learning, 6640-6650, 2020 | 193 | 2020
JailbreakBench: An open robustness benchmark for jailbreaking large language models | P Chao, E Debenedetti, A Robey, M Andriushchenko, F Croce, V Sehwag, ... | Advances in Neural Information Processing Systems 37, 55005-55029, 2024 | 131 | 2024
Black box adversarial prompting for foundation models | N Maus, P Chao, E Wong, J Gardner | arXiv preprint arXiv:2302.04237, 2023 | 112* | 2023
SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation | C Fan, J Liu, Y Zhang, E Wong, D Wei, S Liu | arXiv preprint arXiv:2310.12508, 2023 | 109 | 2023
Leveraging sparse linear layers for debuggable deep networks | E Wong, S Santurkar, A Madry | International Conference on Machine Learning, 11205-11216, 2021 | 91 | 2021
Learning perturbation sets for robust machine learning | E Wong, JZ Kolter | arXiv preprint arXiv:2007.08450, 2020 | 86 | 2020
Certified patch robustness via smoothed vision transformers | H Salman, S Jain, E Wong, A Madry | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 73 | 2022
In-context Example Selection with Influences | T Nguyen, E Wong | arXiv preprint arXiv:2302.11042, 2023 | 55 | 2023
A Data-Based Perspective on Transfer Learning | S Jain, H Salman, A Khaddaj, E Wong, SM Park, A Mądry | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 48 | 2023
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing | J Ji, B Hou, A Robey, GJ Pappas, H Hassani, Y Zhang, E Wong, S Chang | arXiv preprint arXiv:2402.16192, 2024 | 42 | 2024
When does Bias Transfer in Transfer Learning? | H Salman, S Jain, A Ilyas, L Engstrom, E Wong, A Madry | arXiv preprint arXiv:2207.02842, 2022 | 36 | 2022
Missingness Bias in Model Debugging | S Jain, H Salman, E Wong, P Zhang, V Vineet, S Vemprala, A Madry | International Conference on Learning Representations, 2022 | 30 | 2022