Towards deep learning models resistant to adversarial attacks A Madry, A Makelov, L Schmidt, D Tsipras, A Vladu arXiv preprint arXiv:1706.06083, 2017 | 12411 | 2017 |
Towards deep learning models resistant to adversarial attacks A Mądry, A Makelov, L Schmidt, D Tsipras, A Vladu stat 1050 (9), 2017 | 31 | 2017 |
Rethinking backdoor attacks A Khaddaj, G Leclerc, A Makelov, K Georgiev, H Salman, A Ilyas, A Madry International Conference on Machine Learning, 16216-16236, 2023 | 8 | 2023 |
Is this the subspace you are looking for? An interpretability illusion for subspace activation patching A Makelov, G Lange, A Geiger, N Nanda The Twelfth International Conference on Learning Representations, 2023 | 7 | 2023 |
Expansion in lifts of graphs AA Makelov | 7 | 2015 |
Towards principled evaluations of sparse autoencoders for interpretability and control A Makelov, G Lange, N Nanda arXiv preprint arXiv:2405.08366, 2024 | 5 | 2024 |
A Systematic Comparison of Sparse Autoencoder Variants for Model Steering on the IOI Task A Makelov ICML 2024 Workshop on Mechanistic Interpretability | | |
mandala: Compositional Memoization for Simple & Powerful Scientific Data Management A Makelov | | |
Backdoor or Feature? A New Perspective on Data Poisoning A Khaddaj, G Leclerc, A Makelov, K Georgiev, A Ilyas, H Salman, A Madry | | |