Aleksandar Makelov
Independent
Verified email at mit.edu
Title
Cited by
Year
Towards deep learning models resistant to adversarial attacks
A Madry, A Makelov, L Schmidt, D Tsipras, A Vladu
arXiv preprint arXiv:1706.06083, 2017
Cited by: 12411; Year: 2017
Towards deep learning models resistant to adversarial attacks
A Mądry, A Makelov, L Schmidt, D Tsipras, A Vladu
stat 1050 (9), 2017
Cited by: 31; Year: 2017
Rethinking backdoor attacks
A Khaddaj, G Leclerc, A Makelov, K Georgiev, H Salman, A Ilyas, A Madry
International Conference on Machine Learning, 16216-16236, 2023
Cited by: 8; Year: 2023
Is this the subspace you are looking for? an interpretability illusion for subspace activation patching
A Makelov, G Lange, A Geiger, N Nanda
The Twelfth International Conference on Learning Representations, 2023
Cited by: 7; Year: 2023
Expansion in lifts of graphs
AA Makelov
Cited by: 7; Year: 2015
Towards principled evaluations of sparse autoencoders for interpretability and control
A Makelov, G Lange, N Nanda
arXiv preprint arXiv:2405.08366, 2024
Cited by: 5; Year: 2024
A Systematic Comparison of Sparse Autoencoder Variants for Model Steering on the IOI Task
A Makelov
ICML 2024 Workshop on Mechanistic Interpretability, 0
mandala: Compositional Memoization for Simple & Powerful Scientific Data Management
A Makelov
Backdoor or Feature? A New Perspective on Data Poisoning
A Khaddaj, G Leclerc, A Makelov, K Georgiev, A Ilyas, H Salman, A Madry