Peter Hase

Citations

	All	Since 2019
Citations	1098	1097
h-index	13	13
i10-index	15	15

Citations per year: 2019: 3, 2020: 22, 2021: 108, 2022: 237, 2023: 428, 2024: 295

Public access

3 articles available, 0 articles not available (based on funding mandates)

Co-authors

Mohit Bansal - Parker Distinguished Professor, Computer Science, UNC Chapel Hill - Verified email at cs.unc.edu
Cynthia Rudin - Professor of Computer Science, ECE, Statistics, and Biostatistics & Bioinformatics, Duke University - Verified email at cs.duke.edu
Swarnadeep Saha - PhD Student, University of North Carolina at Chapel Hill - Verified email at cs.unc.edu
Shiyue Zhang - UNC Chapel Hill - Verified email at cs.unc.edu
Srini Iyer - FAIR - Verified email at fb.com
Asma Ghandeharioun - Research Scientist, Google Research - Verified email at google.com
Been Kim - Google DeepMind - Verified email at csail.mit.edu
Zhuofan Ying - Columbia University - Verified email at columbia.edu
Peter Clark - Allen Institute for Artificial Intelligence (AI2) - Verified email at allenai.org
Sarah Wiegreffe - Allen Institute for AI & University of Washington - Verified email at allenai.org

Peter Hase

PhD Student, University of North Carolina at Chapel Hill

Verified email at cs.unc.edu

Interpretable Machine Learning, Natural Language Processing


Title	Authors	Venue	Cited by	Year
Evaluating explainable AI: Which algorithmic explanations help users predict model behavior?	P Hase, M Bansal	arXiv preprint arXiv:2005.01831, 2020	256	2020
Open problems and fundamental limitations of reinforcement learning from human feedback	S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	arXiv preprint arXiv:2307.15217, 2023	154	2023
Interpretable image recognition with hierarchical prototypes	P Hase, C Chen, O Li, C Rudin	Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7 …, 2019	104	2019
GrIPS: Gradient-free, edit-based instruction search for prompting large language models	A Prasad, P Hase, X Zhou, M Bansal	arXiv preprint arXiv:2203.07281, 2022	97	2022
Do language models have beliefs? Methods for detecting, updating, and visualizing model beliefs	P Hase, M Diab, A Celikyilmaz, X Li, Z Kozareva, V Stoyanov, M Bansal, ...	arXiv preprint arXiv:2111.13654, 2021	78*	2021
Leakage-adjusted simulatability: Can models generate non-trivial explanations of their behavior in natural language?	P Hase, S Zhang, H Xie, M Bansal	arXiv preprint arXiv:2010.04119, 2020	76	2020
FastIF: Scalable influence functions for efficient model interpretation and debugging	H Guo, NF Rajani, P Hase, M Bansal, C Xiong	arXiv preprint arXiv:2012.15781, 2020	73	2020
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations	P Hase, H Xie, M Bansal	Advances in Neural Information Processing Systems 34, 2021	64	2021
When can models learn from explanations? A formal framework for understanding the roles of explanation data	P Hase, M Bansal	arXiv preprint arXiv:2102.02201, 2021	61	2021
Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models	P Hase, M Bansal, B Kim, A Ghandeharioun	Advances in Neural Information Processing Systems 36, 2024	50	2024
Summarization programs: Interpretable abstractive summarization with neural modular trees	S Saha, S Zhang, P Hase, M Bansal	arXiv preprint arXiv:2209.10492, 2022	15	2022
Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization	S Saha, P Hase, M Bansal	Advances in Neural Information Processing Systems 36, 2024	13*	2024
Low-cost algorithmic recourse for users with uncertain cost functions	P Yadav, P Hase, M Bansal	arXiv preprint arXiv:2111.01235, 2021	13	2021
Can sensitive information be deleted from LLMs? Objectives for defending against extraction attacks	V Patil, P Hase, M Bansal	arXiv preprint arXiv:2309.17410, 2023	12	2023
Rethinking Machine Unlearning for Large Language Models	S Liu, Y Yao, J Jia, S Casper, N Baracaldo, P Hase, X Xu, Y Yao, H Li, ...	arXiv preprint arXiv:2402.08787, 2024	10	2024
VisFIS: Visual feature importance supervision with right-for-the-right-reason objectives	Z Ying, P Hase, M Bansal	Advances in Neural Information Processing Systems 35, 17057-17072, 2022	9	2022
Are hard examples also harder to explain? A study with human and model-generated explanations	S Saha, P Hase, N Rajani, M Bansal	arXiv preprint arXiv:2211.07517, 2022	7	2022
Shall I compare thee to a machine-written sonnet? An approach to algorithmic sonnet generation	J Benhardt, P Hase, L Zhu, C Rudin	arXiv preprint arXiv:1811.05067, 2018	5	2018
The unreasonable effectiveness of easy training data for hard tasks	P Hase, M Bansal, P Clark, S Wiegreffe	arXiv preprint arXiv:2401.06751, 2024	1	2024
Foundational Challenges in Assuring Alignment and Safety of Large Language Models	U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...	arXiv preprint arXiv:2404.09932, 2024		2024
