Ethan Perez
Anthropic
Verified email at anthropic.com - Homepage
Title · Cited by · Year
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
P Lewis, E Perez, A Piktus, F Petroni, V Karpukhin, N Goyal, H Küttler, ...
NeurIPS 2020, 2020
Cited by 12554
FiLM: Visual Reasoning with a General Conditioning Layer
E Perez, F Strub, H De Vries, V Dumoulin, A Courville
AAAI 2018, 2018
Cited by 3116*
Constitutional AI: Harmlessness from AI Feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 2223
Red teaming language models with language models
E Perez, S Huang, F Song, T Cai, R Ring, J Aslanides, A Glaese, ...
EMNLP 2022, 2022
Cited by 921
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 788
ELI5: Long Form Question Answering
A Fan, Y Jernite*, E Perez*, D Grangier, J Weston, M Auli
Association for Computational Linguistics (ACL) 2019, 2019
Cited by 756
Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting
M Turpin, J Michael, E Perez, S Bowman
Advances in Neural Information Processing Systems 36, 74952-74965, 2023
Cited by 661
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by 605
True Few-Shot Learning with Language Models
E Perez, D Kiela, K Cho
NeurIPS 2021, 2021
Cited by 555
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukosiute, K Nguyen, E Chen, S Heiner, C Pettit, ...
Findings of the Association for Computational Linguistics: ACL 2023, 13387-13434, 2023
Cited by 485
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
ICLR, 2023
Cited by 470
Supervised multimodal bitransformers for classifying images and text
D Kiela, S Bhooshan, H Firooz, E Perez, D Testuggine
arXiv preprint arXiv:1909.02950, 2019
Cited by 365
Pretraining language models with human preferences
T Korbak, K Shi, A Chen, R Bhalerao, CL Buckley, J Phang, SR Bowman, ...
ICML 2023, 2023
Cited by 263
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 257
Feature-wise transformations
V Dumoulin, E Perez, N Schucher, F Strub, H Vries, A Courville, Y Bengio
Distill 3 (7), e11, 2018
Cited by 245*
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
Cited by 232
Measuring faithfulness in chain-of-thought reasoning
T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ...
arXiv preprint arXiv:2307.13702, 2023
Cited by 225
Many-shot jailbreaking
C Anil, E Durmus, N Panickssery, M Sharma, J Benton, S Kundu, J Batson, ...
Advances in Neural Information Processing Systems 37, 129696-129742, 2024
Cited by 213
The capacity for moral self-correction in large language models
D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...
arXiv preprint arXiv:2302.07459, 2023
Cited by 212
Unsupervised Question Decomposition for Question Answering
E Perez, P Lewis, W Yih, K Cho, D Kiela
EMNLP 2020, 2020
Cited by 201