Andy Zou
Verified email at andrew.cmu.edu
Title
Cited by
Year
Measuring Massive Multitask Language Understanding
D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt
ICLR, 2020
Cited by 454 · 2020
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
TMLR, 2022
Cited by 439 · 2022
Scaling Out-of-Distribution Detection for Real-World Settings
D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ...
ICML, 2021
Cited by 222 · 2021
Universal and Transferable Adversarial Attacks on Aligned Language Models
A Zou, Z Wang, JZ Kolter, M Fredrikson
arXiv preprint arXiv:2307.15043, 2023
Cited by 77 · 2023
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
D Hendrycks, A Zou, M Mazeika, L Tang, D Song, J Steinhardt
CVPR, 2021
Cited by 64 · 2021
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
A Pan, CJ Shern, A Zou, N Li, S Basart, T Woodside, J Ng, H Zhang, ...
ICML, 2023
Cited by 31 · 2023
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
M Mazeika, A Zou, S Patel, C Zhu, J Navarro, D Song, B Li, J Steinhardt, ...
NeurIPS, 2021
Cited by 30* · 2021
Forecasting Future World Events with Neural Networks
A Zou, T Xiao, R Jia, J Kwon, M Mazeika, R Li, D Song, J Steinhardt, ...
NeurIPS, 2022
Cited by 9 · 2022
Representation Engineering: A Top-Down Approach to AI Transparency
A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ...
arXiv preprint arXiv:2310.01405, 2023
Cited by 7 · 2023
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
M Mazeika, E Tang, A Zou, S Basart, D Song, D Forsyth, J Steinhardt, ...
NeurIPS, 2022
Cited by 4 · 2022
Unlocking Deterministic Robustness Certification on ImageNet
K Hu, A Zou, Z Wang, K Leino, M Fredrikson
NeurIPS, 2023
Cited by 3* · 2023
How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans
M Mazeika, A Zou, A Arora, P Pleskov, D Song, D Hendrycks, B Li, ...
2022
The Trojan Detection Challenge
M Mazeika, D Hendrycks, H Li, X Xu, S Hough, A Zou, A Rajabi, Q Yao, ...
NeurIPS 2022 Competition Track, 279-291, 2022
2022