Deep Graph Infomax. P Veličković, W Fedus, WL Hamilton, P Liò, Y Bengio, RD Hjelm ICLR (Poster) 2 (3), 4, 2019 | 1335 | 2019 |
PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311, 2022 | 947 | 2022 |
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity W Fedus, B Zoph, N Shazeer The Journal of Machine Learning Research 23 (1), 5232-5270, 2022 | 731 | 2022 |
MaskGAN: Better Text Generation via Filling in the ______ W Fedus, I Goodfellow, AM Dai International Conference on Learning Representations (ICLR 2018), 2018 | 534 | 2018 |
In silico labeling: Predicting fluorescent labels in unlabeled images SF Eric Christiansen, Samuel J. Yang, D. Michael Ando, Ashkan Javaherian ... Cell, 2018 | 491 | 2018 |
Deep graph infomax P Veličković, W Fedus, WL Hamilton, P Liò, Y Bengio, RD Hjelm arXiv preprint arXiv:1809.10341, 2018 | 402 | 2018 |
Emergent abilities of large language models J Wei, Y Tay, R Bommasani, C Raffel, B Zoph, S Borgeaud, D Yogatama, ... arXiv preprint arXiv:2206.07682, 2022 | 252 | 2022 |
The case for a directional dark matter detector and the status of current experimental efforts S Ahlen, N Afshordi, JBR Battat, J Billard, N Bozorgnia, S Burgos, ... International Journal of Modern Physics A 25 (01), 1-51, 2010 | 239 | 2010 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, ... arXiv preprint arXiv:2210.11416, 2022 | 233 | 2022 |
Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step W Fedus, M Rosca, B Lakshminarayanan, AM Dai, S Mohamed, ... International Conference on Learning Representations (ICLR 2018), 2017 | 224 | 2017 |
Revisiting resnets: Improved training and scaling strategies I Bello, W Fedus, X Du, ED Cubuk, A Srinivas, TY Lin, J Shlens, B Zoph Advances in Neural Information Processing Systems 34, 22614-22627, 2021 | 211 | 2021 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 185 | 2022 |
Language GANs Falling Short M Caccia, L Caccia, W Fedus, H Larochelle, J Pineau, L Charlin International Conference on Learning Representations (ICLR 2020), 2018 | 185 | 2018 |
Revisiting fundamentals of experience replay W Fedus, P Ramachandran, R Agarwal, Y Bengio, H Larochelle, ... International Conference on Machine Learning, 3061-3071, 2020 | 158 | 2020 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... International Conference on Machine Learning, 5547-5569, 2022 | 123 | 2022 |
First dark matter search results from a surface run of the 10-L DMTPC directional dark matter detector S Ahlen, JBR Battat, T Caldwell, C Deaconu, D Dujmic, W Fedus, P Fisher, ... Physics Letters B 695 (1-4), 124-129, 2011 | 103 | 2011 |
On bonus-based exploration methods in the arcade learning environment AA Taiga, W Fedus, MC Machado, A Courville, MG Bellemare arXiv preprint arXiv:2109.11052, 2021 | 88* | 2021 |
Hyperbolic discounting and learning over multiple horizons W Fedus, C Gelada, Y Bengio, MG Bellemare, H Larochelle Reinforcement Learning and Decision Making (RLDM 2019), 2019 | 87 | 2019 |
Do transformer modifications transfer across implementations and applications? S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ... arXiv preprint arXiv:2102.11972, 2021 | 81* | 2021 |
Recall Traces: Backtracking Models for Efficient Reinforcement Learning A Goyal, P Brakel, W Fedus, T Lillicrap, S Levine, H Larochelle, Y Bengio International Conference on Learning Representations (ICLR 2019), 2018 | 67 | 2018 |