Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems, 5998-6008, 2017 | 16671 | 2017 |
A decomposable attention model for natural language inference AP Parikh, O Täckström, D Das, J Uszkoreit arXiv preprint arXiv:1606.01933, 2016 | 836 | 2016 |
Self-attention with relative position representations P Shaw, J Uszkoreit, A Vaswani arXiv preprint arXiv:1803.02155, 2018 | 423 | 2018 |
Tensor2tensor for neural machine translation A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018 | 318 | 2018 |
Natural questions: a benchmark for question answering research T Kwiatkowski, J Palomaki, O Redfield, M Collins, A Parikh, C Alberti, ... Transactions of the Association for Computational Linguistics 7, 453-466, 2019 | 307 | 2019 |
Image transformer N Parmar, A Vaswani, J Uszkoreit, Ł Kaiser, N Shazeer, A Ku, D Tran arXiv preprint arXiv:1802.05751, 2018 | 247 | 2018 |
Universal transformers M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser arXiv preprint arXiv:1807.03819, 2018 | 240 | 2018 |
Cross-lingual word clusters for direct transfer of linguistic structure O Täckström, R McDonald, J Uszkoreit The 2012 Conference of the North American Chapter of the Association for …, 2012 | 222 | 2012 |
One model to learn them all Ł Kaiser, AN Gomez, N Shazeer, A Vaswani, N Parmar, L Jones, ... arXiv preprint arXiv:1706.05137, 2017 | 218 | 2017 |
Large scale parallel document mining for machine translation J Uszkoreit, J Ponte, A Popat, M Dubiner Proceedings of the 23rd International Conference on Computational …, 2010 | 147 | 2010 |
Lattice-based minimum error rate training for statistical machine translation W Macherey, F Och, I Thayer, J Uszkoreit | 122 | 2008 |
Coarse-to-fine question answering for long documents E Choi, D Hewlett, J Uszkoreit, I Polosukhin, A Lacoste, J Berant Proceedings of the 55th Annual Meeting of the Association for Computational …, 2017 | 115 | 2017 |
Distributed word clustering for large scale class-based language modeling in machine translation J Uszkoreit, T Brants Proceedings of ACL-08: HLT, 755-762, 2008 | 107 | 2008 |
Music transformer: Generating music with long-term structure CZA Huang, A Vaswani, J Uszkoreit, I Simon, C Hawthorne, N Shazeer, ... International Conference on Learning Representations, 2018 | 96 | 2018 |
Insertion transformer: Flexible sequence generation via insertion operations M Stern, W Chan, J Kiros, J Uszkoreit arXiv preprint arXiv:1902.03249, 2019 | 90 | 2019 |
Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez Advances in neural information processing systems, 5998-6008, 2017 | 79* | 2017 |
“Poetic” Statistical Machine Translation: Rhyme and Meter D Genzel, J Uszkoreit, F Och | 65 | 2010 |
Inducing sentence structure from parallel corpora for reordering J DeNero, J Uszkoreit Proceedings of the 2011 Conference on Empirical Methods in Natural Language …, 2011 | 63 | 2011 |
Attention is all you need. CoRR abs/1706.03762 (2017) A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762, 2017 | 58 | 2017 |
Music transformer CZA Huang, A Vaswani, J Uszkoreit, N Shazeer, I Simon, C Hawthorne, ... arXiv preprint arXiv:1809.04281, 2018 | 54 | 2018 |