Arsha Nagrani

Cited by

	All	Since 2019
Citations	10329	10158
h-index	32	32
i10-index	47	47

Citations per year: 2018: 135, 2019: 479, 2020: 1234, 2021: 1629, 2022: 2414, 2023: 3214, 2024: 1165

Public access

20 articles available, 0 articles not available (based on funding mandates)

Co-authors

Andrew Zisserman - University of Oxford - Verified email at robots.ox.ac.uk
Cordelia Schmid - Research director INRIA - Verified email at inria.fr
Joon Son Chung - KAIST - Verified email at kaist.ac.kr
Chen Sun - Assistant Professor, Brown University - Verified email at brown.edu
Andrea Vedaldi - University of Oxford - Verified email at robots.ox.ac.uk
Dima Damen - Professor, University of Bristol and Google DeepMind - Verified email at bristol.ac.uk
Evangelos Kazakos - Czech Technical University in Prague - Verified email at cvut.cz
Rahul Sukthankar - Google Research - Verified email at google.com
Samuel Albanie - Assistant Professor, University of Cambridge - Verified email at cam.ac.uk

Arsha Nagrani

Research Scientist, Google

Verified email at google.com

Machine learning, Computer Vision, Speech Technology, Deep Learning


Title	Authors	Venue	Cited by	Year
Voxceleb: a large-scale speaker identification dataset	A Nagrani, JS Chung, A Zisserman	arXiv preprint arXiv:1706.08612, 2017	2392	2017
Voxceleb2: Deep speaker recognition	JS Chung, A Nagrani, A Zisserman	arXiv preprint arXiv:1806.05622, 2018	2234	2018
Frozen in time: A joint video and image encoder for end-to-end retrieval	M Bain, A Nagrani, G Varol, A Zisserman	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021	719	2021
Voxceleb: Large-scale speaker verification in the wild	A Nagrani, JS Chung, W Xie, A Zisserman	Computer Speech & Language 60, 101027, 2020	627	2020
Attention bottlenecks for multimodal fusion	A Nagrani, S Yang, A Arnab, A Jansen, C Schmid, C Sun	Advances in neural information processing systems 34, 14200-14213, 2021	448	2021
Use what you have: Video retrieval using representations from collaborative experts	Y Liu, S Albanie, A Nagrani, A Zisserman	arXiv preprint arXiv:1907.13487, 2019	402	2019
Utterance-level aggregation for speaker recognition in the wild	W Xie, A Nagrani, JS Chung, A Zisserman	ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019	388	2019
Epic-fusion: Audio-visual temporal binding for egocentric action recognition	E Kazakos, A Nagrani, A Zisserman, D Damen	Proceedings of the IEEE/CVF international conference on computer vision …, 2019	355	2019
Emotion recognition in speech using cross-modal transfer in the wild	S Albanie, A Nagrani, A Vedaldi, A Zisserman	Proceedings of the 26th ACM international conference on Multimedia, 292-301, 2018	299	2018
Seeing voices and hearing faces: Cross-modal biometric matching	A Nagrani, S Albanie, A Zisserman	Proceedings of the IEEE conference on computer vision and pattern …, 2018	230	2018
Chimpanzee face recognition from videos in the wild using deep learning	D Schofield, A Nagrani, A Zisserman, M Hayashi, T Matsuzawa, D Biro, ...	Science advances 5 (9), eaaw0736, 2019	185	2019
Localizing visual sounds the hard way	H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi, A Zisserman	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021	150	2021
Learnable pins: Cross-modal embeddings for person identity	A Nagrani, S Albanie, A Zisserman	Proceedings of the European Conference on Computer Vision (ECCV), 71-88, 2018	139	2018
End-to-end generative pretraining for multimodal video captioning	PH Seo, A Nagrani, A Arnab, C Schmid	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	134	2022
Spot the conversation: speaker diarisation in the wild	JS Chung, J Huh, A Nagrani, T Afouras, A Zisserman	arXiv preprint arXiv:2007.01216, 2020	134	2020
Cough against covid: Evidence of covid-19 signature in cough sounds	P Bagad, A Dalmia, J Doshi, A Nagrani, P Bhamare, A Mahale, S Rane, ...	arXiv preprint arXiv:2009.08790, 2020	130	2020
Disentangled speech embeddings using cross-modal self-supervision	A Nagrani, JS Chung, S Albanie, A Zisserman	ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020	99	2020
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning	A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ...	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	95	2023
Pali-x: On scaling up a multilingual vision and language model	X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ...	arXiv preprint arXiv:2305.18565, 2023	83	2023
Voxsrc 2020: The second voxceleb speaker recognition challenge	A Nagrani, JS Chung, J Huh, A Brown, E Coto, W Xie, M McLaren, ...	arXiv preprint arXiv:2012.06867, 2020	79	2020
