SoundNet: Learning Sound Representations from Unlabeled Video Y Aytar, C Vondrick, A Torralba Neural Information Processing Systems, 2016 | 1288 | 2016 |
Learning cross-modal embeddings for cooking recipes and food images A Salvador, N Hynes, Y Aytar, J Marin, F Ofli, I Weber, A Torralba Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 695 | 2017 |
With a little help from my friends: Nearest-neighbor contrastive learning of visual representations D Dwibedi, Y Aytar, J Tompson, P Sermanet, A Zisserman Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 473 | 2021 |
Tabula rasa: Model transfer for object category detection Y Aytar, A Zisserman 2011 international conference on computer vision, 2252-2259, 2011 | 450 | 2011 |
Temporal cycle-consistency learning D Dwibedi, Y Aytar, J Tompson, P Sermanet, A Zisserman Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 321 | 2019 |
Playing hard exploration games by watching youtube Y Aytar, T Pfaff, D Budden, T Paine, Z Wang, N De Freitas Advances in neural information processing systems 31, 2018 | 316 | 2018 |
Learning aligned cross-modal representations from weakly aligned data L Castrejon, Y Aytar, C Vondrick, H Pirsiavash, A Torralba Proceedings of the IEEE conference on computer vision and pattern …, 2016 | 201 | 2016 |
See, hear, and read: Deep aligned representations Y Aytar, C Vondrick, A Torralba arXiv preprint arXiv:1706.00932, 2017 | 169 | 2017 |
Cross-modal scene networks Y Aytar, L Castrejon, C Vondrick, H Pirsiavash, A Torralba IEEE transactions on pattern analysis and machine intelligence 40 (10), 2303 …, 2017 | 153 | 2017 |
Scaling data-driven robotics with reward sketching and batch reinforcement learning S Cabi, SG Colmenarejo, A Novikov, K Konyushkova, S Reed, R Jeong, ... arXiv preprint arXiv:1909.12200, 2019 | 142 | 2019 |
Sickle cell detection using a smartphone SM Knowlton, I Sencan, Y Aytar, J Khoory, MM Heeney, IC Ghiran, ... Scientific reports 5 (1), 15022, 2015 | 139 | 2015 |
How transferable are CNN-based features for age and gender classification? G Ozbulak, Y Aytar, HK Ekenel 2016 International Conference of the Biometrics Special Interest Group …, 2016 | 137 | 2016 |
Counting out time: Class agnostic video repetition counting in the wild D Dwibedi, Y Aytar, J Tompson, P Sermanet, A Zisserman Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 135 | 2020 |
Tap-vid: A benchmark for tracking any point in a video C Doersch, A Gupta, L Markeeva, A Recasens, L Smaira, Y Aytar, ... Advances in Neural Information Processing Systems 35, 13610-13626, 2022 | 103 | 2022 |
Face-to-BMI: Using computer vision to infer body mass index on social media E Kocabey, M Camurcu, F Ofli, Y Aytar, J Marin, A Torralba, I Weber Proceedings of the International AAAI Conference on Web and Social Media 11 …, 2017 | 101 | 2017 |
Utilizing semantic word similarity measures for video retrieval Y Aytar, M Shah, J Luo 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2008 | 89 | 2008 |
Is saki #delicious? the food perception gap on instagram and its relation to health F Ofli, Y Aytar, I Weber, R Al Hammouri, A Torralba Proceedings of the 26th International Conference on World Wide Web, 509-518, 2017 | 82 | 2017 |
Tapir: Tracking any point with per-frame initialization and temporal refinement C Doersch, Y Yang, M Vecerik, D Gokay, A Gupta, Y Aytar, J Carreira, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 81 | 2023 |
Offline learning from demonstrations and unlabeled experience K Zolna, A Novikov, K Konyushkova, C Gulcehre, Z Wang, Y Aytar, ... arXiv preprint arXiv:2011.13885, 2020 | 71 | 2020 |
Robocat: A self-improving foundation agent for robotic manipulation K Bousmalis, G Vezzani, D Rao, C Devin, AX Lee, M Bauza, T Davchev, ... arXiv preprint arXiv:2306.11706, 2023 | 69 | 2023 |