PaLI: A jointly-scaled multilingual language-image model X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ... ICLR 2023 (Oral), 2022 | 408 | 2022 |
LiT: Zero-Shot Transfer with Locked-image Text Tuning X Zhai, X Wang, B Mustafa, A Steiner, D Keysers, A Kolesnikov, L Beyer CVPR 2022, 2021 | 402 | 2021 |
Measuring compositional generalization: A comprehensive method on realistic data D Keysers, N Schärli, N Scales, H Buisman, D Furrer, S Kashubin, ... ICLR 2020, 2019 | 337 | 2019 |
Simple Open-Vocabulary Object Detection with Vision Transformers M Minderer, A Gritsenko, A Stone, M Neumann, D Weissenborn, ... ECCV 2022, 2022 | 278* | 2022 |
Scaling vision transformers to 22 billion parameters M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ... ICML 2023 (Oral), 2023 | 269 | 2023 |
PaLI-X: On scaling up a multilingual vision and language model X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ... CVPR 2024, 2023 | 85 | 2023 |
PaLI-3 vision language models: Smaller, faster, stronger X Chen, X Wang, L Beyer, A Kolesnikov, J Wu, P Voigtlaender, B Mustafa, ... arXiv preprint arXiv:2310.09199, 2023 | 24 | 2023 |
Three Towers: Flexible Contrastive Learning with Pretrained Image Models J Kossen, M Collier, B Mustafa, X Wang, X Zhai, L Beyer, A Steiner, ... NeurIPS 2023, 2023 | 3 | 2023 |
A study of autoregressive decoders for multi-tasking in computer vision L Beyer, B Wan, G Madan, F Pavetic, A Steiner, A Kolesnikov, AS Pinto, ... arXiv preprint arXiv:2303.17376, 2023 | 3 | 2023 |
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? I Alabdulmohsin, X Wang, A Steiner, P Goyal, A D'Amour, X Zhai ICLR 2024, 2024 | | 2024 |