Minsu Kim

Cited by

	All	Since 2019
Citations	353	353
h-index	11	11
i10-index	12	12

Citations per year
Year	2021	2022	2023	2024
Citations	4	47	198	104

Co-authors

Yong Man Ro	Professor of Electrical Engineering, KAIST	Verified email at kaist.ac.kr
Joanna Hong	Ph.D. at Korea Advanced Institute of Science and Technology	Verified email at kaist.ac.kr
Jeongsoo Choi	KAIST	Verified email at kaist.ac.kr
Se Jin Park	Korea Advanced Institute of Science and Technology (KAIST)	Verified email at kaist.ac.kr
Jeong Hun Yeo	Korea Advanced Institute of Science and Technology	Verified email at kaist.ac.kr
Junho Kim	Korea Advanced Institute of Science and Technology (KAIST)	Verified email at kaist.ac.kr
Shinji Watanabe	Carnegie Mellon University	Verified email at cmu.edu
Hong Joo, Lee	Technical University of Munich	Verified email at tum.de
Hyung-Il Kim	Senior Researcher, ETRI	Verified email at etri.re.kr
Sangmin Lee	University of Illinois Urbana-Champaign	Verified email at illinois.edu
Soumi Maiti	Carnegie Mellon University	Verified email at andrew.cmu.edu
Jung Uk Kim	Assistant Professor of Computer Science, Kyung Hee University	Verified email at khu.ac.kr
Siddhant Arora	Graduate Student, Carnegie Mellon University	Verified email at andrew.cmu.edu
Xuankai Chang	Carnegie Mellon University, Student	Verified email at andrew.cmu.edu
Jee-weon Jung	Carnegie Mellon University	Verified email at ieee.org
Dahun Kim	Research Scientist, Google DeepMind	Verified email at google.com
Sungjune Park	Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)	Verified email at kaist.ac.kr

Minsu Kim

Korea Advanced Institute of Science and Technology

Verified email at kaist.ac.kr

Multimodal Learning, Audio-Visual Speech Processing, Multimodal Language Processing


Title	Cited by	Year
Synctalkface: Talking face generation with precise lip-syncing via audio-lip memory. SJ Park, M Kim, J Hong, J Choi, YM Ro. Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2062-2070, 2022	52	2022
Distinguishing homophenes using multi-head visual-audio memory for lip reading. M Kim, JH Yeo, YM Ro. Proceedings of the AAAI conference on artificial intelligence 36 (1), 1174-1182, 2022	41	2022
Multi-modality associative bridging through memory: Speech sound recollected from face video. M Kim, J Hong, SJ Park, YM Ro. Proceedings of the IEEE/CVF International Conference on Computer Vision, 296-306, 2021	37	2021
Lip to speech synthesis with visual context attentional gan. M Kim, J Hong, YM Ro. Advances in Neural Information Processing Systems 34, 2758-2770, 2021	36	2021
Cromm-vsr: Cross-modal memory augmented visual speech recognition. M Kim, J Hong, SJ Park, YM Ro. IEEE Transactions on Multimedia 24, 4342-4355, 2021	26	2021
Speaker-adaptive lip reading with user-dependent padding. M Kim, H Kim, YM Ro. European Conference on Computer Vision, 576-593, 2022	18	2022
Speech reconstruction with reminiscent sound via visual voice memory. J Hong, M Kim, SJ Park, YM Ro. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3654-3667, 2021	18	2021
Prompt tuning of deep neural networks for speaker-adaptive visual speech recognition. M Kim, HI Kim, YM Ro. arXiv preprint arXiv:2302.08102, 2023	17	2023
Visual context-driven audio feature enhancement for robust end-to-end audio-visual speech recognition. J Hong, M Kim, D Yoo, YM Ro. INTERSPEECH 2022, 2022	17	2022
Watch or listen: Robust audio-visual speech recognition with visual corruption modeling and reliability scoring. J Hong, M Kim, J Choi, YM Ro. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	15	2023
Lip-to-speech synthesis in the wild with multi-task learning. M Kim, J Hong, YM Ro. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023	13	2023
Intelligible Lip-to-Speech Synthesis with Speech Units. J Choi, M Kim, YM Ro. INTERSPEECH 2023, 2023	10	2023
Multi-temporal lip-audio memory for visual speech recognition. JH Yeo, M Kim, YM Ro. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023	8	2023
Interpretation of lesional detection via counterfactual generation. J Kim, M Kim, YM Ro. 2021 IEEE International Conference on Image Processing (ICIP), 96-100, 2021	8	2021
Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation. M Kim, J Choi, D Kim, YM Ro. arXiv preprint arXiv:2308.01831, 2023	7	2023
Akvsr: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model. JH Yeo, M Kim, J Choi, DH Kim, YM Ro. IEEE Transactions on Multimedia, 2024	6	2024
Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge. M Kim, JH Yeo, J Choi, YM Ro. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	6	2023
Visagesyntalk: Unseen speaker video-to-speech synthesis via speech-visage feature selection. J Hong, M Kim, YM Ro. European Conference on Computer Vision, 452-468, 2022	5	2022
Towards practical and efficient image-to-speech captioning with vision-language pre-training and multi-modal tokens. M Kim, J Choi, S Maiti, JH Yeo, S Watanabe, YM Ro. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024	3	2024
Robust video facial authentication with unsupervised mode disentanglement. M Kim, HJ Lee, S Lee, YM Ro. 2020 IEEE International Conference on Image Processing (ICIP), 1321-1325, 2020	3	2020


