Shengen Yan

Cited by

	All	Since 2019
Citations	2056	1466
h-index	18	15
i10-index	22	22

Citations per year

Year	Citations
2013	9
2014	28
2015	86
2016	139
2017	126
2018	170
2019	234
2020	232
2021	274
2022	316
2023	321
2024	87

Public access

8 articles available, 2 articles not available (based on funding mandates)

Co-authors

Yun (Eric) Liang	Professor of EECS, Peking University, ACM Distinguished Scientist	Verified email at pku.edu.cn
Yunquan Zhang	Professor of Institute of Computing Technology, CAS	Verified email at ict.ac.cn
Xiuhong Li	Peking University	Verified email at pku.edu.cn
Ren Wu	NovuMind Inc.	Verified email at novumind.com
Huiyang Zhou	Professor of North Carolina State University	Verified email at ncsu.edu
Gang Sun	Momenta	Verified email at momenta.ai
Weiyan Wang	Hong Kong University of Science & Technology	Verified email at connect.ust.hk
Yi Yang	NEC Labs	Verified email at nec-labs.com
Hongwen Dai	Google	Verified email at ncsu.edu
Sun Peng	Shanghai Artificial Intelligence Laboratory	Verified email at pjlab.org.cn

Shengen Yan

The Chinese University of Hong Kong

Verified email at ie.cuhk.edu.hk

Large Scale Deep Learning, Heterogeneous Computing


Title	Cited by	Year
Deep image: Scaling up image recognition. R Wu, S Yan, Y Shan, Q Dang, G Sun. arXiv preprint arXiv:1501.02876 7 (8), 4, 2015	509	2015
Evaluating fast algorithms for convolutional neural networks on FPGAs. L Lu, Y Liang, Q Xiao, S Yan. 2017 IEEE 25th annual international symposium on field-programmable custom …, 2017	282	2017
Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs. Q Xiao, Y Liang, L Lu, S Yan, YW Tai. Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017	223	2017
yaSpMV: Yet another SpMV framework on GPUs. S Yan, C Li, Y Zhang, H Zhou. ACM SIGPLAN Notices 49 (8), 107-118, 2014	177	2014
Evaluating fast algorithms for convolutional neural networks on FPGAs. Y Liang, L Lu, Q Xiao, S Yan. IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2019	147	2019
StreamScan: fast scan algorithms for GPUs without global barrier synchronization. S Yan, G Long, Y Zhang. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of …, 2013	116	2013
Characterization and prediction of deep learning workloads in large-scale gpu datacenters. Q Hu, P Sun, S Yan, Y Wen, T Zhang. Proceedings of the International Conference for High Performance Computing …, 2021	84	2021
Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes. P Sun, W Feng, R Han, S Yan, Y Wen. arXiv preprint arXiv:1902.06855, 2019	78	2019
A coordinated tiling and batching framework for efficient GEMM on GPUs. X Li, Y Liang, S Yan, L Jia, Y Li. Proceedings of the 24th symposium on principles and practice of parallel …, 2019	53	2019
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach. P Sun, Y Wen, NBD Ta, S Yan. 2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017	45	2017
GPURoofline: a model for guiding performance optimizations on GPUs. H Jia, Y Zhang, G Long, J Xu, S Yan, Y Li. Euro-Par 2012 Parallel Processing: 18th International Conference, Euro-Par …, 2012	45	2012
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs. C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou. 2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014	39	2014
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction. S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang. Proceedings of the 49th Annual International Symposium on Computer …, 2022	34	2022
Diesel: A dataset-based distributed storage and caching system for large-scale deep learning training. L Wang, S Ye, B Yang, Y Lu, H Zhang, S Yan, Q Luo. Proceedings of the 49th International Conference on Parallel Processing, 1-11, 2020	27	2020
Gradientflow: Optimizing network performance for large-scale distributed dnn training. P Sun, Y Wen, R Han, W Feng, S Yan. IEEE Transactions on Big Data 8 (2), 495-507, 2019	24	2019
Parallelization and performance optimization on face detection algorithm with OpenCL: A case study. W Wang, Y Zhang, S Yan, Y Zhang, H Jia. Tsinghua Science and Technology 17 (3), 287-295, 2012	24	2012
Enabling efficient fast convolution algorithms on GPUs via MegaKernels. L Jia, Y Liang, X Li, L Lu, S Yan. IEEE Transactions on Computers 69 (7), 986-997, 2020	18	2020
Timed dataflow: Reducing communication overhead for distributed machine learning systems. P Sun, Y Wen, TNB Duong, S Yan. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016	18	2016
Elan: Towards generic and efficient elastic training for deep learning. L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan. 2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020	15	2020
A cross-platform SpMV framework on many-core architectures. Y Zhang, S Li, S Yan, H Zhou. ACM Transactions on Architecture and Code Optimization (TACO) 13 (4), 1-25, 2016	15	2016
