Shengen Yan
Title
Cited by
Year
Deep image: Scaling up image recognition
R Wu, S Yan, Y Shan, Q Dang, G Sun
arXiv preprint arXiv:1501.02876, 2015
Cited by: 534, Year: 2015
Evaluating fast algorithms for convolutional neural networks on FPGAs
L Lu, Y Liang, Q Xiao, S Yan
2017 IEEE 25th annual international symposium on field-programmable custom …, 2017
Cited by: 285, Year: 2017
Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs
Q Xiao, Y Liang, L Lu, S Yan, YW Tai
Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017
Cited by: 226, Year: 2017
yaSpMV: Yet another SpMV framework on GPUs
S Yan, C Li, Y Zhang, H Zhou
ACM SIGPLAN Notices 49 (8), 107-118, 2014
Cited by: 185, Year: 2014
Evaluating fast algorithms for convolutional neural networks on FPGAs
Y Liang, L Lu, Q Xiao, S Yan
IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2019
Cited by: 150, Year: 2019
StreamScan: fast scan algorithms for GPUs without global barrier synchronization
S Yan, G Long, Y Zhang
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of …, 2013
Cited by: 123, Year: 2013
Characterization and prediction of deep learning workloads in large-scale gpu datacenters
Q Hu, P Sun, S Yan, Y Wen, T Zhang
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by: 116, Year: 2021
Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes
P Sun, W Feng, R Han, S Yan, Y Wen
arXiv preprint arXiv:1902.06855, 2019
Cited by: 79, Year: 2019
A coordinated tiling and batching framework for efficient GEMM on GPUs
X Li, Y Liang, S Yan, L Jia, Y Li
Proceedings of the 24th symposium on principles and practice of parallel …, 2019
Cited by: 63, Year: 2019
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 50, Year: 2022
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
P Sun, Y Wen, NBD Ta, S Yan
2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017
Cited by: 50, Year: 2017
GPURoofline: a model for guiding performance optimizations on GPUs
H Jia, Y Zhang, G Long, J Xu, S Yan, Y Li
Euro-Par 2012 Parallel Processing: 18th International Conference, Euro-Par …, 2012
Cited by: 44, Year: 2012
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou
2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014
Cited by: 42, Year: 2014
Diesel: A dataset-based distributed storage and caching system for large-scale deep learning training
L Wang, S Ye, B Yang, Y Lu, H Zhang, S Yan, Q Luo
Proceedings of the 49th International Conference on Parallel Processing, 1-11, 2020
Cited by: 35, Year: 2020
Gradientflow: Optimizing network performance for large-scale distributed dnn training
P Sun, Y Wen, R Han, W Feng, S Yan
IEEE Transactions on Big Data 8 (2), 495-507, 2019
Cited by: 32, Year: 2019
A survey on efficient inference for large language models
Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ...
arXiv preprint arXiv:2404.14294, 2024
Cited by: 26, Year: 2024
Parallelization and performance optimization on face detection algorithm with OpenCL: A case study
W Wang, Y Zhang, S Yan, Y Zhang, H Jia
Tsinghua Science and Technology 17 (3), 287-295, 2012
Cited by: 24, Year: 2012
Enabling efficient fast convolution algorithms on GPUs via MegaKernels
L Jia, Y Liang, X Li, L Lu, S Yan
IEEE Transactions on Computers 69 (7), 986-997, 2020
Cited by: 23, Year: 2020
Timed dataflow: Reducing communication overhead for distributed machine learning systems
P Sun, Y Wen, TNB Duong, S Yan
2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016
Cited by: 20, Year: 2016
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion
S Zheng, S Chen, P Song, R Chen, X Li, S Yan, D Lin, J Leng, Y Liang
2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023
Cited by: 18, Year: 2023