Sun Peng
Shanghai AI Laboratory
Verified email at pjlab.org.cn
Title
Cited by
Year
Gradientflow: Optimizing network performance for large-scale distributed dnn training
P Sun, Y Wen, R Han, W Feng, S Yan
IEEE Transactions on Big Data 8 (2), 495-507, 2019
Cited by: 101* | Year: 2019
A chunk caching location and searching scheme in content centric networking
Y Li, T Lin, H Tang, P Sun
2012 IEEE International Conference on Communications (ICC), 2655-2659, 2012
Cited by: 93 | Year: 2012
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Q Hu, P Sun, S Yan, Y Wen, T Zhang
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by: 82 | Year: 2021
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
P Sun, Y Wen, NBD Ta, S Yan
2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017
Cited by: 44 | Year: 2017
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs
W Gao, Z Ye, P Sun, Y Wen, T Zhang
Proceedings of the ACM Symposium on Cloud Computing, 609-623, 2021
Cited by: 25 | Year: 2021
Deep Learning Workload Scheduling in GPU Datacenters: A Survey
Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo, T Zhang, Y Wen
ACM Computing Surveys 56 (6), 1-38, 2024
Cited by: 23* | Year: 2024
Cloud3DView: An interactive tool for cloud data center operations
J Yin, P Sun, Y Wen, H Gong, M Liu, X Li, H You, J Gao, C Lin
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, 499-500, 2013
Cited by: 19 | Year: 2013
Timed dataflow: Reducing communication overhead for distributed machine learning systems
P Sun, Y Wen, TNB Duong, S Yan
2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016
Cited by: 18 | Year: 2016
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine
P Sun, Y Wen, TNB Duong, X Xiao
2017 IEEE 23rd International Conference on Parallel and Distributed Systems …, 2017
Cited by: 16 | Year: 2017
Elan: Towards Generic and Efficient Elastic Training for Deep Learning
L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan
2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020
Cited by: 14 | Year: 2020
Astraea: A fair deep learning scheduler for multi-tenant gpu clusters
Z Ye, P Sun, W Gao, T Zhang, X Wang, S Yan, Y Luo
IEEE Transactions on Parallel and Distributed Systems 33 (11), 2781-2793, 2021
Cited by: 11 | Year: 2021
Graphh: High performance big graph analytics in small clusters
P Sun, Y Wen, TNB Duong, X Xiao
2017 IEEE International Conference on Cluster Computing (CLUSTER), 256-266, 2017
Cited by: 11 | Year: 2017
ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems
Y Huang, H Zhang, Y Wen, P Sun, NBD TA
arXiv preprint arXiv:2106.03122, 2021
Cited by: 10 | Year: 2021
Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Q Hu, M Zhang, P Sun, Y Wen, T Zhang
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by: 9 | Year: 2023
GraphMP: I/O-efficient big graph analytics on a single commodity machine
P Sun, Y Wen, TNB Duong, X Xiao
IEEE Transactions on Big Data 6 (4), 816-829, 2019
Cited by: 9 | Year: 2019
CREATE: CoRrelation enhanced trAffic maTrix estimation in data center networks
Z Hu, Y Qiao, J Luo, P Sun, Y Wen
2014 IFIP Networking Conference, 1-9, 2014
Cited by: 8 | Year: 2014
Metaflow: A scalable metadata lookup service for distributed file systems in data centers
P Sun, Y Wen, DNB Ta, H Xie
IEEE Transactions on Big Data 4 (2), 203-216, 2016
Cited by: 6 | Year: 2016
Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication
M Zhang, Q Hu, P Sun, Y Wen, T Zhang
arXiv preprint arXiv:2303.01277, 2023
Cited by: 5 | Year: 2023
InternLM2 Technical Report
Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ...
arXiv preprint arXiv:2403.17297, 2024
Cited by: 4 | Year: 2024
Primo: Practical Learning-Augmented Systems with Interpretable Models
Q Hu, H Nori, P Sun, Y Wen, T Zhang
2022 USENIX Annual Technical Conference (USENIX ATC 22), 519-538, 2022
Cited by: 4 | Year: 2022