Dehao Chen
GPipe: Efficient training of giant neural networks using pipeline parallelism
Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ...
Advances in neural information processing systems 32, 103-112, 2019
MapCG: Writing parallel program portable between CPU and GPU
C Hong, D Chen, W Chen, W Zheng, H Lin
Proceedings of the 19th international conference on Parallel architectures …, 2010
MLPerf training benchmark
P Mattson, C Cheng, C Coleman, G Diamos, P Micikevicius, D Patterson, ...
arXiv preprint arXiv:1910.01500, 2019
GShard: Scaling giant models with conditional computation and automatic sharding
D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ...
arXiv preprint arXiv:2006.16668, 2020
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ...
arXiv preprint arXiv:1902.08295, 2019
Image classification at supercomputer scale
C Ying, S Kumar, D Chen, T Wang, Y Cheng
arXiv preprint arXiv:1811.06992, 2018
Taming hardware event samples for FDO compilation
D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ...
Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications
D Chen, T Moseley, DX Li
2016 IEEE/ACM International Symposium on Code Generation and Optimization …, 2016
Tree partition based parallel frequent pattern mining on shared memory systems
D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng
Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006
Taming hardware event samples for precise and versatile feedback directed optimizations
D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng
IEEE Transactions on Computers 62 (2), 376-389, 2011
Providing source code level portability between CPU and GPU with MapCG
CT Hong, DH Chen, YB Chen, WG Chen, WM Zheng, HB Lin
Journal of Computer Science and Technology 27 (1), 42-56, 2012
Feedback-directed optimizations in GCC with estimated edge profiles from hardware event sampling
V Ramasamy, P Yuan, D Chen, R Hundt
Scale MLPerf-0.6 models on Google TPU-v3 Pods
S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ...
arXiv preprint arXiv:1909.09756, 2019
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
R Hundt, V Ramasamy, D Chen
US Patent 8,387,026, 2013
CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs
D Chen, W Chen, W Zheng
Science China Information Sciences 55 (3), 663-676, 2012
Automatic cross-replica sharding of weight update in data-parallel training
Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang
arXiv preprint arXiv:2004.13336, 2020
Methods for handling inlined functions using sample profiles
V Ramasamy, D Chen, P Yuan
US Patent 8,423,980, 2013
Hardware counted profile-guided optimization
B Wicht, RA Vitillo, D Chen, D Levinthal
arXiv preprint arXiv:1411.6361, 2014
Using an inline stack to improve performance of an applications binary
D Chen, XD Li
US Patent 9,009,691, 2015
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ...
arXiv preprint arXiv:2105.04663, 2021