Dehao Chen
Verified email at google.com
Title
Cited by
Year
GPipe: Efficient training of giant neural networks using pipeline parallelism
Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ...
Advances in neural information processing systems 32, 103-112, 2019
Cited by: 542; Year: 2019
MapCG: Writing parallel program portable between CPU and GPU
C Hong, D Chen, W Chen, W Zheng, H Lin
Proceedings of the 19th international conference on Parallel architectures …, 2010
Cited by: 209; Year: 2010
MLPerf training benchmark
P Mattson, C Cheng, C Coleman, G Diamos, P Micikevicius, D Patterson, ...
arXiv preprint arXiv:1910.01500, 2019
Cited by: 116; Year: 2019
GShard: Scaling giant models with conditional computation and automatic sharding
D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ...
arXiv preprint arXiv:2006.16668, 2020
Cited by: 115; Year: 2020
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ...
arXiv preprint arXiv:1902.08295, 2019
Cited by: 104; Year: 2019
Image classification at supercomputer scale
C Ying, S Kumar, D Chen, T Wang, Y Cheng
arXiv preprint arXiv:1811.06992, 2018
Cited by: 96; Year: 2018
Taming hardware event samples for FDO compilation
D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ...
Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010
Cited by: 74; Year: 2010
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications
D Chen, T Moseley, DX Li
2016 IEEE/ACM International Symposium on Code Generation and Optimization …, 2016
Cited by: 57; Year: 2016
Tree partition based parallel frequent pattern mining on shared memory systems
D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng
Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006
Cited by: 49; Year: 2006
Taming hardware event samples for precise and versatile feedback directed optimizations
D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng
IEEE Transactions on Computers 62 (2), 376-389, 2011
Cited by: 35; Year: 2011
Providing source code level portability between CPU and GPU with MapCG
CT Hong, DH Chen, YB Chen, WG Chen, WM Zheng, HB Lin
Journal of Computer Science and Technology 27 (1), 42-56, 2012
Cited by: 19; Year: 2012
Feedback-directed optimizations in gcc with estimated edge profiles from hardware event sampling
V Ramasamy, P Yuan, D Chen, R Hundt
Cited by: 19; Year: 2008
Scale MLPerf-0.6 models on Google TPU-v3 Pods
S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ...
arXiv preprint arXiv:1909.09756, 2019
Cited by: 18; Year: 2019
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
R Hundt, V Ramasamy, D Chen
US Patent 8,387,026, 2013
Cited by: 15; Year: 2013
CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs
D Chen, W Chen, W Zheng
Science China Information Sciences 55 (3), 663-676, 2012
Cited by: 14; Year: 2012
Automatic cross-replica sharding of weight update in data-parallel training
Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang
arXiv preprint arXiv:2004.13336, 2020
Cited by: 10; Year: 2020
Methods for handling inlined functions using sample profiles
V Ramasamy, D Chen, P Yuan
US Patent 8,423,980, 2013
Cited by: 9; Year: 2013
Hardware counted profile-guided optimization
B Wicht, RA Vitillo, D Chen, D Levinthal
arXiv preprint arXiv:1411.6361, 2014
Cited by: 5; Year: 2014
Using an inline stack to improve performance of an applications binary
D Chen, XD Li
US Patent 9,009,691, 2015
Cited by: 3; Year: 2015
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ...
arXiv preprint arXiv:2105.04663, 2021
Cited by: 2; Year: 2021