Zhihang Yuan

Cited by

	All	Since 2019
Citations	733	690
h-index	12	12
i10-index	15	14

Citations per year (2017-2024): 2017: 7, 2018: 34, 2019: 53, 2020: 47, 2021: 68, 2022: 74, 2023: 171, 2024: 277

Public access

8 articles

3 articles

available

not available

Based on funding mandates

Co-authors

Guangyu Sun - School of Integrated Circuits, Peking University - Verified email at pku.edu.cn
Bingzhe Wu - Tencent AI Lab - Verified email at pku.edu.cn
Yuzhang Shang - Illinois Institute of Technology - Verified email at hawk.iit.edu
Yijin Guan - Computing Technology Lab, Alibaba DAMO Academy - Verified email at alibaba-inc.com
Chenhao Xue - School of Integrated Circuits, Peking University - Verified email at pku.edu.cn
Jingsheng Jason Cong - Volgenau Chair for Engineering Excellence, Computer Science and Electrical Engineering, University - Verified email at cs.ucla.edu
Xinggang Wang - Professor, Huazhong University of Science and Technology - Verified email at hust.edu.cn
Wenyu Liu - The University of Sydney - Verified email at sydney.edu.au
Yan Yan - Illinois Institute of Technology - Verified email at iit.edu
Yiqi Chen - Peking University - Verified email at pku.edu.cn
Dawei Yang - Fudan University - Verified email at fudan.edu.cn
Yizeng Han (韩益增) - Alibaba DAMO Academy - Verified email at alibaba-inc.com
Yifan Pu (浦一凡) - Department of Automation, Tsinghua University - Verified email at mails.tsinghua.edu.cn
Zhen Dong - PhD & Postdoc at Berkeley AI Research - Verified email at berkeley.edu
Gao Huang (黄高) - Associate Professor, Tsinghua University - Verified email at tsinghua.edu.cn
Shiwan Zhao - Independent Researcher, Research Scientist of IBM Research - China (2000-2020) - Verified email at cn.ibm.com
Yuchao Yang - Peking University - Verified email at pku.edu.cn
Zhe Zhou - PhD Candidate of Computer Architecture, Peking University - Verified email at pku.edu.cn

Zhihang Yuan

Infini-AI

Verified email at infini-ai.com - Homepage

Efficient AI, Deep Learning


Title	Cited by	Year
FPGA-based accelerator for long short-term memory recurrent neural networks. Y Guan, Z Yuan, G Sun, J Cong. 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 629-634, 2017	229	2017
PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization. Z Yuan, C Xue, Y Chen, Q Wu, G Sun. European conference on computer vision, 191-207, 2022	112*	2022
Post-training quantization on diffusion models. Y Shang, Z Yuan, B Xie, B Wu, Y Yan. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	67	2023
RPTQ: Reorder-based post-training quantization for large language models. Z Yuan, L Niu, J Liu, W Liu, X Wang, Y Shang, G Sun, Q Wu, J Wu, B Wu. arXiv preprint arXiv:2304.01089, 2023	45	2023
PD-Quant: Post-training quantization based on prediction difference metric. J Liu, L Niu, Z Yuan, D Yang, X Wang, W Liu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	42	2023
Reducing overfitting in deep convolutional neural networks using redundancy regularizer. B Wu, Z Liu, Z Yuan, G Sun, C Wu. Artificial Neural Networks and Machine Learning–ICANN 2017: 26th …, 2017	33	2017
S2DNAS: Transforming static CNN model for dynamic inference via neural architecture search. Z Yuan, B Wu, G Sun, Z Liang, S Zhao, W Bi. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020	31	2020
NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators. Z Yuan, J Liu, X Li, L Yan, H Chen, B Wu, Y Yang, G Sun. Science China Information Sciences 64 (6), 160407, 2021	21	2021
Latency-aware spatial-wise dynamic networks. Y Han, Z Yuan, Y Pu, C Xue, S Song, G Sun, G Huang. Advances in Neural Information Processing Systems 35, 36845-36857, 2022	20	2022
PB-LLM: Partially binarized large language models. Y Shang, Z Yuan, Q Wu, Z Dong. arXiv preprint arXiv:2310.00034, 2023	19	2023
A survey on efficient inference for large language models. Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ... arXiv preprint arXiv:2404.14294, 2024	13	2024
LLM inference unveiled: Survey and roofline model insights. Z Yuan, Y Shang, Y Zhou, Z Dong, C Xue, B Wu, Z Li, Q Gu, YJ Lee, ... arXiv preprint arXiv:2402.16363, 2024	12	2024
Using data compression for optimizing FPGA-based convolutional neural network accelerators. Y Guan, N Xu, C Zhang, Z Yuan, J Cong. International workshop on advanced parallel processing technologies, 14-26, 2017	12	2017
Latency-aware unified dynamic networks for efficient image recognition. Y Han, Z Liu, Z Yuan, Y Pu, C Wang, S Song, G Huang. arXiv preprint arXiv:2308.15949, 2023	11	2023
ASVD: Activation-aware singular value decomposition for compressing large language models. Z Yuan, Y Shang, Y Song, Q Wu, Y Yan, G Sun. arXiv preprint arXiv:2312.05821, 2023	10	2023
ENAS4D: Efficient multi-stage CNN architecture search for dynamic inference. Z Yuan, X Liu, B Wu, G Sun. arXiv preprint arXiv:2009.09182, 2020	7	2020
Crane: mitigating accelerator under-utilization caused by sparsity irregularities in CNNs. Y Guan, G Sun, Z Yuan, X Li, N Xu, S Chen, J Cong, Y Xie. IEEE Transactions on Computers 69 (7), 931-943, 2020	7	2020
WKVQuant: Quantizing weight and key/value cache for large language models gains more. Y Yue, Z Yuan, H Duanmu, S Zhou, J Wu, L Nie. arXiv preprint arXiv:2402.12065, 2024	6	2024
Reconfigurable ASIC implementation of asynchronous recurrent neural networks. S Nelson, SY Kim, J Di, Z Zhou, Z Yuan, G Sun. 2021 27th IEEE International Symposium on Asynchronous Circuits and Systems …, 2021	5	2021
MIM4DD: Mutual information maximization for dataset distillation. Y Shang, Z Yuan, Y Yan. Advances in Neural Information Processing Systems 36, 2024	3	2024

Articles 1–20
