Jun Liu
Title
Cited by
Year
FlashDecoding++: Faster Large Language Model Inference on GPUs
K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, K Chen, H Dong, Y Wang
arXiv preprint arXiv:2311.01282, 2023
Cited by: 17; Year: 2023
A unified FPGA virtualization framework for general-purpose deep neural networks in the cloud
S Zeng, G Dai, H Sun, J Liu, S Li, G Ge, K Zhong, K Guo, Y Wang, H Yang
ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (3), 1-31, 2021
Cited by: 5; Year: 2021
Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter
J Liu, Z Zhu, J Hu, H Sun, L Liu, L Liu, G Dai, H Yang, Y Wang
2022 23rd IEEE International Conference on Mobile Data Management (MDM), 179-184, 2022
Cited by: 3; Year: 2022
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGA
S Zeng, J Liu, G Dai, X Yang, T Fu, H Wang, W Ma, H Sun, S Li, Z Huang, ...
arXiv preprint arXiv:2401.03868, 2024
Cited by: 1; Year: 2024
Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization
J Li, S Li, J Xu, S Huang, Y Lian, J Liu, Y Wang, G Dai
arXiv preprint arXiv:2311.16442, 2023
2023
DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search
S Zeng, Z Zhu, J Liu, H Zhang, G Dai, Z Zhou, S Li, X Ning, Y Xie, H Yang, ...
Proceedings of the 56th Annual IEEE/ACM International Symposium on …, 2023
2023
TSTC: Two-level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency
J Liu, G Dai, H Xia, L Guo, X Shi, J Xu, H Yang, Y Wang
2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9, 2023
2023
Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search
Z Zhu, J Liu, G Dai, S Zeng, B Li, H Yang, Y Wang
2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023
2023