Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs

Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu. Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs. TACO, 20(4), December 2023.

Authors

Jiangsu Du

Jiazhi Jiang

Jiang Zheng

Hongbin Zhang

Dan Huang

Yutong Lu
