Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs

Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu. Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs. TACO, 20(4), December 2023.