A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm

Ben Keller, Rangharajan Venkatesan, Steve Dai, Stephen G. Tell, Brian Zimmer, William J. Dally, C. Thomas Gray, and Brucek Khailany. "A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm." In IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), Honolulu, HI, USA, June 12-17, 2022, pages 16-17. IEEE, 2022.
