BAQET: BRAM-aware Quantization for Efficient Transformer Inference via Stream-based Architecture on an FPGA

LingChi Yang, Chi-Jui Chen, Trung Le, Bo-Cheng Lai, Scott Hauck, Shih-Chieh Hsu. BAQET: BRAM-aware Quantization for Efficient Transformer Inference via Stream-based Architecture on an FPGA. In Andrew Putnam, Jing Li, editors, Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA 2025, Monterey, CA, USA, 27 February 2025 - 1 March 2025. Page 51, ACM, 2025.
