GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference

Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos. GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. In 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020, Athens, Greece, October 17-21, 2020, pages 811-824. IEEE, 2020.

Abstract

Abstract is missing.