GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference

Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos. GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. In 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020, Athens, Greece, October 17-21, 2020. pages 811-824, IEEE, 2020.
