On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers

Tianchu Ji, Shraddhan Jain, Michael Ferdman, Peter A. Milder, H. Andrew Schwartz, Niranjan Balasubramanian. On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers. In Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli, editors, Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, pages 4147-4157. Association for Computational Linguistics, 2021.

@inproceedings{JiJFMSB21,
  title = {On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers},
  author = {Tianchu Ji and Shraddhan Jain and Michael Ferdman and Peter A. Milder and H. Andrew Schwartz and Niranjan Balasubramanian},
  year = {2021},
  url = {https://aclanthology.org/2021.findings-acl.363},
  researchr = {https://researchr.org/publication/JiJFMSB21},
  cites = {0},
  citedby = {0},
  pages = {4147--4157},
  booktitle = {Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021},
  editor = {Chengqing Zong and Fei Xia and Wenjie Li and Roberto Navigli},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-954085-54-1},
}