Mimetic Initialization of Self-Attention Layers

Asher Trockman, J. Zico Kolter. Mimetic Initialization of Self-Attention Layers. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 34456-34468, PMLR, 2023.

@inproceedings{TrockmanK23-0,
  title = {Mimetic Initialization of Self-Attention Layers},
  author = {Asher Trockman and J. Zico Kolter},
  year = {2023},
  url = {https://proceedings.mlr.press/v202/trockman23a.html},
  researchr = {https://researchr.org/publication/TrockmanK23-0},
  pages = {34456-34468},
  booktitle = {International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA},
  editor = {Andreas Krause and Emma Brunskill and KyungHyun Cho and Barbara Engelhardt and Sivan Sabato and Jonathan Scarlett},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}