Mimetic Initialization of Self-Attention Layers

Asher Trockman, J. Zico Kolter. Mimetic Initialization of Self-Attention Layers. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 34456-34468, PMLR, 2023.

Authors

Asher Trockman

J. Zico Kolter