Efficient Large Scale Language Modeling with Mixtures of Experts

Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giridharan Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeffrey Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Veselin Stoyanov. Efficient Large Scale Language Modeling with Mixtures of Experts. In Yoav Goldberg, Zornitsa Kozareva, Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, pages 11699-11732. Association for Computational Linguistics, 2022.

@inproceedings{ArtetxeBGMOSLDI22,
  title = {Efficient Large Scale Language Modeling with Mixtures of Experts},
  author = {Mikel Artetxe and Shruti Bhosale and Naman Goyal and Todor Mihaylov and Myle Ott and Sam Shleifer and Xi Victoria Lin and Jingfei Du and Srinivasan Iyer and Ramakanth Pasunuru and Giridharan Anantharaman and Xian Li and Shuohui Chen and Halil Akin and Mandeep Baines and Louis Martin and Xing Zhou and Punit Singh Koura and Brian O'Horo and Jeffrey Wang and Luke Zettlemoyer and Mona T. Diab and Zornitsa Kozareva and Veselin Stoyanov},
  year = {2022},
  url = {https://aclanthology.org/2022.emnlp-main.804},
  researchr = {https://researchr.org/publication/ArtetxeBGMOSLDI22},
  cites = {0},
  citedby = {0},
  pages = {11699--11732},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11},
  editor = {Yoav Goldberg and Zornitsa Kozareva and Yue Zhang},
  publisher = {Association for Computational Linguistics},
}