Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan J. Hüser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere Sridhar, Lizhen Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gökhan Tür, Prem Natarajan. Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems. In Aidong Zhang, Huzefa Rangwala, editors, KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. Pages 2893-2902, ACM, 2022. doi:10.1145/3534678.3539173

@inproceedings{FitzGeraldAABBB22,
  title = {{Alexa Teacher Model}: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems},
  author = {Jack FitzGerald and Shankar Ananthakrishnan and Konstantine Arkoudas and Davide Bernardi and Abhishek Bhagia and Claudio Delli Bovi and Jin Cao and Rakesh Chada and Amit Chauhan and Luoxin Chen and Anurag Dwarakanath and Satyam Dwivedi and Turan Gojayev and Karthik Gopalakrishnan and Thomas Gueudre and Dilek Hakkani-Tur and Wael Hamza and Jonathan J. Hüser and Kevin Martin Jose and Haidar Khan and Beiye Liu and Jianhua Lu and Alessandro Manzotti and Pradeep Natarajan and Karolina Owczarzak and Gokmen Oz and Enrico Palumbo and Charith Peris and Chandana Satya Prakash and Stephen Rawls and Andy Rosenbaum and Anjali Shenoy and Saleh Soltan and Mukund Harakere Sridhar and Lizhen Tan and Fabian Triefenbach and Pan Wei and Haiyang Yu and Shuai Zheng and Gökhan Tür and Prem Natarajan},
  year = {2022},
  doi = {10.1145/3534678.3539173},
  url = {https://doi.org/10.1145/3534678.3539173},
  researchr = {https://researchr.org/publication/FitzGeraldAABBB22},
  cites = {0},
  citedby = {0},
  pages = {2893--2902},
  booktitle = {KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022},
  editor = {Aidong Zhang and Huzefa Rangwala},
  publisher = {ACM},
  isbn = {978-1-4503-9385-0},
}