Focused Transformer: Contrastive Training for Context Scaling

Szymon Tworkowski, Konrad Staniszewski, Mikolaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Milos. Focused Transformer: Contrastive Training for Context Scaling. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023.

Authors

Szymon Tworkowski


Konrad Staniszewski


Mikolaj Pacek


Yuhuai Wu


Henryk Michalewski


Piotr Milos
