Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn. Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching. In 57th IEEE/ACM International Symposium on Microarchitecture, MICRO 2024, Austin, TX, USA, November 2-6, 2024. pages 1429-1443, IEEE, 2024.