Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching

Sungmin Yun 0001, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim 0007, Byeongho Kim, Sukhan Lee 0002, Kyomin Sohn, Jung Ho Ahn. Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching. In 57th IEEE/ACM International Symposium on Microarchitecture, MICRO 2024, Austin, TX, USA, November 2-6, 2024. pages 1429-1443, IEEE, 2024.
