Lessons from the bleeding edge: large-scale production inference of LLMs

Yubin Kim 0001, Arthur Maciejewicz, Brandon Beveridge. Lessons from the bleeding edge: large-scale production inference of LLMs. In Yubin Kim 0001, Tracy Holloway King, Aditya Chichani, Pallavi Gudipati, Andrew Trotman, editors, Proceedings of the ACM SIGIR Workshop on eCommerce 2025 co-located with the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025), Padua, Italy, July 17, 2025. Volume 4123 of CEUR Workshop Proceedings, CEUR-WS.org, 2025.

Abstract

Abstract is missing.