Yuchen Li 0006, Rui Kong, Xinran Chen, Chengzhe Zhang, Jiamin Chen, Cheng Deng 0001, Xinyu Ma, Haojie Zhang, Tianhao Peng 0002, Hengyi Cai, Shuaiqiang Wang, Jiashu Zhao, Yongqi Zhang, Haoyi Xiong, Jimmy Xiangji Huang, Lei Chen 0002, Jun Wang 0012, Dawei Yin 0001. Probe-and-Fetch: Dynamic KV Cache Pruning for Accelerated Long-Context Inference in Web-Scale AI Search. In Hakim Hacid, Yoelle Maarek, Francesco Bonchi, Ido Guy, Emine Yilmaz, editors, Proceedings of the ACM Web Conference 2026, WWW 2026, Dubai, United Arab Emirates, originally scheduled for April 13-17, 2026, rescheduled for June 29 - July 3, 2026, pages 8127-8137, ACM, 2026.