Weijie Liu, Shengwei Li, Zhiquan Lai, Keshi Ge, Qiaoling Chen, Peng Sun, Dongsheng Li, Kai Lu. AdaCheck: An Adaptive Checkpointing System for Efficient LLM Training with Redundancy Utilization. In André Brinkmann, Philip Shilane, editors, 24th USENIX Conference on File and Storage Technologies, FAST 2026, Santa Clara, CA, USA, February 24-26, 2026. pages 271-289, USENIX Association, 2026.