Keigo Shibata, Kazuki Yano, Ryosuke Takahashi, Jaesung Lee, Wataru Ikeda, Jun Suzuki. Suppressing Final Layer Hidden State Jumps in Transformer Pretraining. In Vera Demberg, Kentaro Inui, Lluís Màrquez, editors, Findings of the Association for Computational Linguistics: EACL 2026, Rabat, Morocco, March 24-29, 2026, pages 1236-1262. Association for Computational Linguistics, 2026.