Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang. Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations. In Lun-Wei Ku, Andre Martins, Vivek Srikumar, editors, Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, pages 9781-9793. Association for Computational Linguistics, 2024.

Abstract

(No abstract available for this entry.)