Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge

Xuan Shen, Peiyan Dong, Lei Lu, Zhenglun Kong, Zhengang Li, Ming Lin, Chao Wu, Yanzhi Wang. Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge. In Michael J. Wooldridge, Jennifer G. Dy, Sriraam Natarajan, editors, Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, pages 18944-18951. AAAI Press, 2024.

Authors

Xuan Shen

Peiyan Dong

Lei Lu

Zhenglun Kong

Zhengang Li

Ming Lin

Chao Wu

Yanzhi Wang
