Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity

Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin 0016, Shuaiwen Leon Song. Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity. PVLDB, 17(2):211-224, 2023.

Authors

Haojun Xia

Zhen Zheng

Yuchao Li

Donglin Zhuang

Zhongzhu Zhou

Xiafei Qiu

Yong Li

Wei Lin 0016

Shuaiwen Leon Song