Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors

Sameer Deshmukh, Rio Yokota, George Bosilca. Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors. ACM Transactions on Mathematical Software, 49(3), September 2023.
