Sameer Deshmukh, Rio Yokota, George Bosilca. Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors. ACM Transactions on Mathematical Software, 49(3), September 2023.
Abstract is missing.