- Near LLC versus near main memory processing. Hossein BiTalebi, Vahid Geraeinejad, Masoumeh Ebrahimi.
- Accelerating data transfer between host and device using idle GPU. Yuya Tatsugi, Akira Nukada.
- Understanding wafer-scale GPU performance using an architectural simulator. Chris Thames, Hang Yan, Yifan Sun.
- Systematically extending a high-level code generator with support for tensor cores. Lukas Siefke, Bastian Köpcke, Sergei Gorlatch, Michel Steuwer.
- ScaleServe: a scalable multi-GPU machine learning inference system and benchmarking suite. Ali Jahanshahi, Marcus Chow, Daniel Wong.
- Compiler-assisted scheduling for multi-instance GPUs. Chris Porter, Chao Chen, Santosh Pande.