The following publications are possibly variants of this publication:
- OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni. sc 2019: 82-92
- CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda. pc, 58:27-36, 2016.
- CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC. Khaled Hamidouche, Ammar Ahmad Awan, Akshay Venkatesh, Dhabaleswar K. Panda. hipc 2016: 52-61
- Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda. ccgrid 2019: 498-507
- OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda. hipc 2018: 143-152
- Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures. Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh 0005, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda. ipps 2020: 32-41
- FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty 0003, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda. jpdc, 144:1-13, 2020.