- Split-Knit Convolution: Enabling Dense Evaluation of Transpose and Dilated Convolutions on GPUs. Arjun Menon Vadakkeveedu, Debabrata Mandal, Pradeep Ramachandran, Nitin Chandrachoodan. 1-10
- Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform. Bingyi Zhang, Hanqing Zeng, Viktor K. Prasanna. 11-21
- Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. Qinghua Zhou, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. 22-31
- AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters. Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. 32-41
- Joint Partitioning and Sampling Algorithm for Scaling Graph Neural Network. Manohar Lal Das, Vishwesh Jatala, Gagan Raj Gupta. 42-47
- Building a Performance Model for Deep Learning Recommendation Model Training on GPUs. Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens. 48-58
- Accelerating Prefix Scan with in-network computing on Intel PIUMA. Kartik Lakhotia, Fabrizio Petrini, Rajgopal Kannan, Viktor K. Prasanna. 59-68
- memwalkd: Accelerating Key-value stores using Page Table Walkers. Ravi Shreyas Anupindi, Swaroop Kotni, Arkaprava Basu. 69-74
- Energy Consumption Evaluation of Optane DC Persistent Memory for Indexing Data Structures. Manolis Katsaragakis, Christos Baloukas, Lazaros Papadopoulos, Verena Kantere, Francky Catthoor, Dimitrios Soudris. 75-84
- LDT: Lightweight Dirty Tracking of Memory Pages for x86 Systems. Rohit Singh, K. P. Arun, Debadatta Mishra. 85-94
- Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries. Bharath Ramesh, Qinghua Zhou, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda. 95-99
- Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters. Kaushik Kandadi Suresh, Akshay Paniraja Guptha, Benjamin Michalowicz, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. 100-104
- High-Performance Truss Analytics in Arkouda. Zhihui Du, Joseph Patchett, Oliver Alvarado Rodriguez, Fuhuan Li, David A. Bader. 105-114
- Parallel Vertex Color Update on Large Dynamic Networks. Arindam Khanda, Sanjukta Bhowmick, Xin Liang, Sajal K. Das. 115-124
- IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization. Reet Barik, Marco Minutoli, Mahantesh Halappanavar, Ananth Kalyanaraman. 125-134
- Leveraging GPU Tensor Cores for Double Precision Euclidean Distance Calculations. Benoît Gallet, Michael Gowanlock. 135-144
- A Portable Sparse Solver Framework for Large Matrices on Heterogeneous Architectures. Fazlay Rabbi, Christopher S. Daley, Ümit V. Çatalyürek, Hasan Metin Aktulga. 145-155
- Performance analysis of GPU accelerated meshfree q-LSKUM solvers in Fortran, C, Python, and Julia. Nischay Ram Mamidi, Dhruv Saxena, Kumar Prasun, Anil Nemili, Bharatkumar Sharma, S. M. Deshpande. 156-165
- A Deep Learning-Based In Situ Analysis Framework for Tropical Cyclogenesis Prediction. Abir Mukherjee, Preeti Malakar. 166-175
- HiBGT: High-Performance Bayesian Group Testing for COVID-19. Weicong Chen, Curtis Tatsuoka, Xiaoyi Lu. 176-185
- Churn Prediction in Telecommunications Industry Based on Conditional Wasserstein GAN. Chang Su, Linglin Wei, Xianzhong Xie. 186-191
- A Real-time Flood Inundation Prediction on SX-Aurora TSUBASA. Yoichi Shimomura, Akihiro Musa, Yoshihiko Sato, Atsuhiko Konja, Guoqing Cui, Rei Aoyagi, Keichi Takahashi, Hiroyuki Takizawa. 192-197
- Precise Parallel FEM-based Interactive Cutting Simulation of Deformable Bodies. Harshvardhan Das, Suraj Kumar, Subodh Kumar. 198-203
- Scaling the SOO Global Blackbox Optimizer on a 128-core Architecture. David Redon, Bilel Derbel, Pierre Fortin. 204-214
- A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers. Tri Nguyen, Michela Becchi. 215-225
- An Algorithmic and Software Pipeline for Very Large Scale Scientific Data Compression with Error Guarantees. Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Ruonan Wang, Scott Klasky, Anand Rangarajan, Sanjay Ranka. 226-235
- COMPROF and COMPLACE: Shared-Memory Communication Profiling and Automated Thread Placement via Dynamic Binary Instrumentation. Ryan Kirkpatrick, Christopher Brown, Vladimir Janjic. 236-245
- LuxIO: Intelligent Resource Provisioning and Auto-Configuration for Storage Services. Keith Bateman, Neeraj Rajesh, Jaime Cernuda Garcia, Luke Logan, Jie Ye, Stephen Herbein, Anthony Kougkas, Xian-He Sun. 246-255
- IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library. Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Pedro Valero-Lara, Frankie Y. Liu, Jeffrey S. Vetter. 256-261
- Towards Efficient Cache Allocation for High-Frequency Checkpointing. Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, Franck Cappello. 262-271
- 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He. 272-281
- Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms. Jason Yik, Sanmukh R. Kuppannagari, Hanqing Zeng, Viktor K. Prasanna. 282-291
- Provenance-based Workflow Diagnostics Using Program Specification. Yuta Nakamura, Tanu Malik, Iyad Kanj, Ashish Gehani. 292-301
- EECAAP: Efficient Edge-Computing based Anonymous Authentication Protocol for IoV. Himani Sikarwar, Debasis Das. 302-307