- Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters. Qinghua Zhou, Pouya Kousha, Quentin Anthony, Kawthar Shafie Khorassani, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. 3-25
- NVIDIA's Quantum InfiniBand Network Congestion Control Technology and Its Impact on Application Performance. Yuval Shpigelman, Gilad Shainer, Richard L. Graham, Yong Qin, Gerardo Cisneros-Stoianowski, Craig B. Stunkel. 26-43
- LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads. Marjan Fariborz, Mahyar Samani, Pouya Fotouhi, Roberto Proietti, Il-Min Yi, Venkatesh Akella, Jason Lowe-Power, Samuel Palermo, S. J. Ben Yoo. 44-64
- SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs. Jesmin Jahan Tithi, Fabio Checconi, Douglas Doerfler, Fabrizio Petrini. 65-84
- "Hey CAI" - Conversational AI Enabled User Interface for HPC Tools. Pouya Kousha, Arpan Jain, Ayyappa Kolli, Prasanna Sainath, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda. 87-108
- Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters. Arpan Jain, Aamir Shafi, Quentin Anthony, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda. 109-130
- Efficient Application of Hanging-Node Constraints for Matrix-Free High-Order FEM Computations on CPU and GPU. Peter Munch, Karl Ljungkvist, Martin Kronbichler. 133-152
- Dynamic Task Fusion for a Block-Structured Finite Volume Solver over a Dynamically Adaptive Mesh with Local Time Stepping. Baojiu Li, Holger Schulz, Tobias Weinzierl, Han Zhang. 153-173
- Accelerating Simulated Quantum Annealing with GPU and Tensor Cores. Yi-Hua Chung, Cheng-Jhih Shih, Shih-Hao Hung. 174-191
- m-Cubes: An Efficient and Portable Implementation of Multi-dimensional Integration for GPUs. Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balsa Terzic, Mohammad Zubair. 192-209
- Comparative Evaluation of Call Graph Generation by Profiling Tools. Onur Cankur, Abhinav Bhatele. 213-232
- MAPredict: Static Analysis Driven Memory Access Prediction Framework for Modern CPUs. Mohammad Alaul Haque Monil, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony. 233-255
- Rapid Execution Time Estimation for Heterogeneous Memory Systems Through Differential Tracing. Nicolas Denoyelle, Swann Perarnau, Kamil Iskra, Balazs Gerofi. 256-274
- Understanding Distributed Deep Learning Performance by Correlating HPC and Machine Learning Measurements. Ana Luisa Veroneze Solórzano, Lucas Mello Schnorr. 275-292
- A Motivating Case Study on Code Variant Selection by Reinforcement Learning. Oliver Hacker, Matthias Korch, Johannes Seiferth. 293-312
- Remote OpenMP Offloading. Atmn Patel, Johannes Doerfert. 315-333
- Hybrid Parallel ILU Preconditioner in Linear Solver Library GaspiLS. Raju Ram, Daniel Grünewald, Nicolas R. Gauger. 334-353
- A Subset of the CERN Virtual Machine File System: Fast Delivering of Complex Software Stacks for Supercomputing Resources. Alexandre F. Boyer, Christophe Haen, Federico Stagni, David R. C. Hill. 354-371