- Microarchitecture of a Configurable High-Radix Router for the Post-Moore Era. Yi Dai, Kai Lu, Junsheng Chang, Xingyun Qi, Jijun Cao, Jianmin Zhang. 3-17
- BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda. 18-37
- Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That's Different from CPU. Jesmin Jahan Tithi, Fabrizio Petrini, David F. Richards. 38-56
- A Hierarchical Task Scheduler for Heterogeneous Computing. Narasinga Rao Miniskar, Frank Liu, Aaron R. Young, Dwaipayan Chakraborty, Jeffrey S. Vetter. 57-76
- Auto-Precision Scaling for Distributed Deep Learning. Ruobing Han, James Demmel, Yang You. 79-97
- FPGA Acceleration of Number Theoretic Transform. Tian Ye, Yang Yang, Sanmukh R. Kuppannagari, Rajgopal Kannan, Viktor K. Prasanna. 98-117
- Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda. 118-136
- A Tunable Implementation of Quality-of-Service Classes for HPC Networks. Kevin A. Brown, Neil McGlohon, Sudheer Chunduri, Eric Borch, Robert B. Ross, Christopher D. Carothers, Kevin Harms. 137-156
- Scalability of Streaming Anomaly Detection in an Unbounded Key Space Using Migrating Threads. Brian A. Page, Peter M. Kogge. 157-175
- HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads. Pouya Fotouhi, Marjan Fariborz, Roberto Proietti, Jason Lowe-Power, Venkatesh Akella, S. J. Ben Yoo. 176-194
- Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems. Burak Aksar, Yijia Zhang, Emre Ates, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim M. Brandt, Manuel Egele, Ayse K. Coskun. 195-214
- COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling. Marko Kabic, Simon Pintarelli, Anton Kozhevnikov, Joost VandeVondele. 217-236
- Enabling AI-Accelerated Multiscale Modeling of Thrombogenesis at Millisecond and Molecular Resolutions on Supercomputers. Yicong Zhu, Peng Zhang, Changnian Han, Guojing Cong, Yuefan Deng. 237-254
- Evaluation of the NEC Vector Engine for Legacy CFD Codes. Keith Obenschain, Yu Yu Khine, Raghunandan Mathur, Gopal Patnaik, Robert Rosenberg. 255-271
- Distributed Sparse Block Grids on GPUs. Pietro Incardona, Tommaso Bianucci, Ivo F. Sbalzarini. 272-290
- iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs. Luk Burchard, Johannes Moe, Daniel Thilo Schroeder, Konstantin Pogorelov, Johannes Langguth. 291-309
- Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads. Richard Todd Evans, Matthew Cawood, Stephen Lien Harrell, Lei Huang, Si Liu, Chun-Yaung Lu, Amit Ruhela, Yinzhi Wang, Zhao Zhang. 313-331
- A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application. Andrei Poenaru, Wei-Chen Lin, Simon McIntosh-Smith. 332-350
- Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. Ayesha Afzal, Georg Hager, Gerhard Wellein. 351-371
- Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark. Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Yuetsu Kodama, Mitsuhisa Sato. 372-390
- Under the Hood of SYCL - An Initial Performance Analysis with An Unstructured-Mesh CFD Application. István Z. Reguly, Andrew M. B. Owenson, Archie Powell, Stephen A. Jarvis, Gihan R. Mudalige. 391-410
- Characterizing Containerized HPC Applications Performance at Petascale on CPU and GPU Architectures. Amit Ruhela, Stephen Lien Harrell, Richard Todd Evans, Gregory J. Zynda, John M. Fonner, Matt Vaughn, Tommy Minyard, John Cazes. 411-430
- Ubiquitous Performance Analysis. David Böhme, Pascal Aschwanden, Olga Pearce, Kenneth Weiss, Matthew P. LeGendre. 431-449
- Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning. Chad Wood, Giorgis Georgakoudis, David Beckingsale, David Poliakoff, Alfredo Giménez, Kevin A. Huck, Allen D. Malony, Todd Gamblin. 453-472