- Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance. Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Jiang Cao, Grzegorz Kwasniewski, Leonard Deuschle, Torsten Hoefler, Alexandros Nikolaos Ziogas, Mathieu Luisier. 1-13
- Simulating many-engine spacecraft: Exceeding 1 quadrillion degrees of freedom via information geometric regularization. Benjamin Wilfong, Anand Radhakrishnan, Henry Le Berre, Daniel Vickers, Tanush Prathi, Nikolaos Tselepidis, Benedikt Dorschner, Reuben D. Budiardja, Brian Cornille, Stephen Abbott, Florian Schäfer, Spencer H. Bryngelson. 14-24
- Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability. Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali A. Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko. 25-35
- Multiscale Light-Matter Dynamics in Quantum Materials: From Electrons to Topological Superlattices. Taufeq Mohammed Razakh, Thomas Linker, Ye Luo, Nariman Piroozan, Simon John Pennycook, Nalini Kumar, Albert Musaelian, Anders Johansson, Boris Kozinsky, Rajiv K. Kalia, Priya Vashishta, Fuyuki Shimojo, Shinnosuke Hattori, Ken-ichi Nomura, Aiichiro Nakano. 36-47
- Advancing Quantum Many-Body GW Calculations on Exascale Supercomputing Platforms. Benran Zhang, Daniel Weinberg, Chih-En Hsu, Aaron R. Altman, Yuming Shi, James B. White, Derek Vigil-Fowler, Steven G. Louie, Jack R. Deslippe, Felipe H. da Jornada, Zhenglu Li, Mauro Del Ben. 48-59
- Real-Time Bayesian Inference at Extreme Scale: A Digital Twin for Tsunami Early Warning Applied to the Cascadia Subduction Zone. Stefan Henneking, Sreeram Venkat, Veselin Dobrev, John Camier, Tzanio V. Kolev, Milinda Fernando, Alice-Agnes Gabriel, Omar Ghattas. 60-71
- AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions. Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Huihuo Zheng, Sam Wheeler, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi. 72-85
- ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling. Xiao Wang, Jong Youl Choi, Takuya Kurihaya, Isaac Lyngaas, Hong-Jun Yoon, Xi Xiao, David Pugmire, Ming Fan, Nasik Muhammad Nafi, Aristeidis Tsaris, Ashwin M. Aji, Maliha Hossain, Mohamed Wahib, Dali Wang, Peter E. Thornton, Prasanna Balaprakash, Moetasim Ashfaq, Dan Lu. 86-98
- Destination Earth: The Climate Change Adaptation Digital Twin. Ioan Hadade, Daniel Klocke, Jussi Enkovaara, Tuomas Lunttila, Thomas Rackow, Jan Frederik Engels, Claudia Frauen, René Redler, Jenni Kontkanen, Thomas Jung, Dmitry Sein, Irina Sandu, Balthasar Reuter, Nils Wedi, Sebastian Milinski, Francisco Doblas-Reyes, Miguel Castrillo, Mario C. Acosta, Sergi Girona, Pekka Manninen. 99-110
- Kilometer-Scale AI-Powered and Performance-Portable Earth System Model (AP3ESM) to Achieve Year-Scale Simulation Speed on Heterogeneous Supercomputers. Kai Xu, Maoxue Yu, Yuhu Chen, Jie Gao, Shuang Wang, Jiaying Song, Xiaohui Duan, Junwei Wei, Jiangfeng Yu, Hailong Liu, Jinrong Jiang, Yi Zhang, Pengfei Lin, Tianyi Wang, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Jiakang Zhang, Zilu Liu, Xiaoyu Jin, Jilin Wei, Qixin Chang, Qingxia Lin, Yanzhi Zhou, Weiguo Liu, Wei Xue, Yiwen Li, Haohuan Fu, Yue Yu, Xuebin Chi, Lixin Wu. 111-124
- Computing the Full Earth System at 1km Resolution. Daniel Klocke, Claudia Frauen, Jan Frederik Engels, Dmitry Alexeev, René Redler, Reiner Schnur, Helmuth Haak, Luis Kornblueh, Nils Brüggemann, Fatemeh Chegini, Manoel Römmer, Lars Hoffmann, Sabine Griessbach, Mathis Bode, Jonathan Coles, Miguel Gila, William Sawyer, Alexandru Calotoiu, Yakup Budanaz, Pratyai Mazumder, Marcin Copik, Benjamin Weber, Andreas Herten, Hendryk Bockelmann, Torsten Hoefler, Cathy Hohenegger, Bjorn Stevens. 125-136
- PerfDojo: Automated ML Library Generation for Heterogeneous Architectures. Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler. 137-151
- Automatic Generation of Mappings for Distributed Fourier Operations. Doru-Thom Popovici, Botao Wu, John Shalf, Martin Kong. 152-166
- A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation. Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Peng Chen, Mohamed Wahib, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Yun Lin, Jin Song Dong, Wenxi Zhu, Minwen Deng. 167-184
- Constraint-Driven Auto-Tuning of GEMM-like Operators for MT-3000 Many-core Processor. Xinxin Qi, Jianbin Fang, Peng Zhang, Yonggang Che, Jie Ren. 185-199
- Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training. Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, Abhinav Bhatele. 200-216
- PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training. Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert B. Ross, Shivaram Venkataraman. 217-236
- TaGNN: An Efficient Topology-aware Accelerator for High-performance Dynamic Graph Neural Network. Hui Yu, Yu Zhang, Ligang He, Bing Peng, Jin Zhao, Zixiao Wang, Hao Qi, Hai Jin. 237-249
- Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training. Zuocheng Shi, Jie Sun, Ziyu Song, Mo Sun, Yang Xiao, Fei Wu, Zeke Wang. 250-264
- mLR: Scalable Laminography Reconstruction based on Memoization. Bin Ma, Viktor Nikitin, Xi Wang, Tekin Bicer, Dong Li. 265-280
- Scaling the memory wall using mixed-precision - HPG-MxP on an exascale machine. Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael A. Matheson, Sarp Oral, Feiyi Wang. 281-297
- CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters. James D. Trotter, Sinan Ekmekçibasi, Dogan Sagbili, Johannes Langguth, Xing Cai, Didem Unat. 298-315
- Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis. Shaokang Du, Kelun Lei, Xin You, Hailong Yang, Yufan Xu, Zhongzhi Luan, Yi Liu, Depei Qian. 316-330
- C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis. Philipp Schaad, Tal Ben-Nun, Torsten Hoefler. 331-348
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage. Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler. 349-367
- RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs. YanBo Zhao, Yueming Hao, Zecheng Li, Shuyin Jiao, Xu Liu, Jiajia Li. 368-382
- TraceFlow: Efficient Trace Analysis for Large-Scale Parallel Applications via Interaction Pattern-Aware Trace Distribution. Yuyang Jin, Xirui Shui, Mingshu Zhai, Zan Zong, Feng Zhang, Felix Wolf, Jidong Zhai. 383-396
- Insights from Optimizing HPL Performance on Exascale Systems: A Comparative Analysis of Panel Factorization. Hao Lu, Michael A. Matheson, Noel Chalmers, Aditya Kashi, Nicholas Malaya, Feiyi Wang. 397-410
- Breaking the System Noise Barrier at Exascale. Edgar A. León, Joseph Glenski, Mark Stock, Kim H. McMahon, William Loewe, Clark Snyder, Larry Kaplan, Srinath Vadlamani, Timothy I. Mattox, Trent D'Hooge, Brian Behlendorf, Nathan Hanford, Ramesh Pankajakshan, Matthew L. Leininger. 411-436
- Addressing Reproducibility Challenges in HPC with Continuous Integration. Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian T. Foster, Kyle Chard. 437-457
- ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem. Pedro Valero-Lara, Aaron R. Young, Jeffrey S. Vetter, Zheming Jin, Swaroop Pophale, Mohammad Alaul Haque Monil, Keita Teranishi, William F. Godoy. 458-474
- GreenMix: Energy-Efficient Serverless Computing via Randomized Sketching on Asymmetric Multi-Cores. Rohan Basu Roy, Tirthak Patel, Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari. 475-489
- HELM: Characterizing Unified Memory Accesses to Improve GPU Performance under Memory Oversubscription. Nathan Jones, Tyler N. Allen, Rong Ge. 490-504
- Minimizing Power Waste in Heterogenous Computing via Adaptive Uncore Scaling. Zhong Zheng, Seyfal Sultanov, Michael E. Papka, Zhiling Lan. 505-518
- BOER: Enhancing Resource Utilization for Deep Learning Inference with Hybrid Spatial GPU Sharing. Bowen Zhang, Yuhang Wang, Zhuozhao Li. 519-532
- XaaS Containers: Performance-Portable Representation With Source and IR Containers. Marcin Copik, Eiman AlNuaimi, Alok Kamatar, Valérie Hayot-Sasson, Alberto Madonna, Todd Gamblin, Kyle Chard, Ian T. Foster, Torsten Hoefler. 533-555
- Bridging the Gap Between Binary and Source Based Package Management in Spack. John Gouwar, Gregory Becker, Tamara Dahlgren, Nathan Hanford, Arjun Guha, Todd Gamblin. 556-569
- EDDE: Container Deployment Framework Beyond the Cloud. Hao Fan, Zhuo Huang, Shadi Ibrahim, Lin Gu, Song Wu. 570-585
- coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability. Yuhao Gu, Haoquan Chen, Xianjie Chen, Jiangsu Du, Zhiguang Chen, Nong Xiao, Xianwei Zhang, Yutong Lu. 586-601
- Sparsified Preconditioned Conjugate Gradient Solver on GPUs. Da Ma, Khalid Ahmad, Kazem Cheshmi, Hari Sundar, Mary W. Hall. 602-616
- FaSTCC: Fast Sparse Tensor Contractions on CPUs. Saurabh Raje, Hunter McCoy, Atanas Rountev, Prashant Pandey, P. Sadayappan. 617-630
- StraGCN: GPU-Accelerated Strassen's Sparse-Dense Matrix Multiplication for Graph Convolutional Network Training. Weidong He, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Shuhao Zhang, Fubing Mao, Hai Jin. 631-644
- Bridging the Gap between Unstructured SpMM and Structured Sparse Tensor Cores. Yukang Dong, Ziyuan Shen, Wenbin Jiang, Zhenghang Liu, Ye Xu, Bingyi He, Ran Zheng, Hai Jin. 645-660
- RAPTOR: Practical Numerical Profiling of Scientific Applications. Faveo Hoerold, Ivan R. Ivanov, Akash Dhruv, William S. Moses, Anshu Dubey, Mohamed Wahib, Jens Domke. 661-680
- Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics. Laslo Hunhold, James Quinlan, Stefan Wesner. 681-694
- High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic. David Kai Zhang, Alex Aiken. 695-710
- A Nested Krylov Method Using Half-Precision Arithmetic. Kengo Suzuki, Takeshi Iwashita. 711-727
- Qonductor: A Cloud Orchestrator for Quantum Computing. Emmanouil Giortamis, Francisco Romão, Nathaniel Tornow, Dmitry Lugovoy, Pramod Bhatotia. 728-745
- QDockBank: A dataset for Ligand Docking on Protein Fragments Predicted on Utility-Level Quantum Computers. Yuqi Zhang, Yuxin Yang, Cheng-Chang Lu, Weiwen Jiang, Feixiong Cheng, Bo Fang, Qiang Guan. 746-761
- Augmenting Simulated Noisy Quantum Data Collection by Orders of Magnitude Using Pre-Trajectory Sampling with Batched Execution. Taylor Lee Patti, Thien Nguyen, Justin Gage Lietz, Alex McCaskey, Brucek Khailany. 762-773
- Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures. Longshan Xu, Edwin Hsing-Mean Sha, Xiulin Cui, Qingfeng Zhuge. 774-788
- UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture. Sitian Chen, Amelie Chi Zhou, Yucheng Shi, Yusen Li, Xin Yao. 789-804
- MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories. Zixiang Yu, Guangyang Deng, Zhirong Shen, Qiangsheng Su, Ronglong Wu, Xiaoli Wang, Quanqing Xu, Chuanhui Yang, Zhifeng Bao. 805-819
- DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs. Mingkai Chen, Tianhua Han, Cheng Liu, Shengwen Liang, Kuai Yu, Lei Dai, Ziming Yuan, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li. 820-836
- Optimizing Data Acquisitions in Multi-Robot Systems. Yanhao Li, Zijun Xu, Xuanjun Wen, Yanjie Song, Guancheng Li, Shu Yin. 837-854
- ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems. Yankai Jiang, Raghavendra Kanakagiri, Rohan Basu Roy, Devesh Tiwari. 855-869
- Core Hours and Carbon Credits: Incentivizing Sustainability in HPC. Alok Kamatar, Maxime Gonthier, Valérie Hayot-Sasson, André Bauer, Marcin Copik, Raul Castro Fernandez, Torsten Hoefler, Kyle Chard, Ian T. Foster. 870-887
- Benchmark-driven Models for Energy Analysis and Attribution of GPU-Accelerated Supercomputing. Oscar Antepara, Zhengji Zhao, Brian Austin, Nan Ding, Leonid Oliker, Nicholas J. Wright, Samuel Williams. 888-904
- Characterizing Performance, Power, and Energy of AMD CDNA3 GPU Family. Bagus Hanindhito, Bhavesh Patel. 905-934
- Distributed Cross-Channel Hierarchical Aggregation for Foundation Models. Aristeidis Tsaris, Isaac Lyngaas, John H. Lagergren, Mohamed Wahib, Larry M. York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang. 935-948
- Accelerated Spatio-Temporal Bayesian Modeling for Multivariate Gaussian Processes. Lisa Gaedke-Merzhäuser, Vincent Maillou, Fernando Rodriguez Avellaneda, Olaf Schenk, Paula Moraga, Mathieu Luisier, Alexandros Nikolaos Ziogas, Håvard Rue. 949-972
- Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding. Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Yufan Xu, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Ruihao Gong, Rui Wang, Zhongzhi Luan, Yi Liu, Depei Qian. 973-990
- TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU. Shixun Wu, Yujia Zhai, Huangliang Dai, Yue Zhu, HaiYang Hu, Zizhong Chen. 991-1005
- ODOS-MPI: HPC-Friendly SmartNIC Offloading of Computation/Communication Kernels. Muhammad Usman, Mariano Benito, Sergio Iserte, Antonio J. Peña. 1006-1027
- AGILE: Lightweight and Efficient Asynchronous GPU-SSD Integration. Zhuoping Yang, Jinming Zhuang, Xingzhen Chen, Alex K. Jones, Peipei Zhou. 1028-1042
- LCI: a Lightweight Communication Interface for Efficient Asynchronous Multithreaded Communication. Jiakun Yan, Marc Snir. 1043-1059
- COSMOS: Performance Portable Graph Pattern Matching with Domain-Specific Software Distributed Shared Memory. Zhiheng Lin, Ke Meng, Changjie Xu, Weichen Cao, Guangming Tan. 1060-1072
- Fine-grained Automated Failure Management for Extreme-Scale GPU Accelerated Systems. Yonatan Levitt, Richard Barella, Sam Zeltner, Thomas Musta, Lance Cheney, Gustavo Espinosa, Olivier Franza, Balazs Gerofi. 1073-1084
- FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention. Huangliang Dai, Shixun Wu, Jiajun Huang, Zizhe Jian, Yue Zhu, HaiYang Hu, Zizhong Chen. 1085-1098
- Deploying Lightweight Input-Aware Selective Instruction Duplication in HPC Applications. Md Hasanur Rahman, Guanpeng Li. 1099-1112
- LowDiff: Efficient Frequent Checkpointing via Low-Cost Differential for High-Performance Distributed Training Systems. Chenxuan Yao, Feifan Liu, Yuchong Hu, Zhengyu Liu, Xinjue Zheng, Wenxiang Zhou. 1113-1126
- Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective. Yu Sun, Zachary Coalson, Shiyang Chen, Hang Liu, Zhao Zhang, Sanghyun Hong, Bo Fang, Lishan Yang. 1127-1144
- Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs. Shengkun Cui, Archit Patke, Hung Nguyen, Aditya Ranjan, Ziheng Chen, Phuong Cao, Gregory H. Bauer, Brett M. Bode, Catello Di Martino, Saurabh Jha, Chandra Narayanaswami, Daby Sow, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer. 1145-1164
- Exploring and Mitigating Failure Behavior of Large Language Model Training Workloads in HPC Systems. Pengfei Yu, Jingjing Gu, Hao Han, Dazhong Shen, Bao Wen, Yang Liu. 1165-1179
- Effective Node-Level Anomaly Detection in HPC Systems via Coarse-Grained Clustering and Fine-Grained Model Sharing. Sibo Xia, Yongqian Sun, Xijie Pan, Yuan Yuan, Shenglin Zhang, Shaoyu Hu, Lei Tao, Yuqi Li, Jinghua Feng. 1180-1194
- Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity. Tommaso Bonato, Sepehr Abdous, Abdul Kabbani, Ahmad Ghalayini, Nadeen Gebara, Terry Lam, Anup Agarwal, Tiancheng Chen, Zhuolong Yu, Konstantin Taranov, Mahmoud Elhaddad, Daniele De Sensi, Soudeh Ghorbani, Torsten Hoefler. 1195-1210
- ACTINA: Adapting Circuit-Switching Techniques for AI Networking Architectures. Zhenguo Wu, Benjamin Klenk, Larry Dennison, Keren Bergman. 1211-1222
- SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication. Mikhail Khalilov, Siyuan Shen, Marcin Chrapek, Tiancheng Chen, Kenji Nakano, Nicola Mazzoletti, Peter-Jan Gootzen, Salvatore Di Girolamo, Rami Nudelman, Gil Bloch, Jithin Jose, Abdul Kabbani, Sreevatsa Anantharamu, Jie Zhang, Konstantin Taranov, Zhuolong Yu, Scott Moe, Mahmoud Elhaddad, Torsten Hoefler. 1223-1239
- Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers. Giyong Jung, Saeid Gorgin, John Kim, Jungrae Kim. 1240-1251
- STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems. Chris Egersdoerfer, Philip H. Carns, Shane Snyder, Robert Ross, Dong Dai. 1252-1266
- Phoenix: A Refactored I/O Stack for GPU Direct Storage without Phony Buffers. Jianqin Yan, Shi Qiu, Yina Lv, Yifan Hu, Hao Chen, Zhirong Shen, Xin Yao, Renhai Chen, Jiwu Shu, Gong Zhang, Yiming Zhang. 1267-1283
- gParaKV: A GPGPU-accelerated Key-Value Separation-based KV Store with Optimized Compaction and Garbage Collection. Hui Sun, Xiangxiang Jiang, Xiao Qin, Song Jiang, Enhui Wang. 1284-1298
- MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs. Wenjing Huang, Jinwu Yang, Shengquan Yin, Haoxu Li, Yida Gu, Zedong Liu, Xing Jing, Zheng Wei, Shiyuan Fu, Hao Hu, Guangming Tan, Dingwen Tao. 1299-1314
- X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms. Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang. 1315-1331
- TT-LoRA MoE: Using Parameter-Efficient Fine-Tuning and Sparse Mixture-Of-Experts. Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai. 1332-1350
- Balanced and Elastic End-to-end Training of Dynamic LLMs. Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat. 1351-1367
- HPC-R1: Characterizing R1-like Large Reasoning Models on HPC. Adam Weingram, Duo Zhang, Zhonghao Chen, Hao Qi, Xiaoyi Lu. 1368-1380
- MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall. Avinash Maurya, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae. 1381-1394
- RingX: Scalable Parallel Attention for Long-Context Learning on HPC. Junqi Yin, Mijanur Palash, Mallikarjun Shankar, Feiyi Wang. 1395-1408
- SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training. Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song. 1409-1428
- BurstEngine: An efficient distributed framework for training transformers On extremely Long sequences of over 1M tokens. Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun. 1429-1445
- Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation. Abdullah Al Raqibul Islam, Helen Xu, Dong Dai, Aydin Buluç. 1446-1463
- Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods. Jakub Homola, Ondrej Meca, Lubomír Ríha, Tomás Brzobohatý. 1464-1476
- Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling. Jie Ren, Tingxuan Zhong, Yuxi Hong, Guofeng Feng, Xincheng Wang, Weile Jia, Hatem Ltaief, David Elliot Keyes. 1477-1494
- SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation. Qi Li, Kun Li, Haozhi Han, Liang Yuan, Yunquan Zhang, Yifeng Chen, Junshi Chen, Hong An, Ting Cao, Mao Yang. 1495-1509
- Fringe-SGC: Counting Subgraphs with Fringe Vertices. Cameron Bradley, Ghadeer Ahmed H. Alabandi, Martin Burtscher. 1510-1523
- SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching. Antonio De Caro, Gennaro Cordasco, Federico Ficarelli, Biagio Cosenza. 1524-1538
- Graphago: Accelerating SSD-based Graph Processing via Activity-Aware Graph Preprocessing. Xianghao Xu, Yucheng Zhang, Gongxuan Zhang, Yongli Cheng, Fang Wang. 1539-1552
- Bubble: Towards Scalable Evolving Graph Processing via Mini-Batch Sorting. Long Deng, Yongkun Li, Zaigui Zhang, Yinlong Xu, John C. S. Lui. 1553-1571
- KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU. Hemeng Wang, Yang Du, Sidu Li, Xiaowen Tian, Qingxiao Sun, Weifeng Liu. 1572-1589
- MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng. 1590-1603
- HyTiS: Hybrid Tile Scheduling for GPU GEMM with Enhanced Wave Utilization and Cache Locality. Zheng Zhang, Hulin Wang, Hongming Xu, Donglin Yang, Xiaobo Zhou, Dazhao Cheng. 1604-1618
- LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving. Huanqi Hu, Bowen Xiao, Shixuan Sun, Jianian Yin, Zhexi Zhang, Xiang Luo, Chengquan Jiang, Weiqi Xu, Xiaoying Jia, Xin Liu, Minyi Guo. 1619-1630
- TENSORMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential. Yucheng Ouyang, Xin Chen, Ying Liu, Xin Chen, Honghui Shang, Zhenchuan Chen, Rongfen Lin, Xingyu Gao, Lifang Wang, Fang Li, Jiahao Shan, Haifeng Song, Huimin Cui, Xiaobing Feng, Jingling Xue. 1631-1645
- NNQS-SCI: Tackling Trillion-Dimensional Hilbert Space with Adaptive Neural Network Quantum States. Bowen Kan, Yumeng Zhou, Daiyou Xie, Pengyu Zhou, Yunquan Zhang, Honghui Shang. 1646-1660
- MISA-AKMC: Achieve Kinetic Monte Carlo Simulation of 20 Quadrillion Atoms on GPU Clusters. Shunde Li, Zhijie Pan, Ningming Nie, Jue Wang, He Bai, Genshen Chu, Yan Zeng, Xinfu He, Yangang Wang, Changjun Hu, Xuebin Chi. 1661-1675
- MaverIQ: Fingerprint-Guided Extrapolation and Fragmentation-Aware Layering for Intent-Based LLM Serving. Dimitrios Liakopoulos, Prasoon Sinha, Tianrui Hu, Myungjin Lee, Neeraja J. Yadwadkar. 1676-1696
- Compile-Time QoS Scheme for Deep Learning Inferences. Sungin Hong, Hyunjun Kim, Hwansoo Han. 1697-1709
- Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism. Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu. 1710-1724
- gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling. Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu. 1725-1741
- SIREN: Software Identification and Recognition in HPC Systems. Thomas Jakobsche, Fredrik Robertsén, Jessica R. Jones, Utz-Uwe Haus, Florina M. Ciorba. 1742-1754
- Hypertron: Efficiently Scaling Large Models by Exploring High-Dimensional Parallelization Space. Shigang Li, Jingkun Dong, Jihao Chen, Zhi Ma, Zhongzhe Hu. 1755-1768
- UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling. Haoyu Yang, Zan Zong, Yuyang Jin, Kinman Lei, Jiaao He, Qigang Yang, Jidong Zhai. 1769-1784
- TianheEngine: Hierarchy-aware Adaptive Partitioning System for Trillion-scale Graph Processing. Xinbiao Gan, Tiejun Li, Yiqi Wang, Qiang Zhang, Yongming Yi, Chunye Gong, Jie Liu, Kai Lu. 1785-1799
- Parallel Rank-Adaptive Higher Order Orthogonal Iteration. Joao Pinheiro, Aditya Devarakonda, Grey Ballard. 1800-1815
- HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA. Han Huang, Jiabin Xie, Guangnan Feng, Xianwei Zhang, Dan Huang, Zhiguang Chen, Yutong Lu. 1816-1829
- Rethinking Back Transformation in 2-stage Eigenvalue Decomposition on Heterogeneous Architectures. Hansheng Wang, Dajun Huang, Gaoyuan Zou, Lu Shi, Xu Jiang, Xi Wu, Hancong Duan, Shaoshuai Zhang. 1830-1844
- DAS-ILU: A Distributed Asynchronous Parallel ILU Factorization Based on Domain Decomposition. Fan Yuan, Shengguo Li, Xiaojian Yang, Yunqing Huang, Hongxia Wang, Chuanfu Xu, Dezun Dong, Tiejun Li, Jianchun Wang, Jie Liu. 1845-1858
- The First Star-by-star N-body/Hydrodynamics Simulation of Our Galaxy Coupling with a Surrogate Model. Keiya Hirashima, Michiko S. Fujii, Takayuki R. Saitoh, Naoto Harada, Kentaro Nomura, Kohji Yoshikawa, Yutaka Hirai, Tetsuro Asano, Kana Moriwaki, Masaki Iwasawa, Takashi Okamoto, Junichiro Makino. 1859-1873
- Million-Atom Ab Initio Electron Dynamics: Discontinuous Galerkin Real-Time Time-Dependent Density Functional Theory. Junwei Feng, Junshi Chen, Xiangyu Zhang, Junhui Liu, Xinming Qin, Lingyun Wan, Sheng Chen, Wentiao Wu, Bingkun Hou, Yexuan Lin, Yihong Zhang, Zechuan Zhang, Yijun Hu, Weile Jia, Hong An, Jinlong Yang, Wei Hu. 1874-1887
- Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale. Zhuoqiang Guo, Runze Mao, Lijun Liu, Guangming Tan, Weile Jia, Zhi X. Chen. 1888-1900
- Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality. Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, Matteo Turisini, Jens Domke, Torsten Hoefler. 1901-1916
- DPAR: High-Performance, Secure, and Scalable Differential Privacy-based AllReduce. Hao Qi, Weicong Chen, Chenghong Wang, Xiaoyi Lu. 1917-1934
- A Streaming Collectives Interface Targeting Dataflow Acceleration and HPC Workloads. Nicholas Contini, Jake Queiser, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda. 1935-1950
- Diff-MoE: Efficient Batched MoE Inference with Priority-Driven Differential Expert Caching. Kexin Li, Wenkan Huang, Qinggang Wang, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue. 1951-1965
- What to Support When You're Compressing: The State of Practice Gaps and Opportunities for Scientific Data Compression. Franck Cappello, Robert Underwood, Yuri Alexeev, Alison Baker, Ebru Bozdag, Martin Burtscher, Kyle Chard, Sheng Di, Kyle Gerard Felker, Paul Christopher O'Grady, Hanqi Guo, Yafan Huang, Peng Jiang, Sian Jin, Petter Johansson, Shaomeng Li, Xin Liang, Erik Lindahl, Peter Lindstrom, Zarija Lukic, Magnus Lundborg, Danylo Lykov, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Shihui Song, William Tang, Dingwen Tao, Jiannan Tian, Kazutomo Yoshii, Kai Zhao. 1966-1979
- Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction. Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka. 1980-1991
- Stability-preserving Lossy Compression for Large-scale Partial Differential Equations. Qian Gong, Mark Ainsworth, Jieyang Chen, Xin Liang, Liangji Zhu, Ethan Klasky, Tushar M. Athawale, Qing Liu, Anand Rangarajan, Sanjay Ranka, Scott Klasky. 1992-2005
- lsCOMP: Efficient Light Source Compression. Yafan Huang, Sheng Di, Robert Underwood, Peco Myint, Miaoqi Chu, Guanpeng Li, Nicholas Schwarz, Franck Cappello. 2006-2023
- Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration. Shixun Wu, Jinwen Pan, Jinyang Liu, Jiannan Tian, Ziwei Qiu, Jiajun Huang, Kai Zhao, Xin Liang, Sheng Di, Zizhong Chen, Franck Cappello. 2024-2037
- STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data. Daoce Wang, Pascal Grosset, Jesus Pulido, Jiannan Tian, Tushar M. Athawale, Jinda Jia, Baixi Sun, Boyuan Zhang, Sian Jin, Kai Zhao, James P. Ahrens, Fengguang Song. 2038-2055
- GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast. Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello. 2056-2075
- HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs. Yanliang Li, Wenbo Li, Qian Gong, Qing Liu, Norbert Podhorszki, Scott Klasky, Xin Liang, Jieyang Chen. 2076-2093
- AMRaCut: Scalable Partitioning for Adaptive Mesh Refinement. Budvin Edippuliarachchi, David Van Komen, Hari Sundar. 2094-2108
- Wasp: Efficient Asynchronous Single-Source Shortest Path on Multicore Systems via Work Stealing. Marco D'Antonio, Son Thai Mai, Philippas Tsigas, Hans Vandierendonck. 2109-2125
- Matrix Is All You Need: Rearchitecting Quantum Chemistry to Scale on AI Accelerators. Haozhi Han, Kun Li, Fusong Ju, Qi Li, Hong An, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang. 2126-2142
- BLAZE: Exploiting Hybrid Parallelism and Size-customized Kernels to Accelerate BLASTP on GPUs. Sree Charan Gundabolu, Mithuna Thottethodi, T. N. Vijaykumar. 2143-2157
- Microscopic-Level Mouse Whole Cortex Simulation Composed of 9 Million Biophysical Neurons and 26 Billion Synapses on the Supercomputer Fugaku. Rin Kuriyama, Kaaya Akira, Laura Green, Beatriz Herrera, Kael Dai, Mari Iura, Gilles Gouaillardet, Asako Terasawa, Taira Kobayashi, Jun Igarashi, Anton Arkhipov, Tadashi Yamazaki. 2158-2171
- Trillion Ligands per Day: Performance-Portable Virtual Screening via Compound Database Optimization and Multi-Target Docking. Xiaohui Duan, Cheng Shen, Gaowei Chen, Shanshan Wu, Yizhen Wang, Yizhen Chen, Qixin Chang, Qiancheng Xia, Zekun Yin, Lin Gan, Yibing Shan, Guangwen Yang, Weiguo Liu, Niu Huang. 2172-2185
- T2-RELION: Task Parallelism, Tensor Core Accelerated RELION for Cryo-EM 3D Reconstruction. Jiayu Fu, Jingle Xu, Lin Gan, Tianqi Mao, Zirong Shen, Yinuo Wang, Zeyu Song, Xiaohui Duan, Wei Xue, Guangwen Yang. 2186-2202
- Workload Intelligence: Workload-Aware IaaS abstraction for Cloud Efficiency. Lexiang Huang, Anjaly Parayil, Jue Zhang, Xiaoting Qin, Chetan Bansal, Jovan Stojkovic, Pantea Zardoshti, Pulkit A. Misra, Eli Cortez, Raphael Ghelman, Íñigo Goiri, Saravan Rajmohan, Jim Kleewein, Rodrigo Fonseca, Timothy Zhu, Ricardo Bianchini. 2203-2215
- cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications. Xi Wang, Bin Ma, Jongryool Kim, Byungil Koh, Hoshik Kim, Dong Li. 2216-2232
- DHAP: Towards Efficient OLAP in a Disaggregated and Heterogeneous Environment. Guangda Liu, Chenqi Zhang, Yizhou Shan, Hao Feng, Zeke Wang, Shixuan Sun, Minyi Guo, Jieru Zhao. 2233-2250
- Make Updates Faster: A Fast Multi-Stripe Updates Framework in Erasure-Coded Storage Clusters. Hai Zhou, Dan Feng. 2251-2265
- Reproducibility Report for SC25 Paper Uno: A One-Stop Solution for Inter- and Intra- Data Center Congestion Control and Reliable Connectivity. Strahinja Trecakov. 2266-2267 [doi]
- Reproducibility Report for SC25 Paper ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage. Iacopo Colonnelli. 2268 [doi]
- Reproducibility Report for SC25 Paper TensorMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential. Minh Chung. 2269-2270 [doi]
- Reproducibility Report for SC25 Paper TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU. Arjun Parab. 2271 [doi]
- Reproducibility Report for SC25 Paper X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms. Joseph Schuchart. 2272-2273 [doi]
- Reproducibility Report for SC25 Paper SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching. Gianluca Mittone. 2274 [doi]
- Reproducibility Report for SC25 Paper MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall. Benjamin Brock. 2275-2276 [doi]
- Reproducibility Report for SC25 Paper STZ: A High Quality and High Speed Streaming Lossy Compression Framework for Scientific Data. Marc-André Vef. 2277 [doi]
- Reproducibility Report for SC25 Paper Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs. Philippe Swartvagher. 2278-2279 [doi]
- Reproducibility Report for SC25 Paper Optimizing Quantum Circuit Mapping to Reduce Inter-Module Communications in Distributed Architectures. Kurt H. Maier. 2280-2281 [doi]
- Reproducibility Report for SC25 Paper MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. Roberto R. Expósito. 2282 [doi]
- Reproducibility Report for SC25 Paper C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis. Kurt H. Maier. 2283-2284 [doi]
- Reproducibility Report for SC25 Paper Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics. Pedro Bruel. 2285 [doi]
- Reproducibility Report for SC25 Paper Moment: Co-optimizing Physical Communication Topology and Data Placement for Multi-GPU Out-of-core GNN Training. Minh Chung. 2286 [doi]
- Reproducibility Report for SC25 Paper Sparsified Preconditioned Conjugate Gradient Solver on GPUs. Sixu Li. 2287-2289 [doi]
- Reproducibility Report for SC25 Paper MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs. Joshua Hoke Davis. 2290-2291 [doi]
- Reproducibility Report for SC25 Paper Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling. Dogan Sagbili. 2292-2293 [doi]
- Reproducibility Report for SC25 Paper MetoHash: A Memory-Efficient and Traffic-Optimized Hashing Index on Hybrid PMem-DRAM Memories. Alessio Orsino. 2294 [doi]
- Reproducibility Report for SC25 Paper CPU- and GPU-initiated Communication Strategies for Conjugate Gradient Methods on Large GPU Clusters. Brian J. N. Wylie. 2295-2296 [doi]
- Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression. Arjun Parab. 2297 [doi]
- Reproducibility Report for SC25 Paper lsCOMP: Efficient Light Source Compression. Vinícius Garcia Pinto. 2298-2299 [doi]
- Reproducibility Report for SC25 Paper Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis. Quentin Guilloteau. 2300-2303 [doi]
- Reproducibility Report for SC25 Paper High-Performance Branch-Free Algorithms for Extended-Precision Floating-Point Arithmetic. Minh Chung. 2304-2305 [doi]
- Reproducibility Report for SC25 Paper GPU Lossy Compression for HPC Can Be Versatile and Ultra-Fast. Thomas Randall. 2306 [doi]
- Reproducibility Report for SC25 Paper Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance. Sayef Azad Sakin. 2307-2308 [doi]
- Reproducibility Report for SC25 Paper KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU. Amir Raoofy. 2309-2310 [doi]
- Reproducibility Report for SC25 Paper Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective. Sandra Wienke. 2311-2313 [doi]
- Reproducibility Report for SC25 Paper HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs. Philippe Swartvagher. 2314-2315 [doi]
- Reproducibility Report for SC25 Paper Bridging the Gap Between Binary and Source Based Package Management in Spack. Iacopo Colonnelli. 2316-2317 [doi]
- Reproducibility Report for SC25 Paper FaSTCC: Fast Sparse Tensor Contractions on CPUs. Marcel Koch. 2318 [doi]
- Reproducibility Report for SC25 Paper RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs. Volker Weinberg. 2319-2320 [doi]
- Reproducibility Report for SC25 Paper RAPTOR: Practical Numerical Profiling of Scientific Applications. Ruben Laso. 2321 [doi]
- Reproducibility Report for SC25 Paper Addressing Reproducibility Challenges in HPC with Continuous Integration. Ruben Laso. 2322 [doi]
- Reproducibility Report for SC25 Paper ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems. Shaina Smith. 2323-2324 [doi]
- Reproducibility Report for SC25 Paper DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs. Sergej Breiter. 2325-2326 [doi]
- Reproducibility Report for SC25 Paper Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality. Jan Laukemann. 2327-2328 [doi]
- Reproducibility Report for SC25 Paper XaaS Containers: Performance-Portable Representation With Source and IR Containers. Joao Vicente Ferreira Lima. 2329 [doi]