- ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability. Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash.
- Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators. Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun.
- A Performance-Portable Kilometer-Scale Global Ocean Model on ORISE and New Sunway Heterogeneous Supercomputers. Junlin Wei, Xiang Han, Jiangfeng Yu, Jinrong Jiang, Hailong Liu, Pengfei Lin, Maoxue Yu, Kai Xu, Lian Zhao, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Yanzhi Zhou, Tao Zhang, Feng Zhang, Yehong Zhang, Yue Yu, Yuzhu Wang, Yidi Bai, Chen Li, Zipeng Yu, Haoyu Deng, Yaxin Li, Xuebin Chi.
- Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers. Siddharth Singh, Prajwal Singhania, Aditya Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele.
- Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectra of a Biological System with 100 Million Atoms. Honghui Shang, Ying Liu, Zhikun Wu, Zhenchuan Chen, Jinfeng Liu, Meiyue Shao, Yingzhou Li, Bowen Kan, Huimin Cui, Xiaobing Feng, Yunquan Zhang, Donald G. Truhlar, Hong An, Xiao He, Jinlong Yang.
- Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression. Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes.
- MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization. Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni Katti Sastry, Huihuo Zheng, Logan T. Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian T. Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan.
- Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System. Kylee Santos, Stan G. Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan P. Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam.
- Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials. Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca.
- M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs. Dongho Ha, Yunan Zhang, Chen-Chien Kao, Christopher J. Hughes, Won Woo Ro, Hung-Wei Tseng.
- Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures. Yiwei Li, Mingyu Gao.
- EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing. Yankai Jiang, Rohan Basu Roy, Baolin Li, Devesh Tiwari.
- cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation. Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello.
- Tango: A Cross-layer Approach to Managing I/O Interference over Local Ephemeral Storage. Zhenbo Qiao, Qirui Tian, Zhenlu Qin, Jinzhen Wang, Qing Liu, Norbert Podhorszki, Scott Klasky, Hongjian Zhu.
- cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello.
- LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services. Malgorzata Lazuka, Andreea Anghel, Thomas P. Parnell.
- DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows. Hariharan Devarajan, Loïc Pottier, Kaushik Velusamy, Huihuo Zheng, Izzet Yildirim, Olga Kogiou, Weikuan Yu, Anthony Kougkas, Xian-He Sun, Jae-Seung Yeom, Kathryn M. Mohror.
- Efficient Weighted Graph Matching on GPUs. Michael Mandulak, Sayan Ghosh, S. M. Ferdous, Mahantesh Halappanavar, George M. Slota.
- Automated Code Generation of High-Order Stencils for a Dataflow Architecture. Ryuichi Sai, John M. Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo.
- Moirae: Generating High-Performance Composite Stencil Programs with Global Optimizations. Xiaoyan Liu, Xinyu Yang, Kejie Ma, Shanghao Liu, Kaige Zhang, Hailong Yang, Yi Liu, Zhongzhi Luan, Depei Qian.
- autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures. Du Wu, Jintao Meng, Wenxi Zhu, Minwen Deng, Xiao Wang, Tao Luo, Mohamed Wahib, Yanjie Wei.
- Accurate and Convenient Energy Measurements for GPUs: A Detailed Study of NVIDIA GPU's Built-In Power Sensor. Zeyu Yang, Karel Adámek, Wesley Armour.
- A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale. Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal P. Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang.
- Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Ana Luisa Veroneze Solórzano, Kento Sato, Keiji Yamamoto, Fumiyoshi Shoji, Jim M. Brandt, Benjamin Schwaller, Sara Petra Walton, Jennifer Green, Devesh Tiwari.
- Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems. Yiqin Dai, Ruibo Wang, Yong Dong, Kai Lu.
- PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters. Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman.
- Toward High-Performance Blockchain System by Blurring the Line between Ordering and Execution. Donghyeon Ryu, Chanik Park.
- Matrix-Free Finite Volume Kernels on a Dataflow Architecture. Ryuichi Sai, François P. Hamon, John M. Mellor-Crummey, Mauricio Araya-Polo.
- Rapid GPU-Based Pangenome Graph Layout. Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang.
- Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day. Jianxiong Li, Boyang Li, Zhuoqiang Guo, Mingzhen Li, Enji Li, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia.
- An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers. Awais Khan, John R. Lange, Nick Hagerty, Edwin F. Posada, John K. Holmen, James B. White, James Austin Harris, Verónica Melesse Vergara, Christopher Zimmer, Scott Atchley.
- Application-Driven Exascale: The JUPITER Benchmark Suite. Andreas Herten, Sebastian Achilles, Damian Alvarez, Jayesh Badwaik, Eric Behle, Mathis Bode, Thomas Breuer, Daniel Caviedes-Voullième, Mehdi Cherti, Adel Dabah, Salem El Sayed, Wolfgang Frings, Ana Gonzalez-Nicolas, Eric B. Gregory, Kaveh Haghighi Mood, Thorsten Hater, Jenia Jitsev, Chelsea Maria John, Jan H. Meinke, Catrin I. Meyer, Pavel Mezentsev, Jan-Oliver Mirus, Stepan Nassyr, Carolin Penke, Manoel Römmer, Ujjwal Sinha, Benedikt von St. Vieth, Olaf Stein, Estela Suarez, Dennis Willsch, Ilya Zhukov.
- Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler.
- MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators. Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng.
- Static Generation of Efficient OpenMP Offload Data Mappings. Luke Marzen, Akash Dutta, Ali Jannesari.
- HiRace: Accurate and Fast Data Race Checking for GPU Programs. John Jacobson, Martin Burtscher, Ganesh Gopalakrishnan.
- Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory. Jing Wang, Hanzhang Yang, Chao Li, Yiming Zhuansun, Wang Yuan, Cheng Xu, Xiaofeng Hou, Minyi Guo, Yang Hu, Yaqian Zhao.
- SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing. Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, Chengzhong Xu.
- Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing. Hanfei Yu, Hao Wang, Devesh Tiwari, Jian Li, Seung-Jong Park.
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation. Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari.
- RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules. Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, Yufei Ding.
- ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments. Munkyu Lee, Sihoon Seong, Minki Kang, Jihyuk Lee, Gap-Joo Na, In-Geol Chun, Dimitrios Nikolopoulos, Cheol-Ho Hong.
- CUDASTF: Bridging the Gap Between CUDA and Task Parallelism. Cédric Augonnet, Andrei Alexandrescu, Albert Sidelnik, Michael Garland.
- KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI. Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders.
- NetCL: A Unified Programming Framework for In-Network Computing. George Karlos, Henri E. Bal, Lin Wang.
- Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication. Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad.
- A Sparsity-Aware Distributed-Memory Algorithm for Sparse-Sparse Matrix Multiplication. Yuxi Hong, Aydin Buluç.
- A Conflict-aware Divide-and-Conquer Algorithm for Symmetric Sparse Matrix-Vector Multiplication. Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Jian Zhang, Liang Deng, Yue Ding, Qingsong Wang, Shizhao Chen, Yonggang Che, Jie Liu.
- Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching. Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng.
- Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge Graphs. Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng.
- A Scalable Algorithm for Active Learning. Youguang Chen, Zheyu Wen, George Biros.
- AmgT: Algebraic Multigrid Solver on Tensor Cores. Yuechen Lu, Lijie Zeng, Tengcheng Wang, Xu Fu, Wenxuan Li, Helin Cheng, Dechuang Yang, Zhou Jin, Marc Casas, Weifeng Liu.
- LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores. YiWei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang.
- High Performance Unstructured SpMM Computation Using Tensor Cores. Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler.
- Versatile Datapath Soft Error Detection on the Cheap for HPC Applications. Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li.
- MCBound: An Online Framework to Characterize and Classify Memory/Compute-bound HPC Jobs. Francesco Antici, Andrea Bartolini, Zeynep Kiziltan, Özalp Babaoglu, Yuetsu Kodama.
- GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems. Xin You, Zhibo Xuan, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian.
- Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs. Dechuang Yang, Yuxuan Zhao, Yiduo Niu, Weile Jia, En Shao, Weifeng Liu, Guangming Tan, Zhou Jin.
- DBSR: An Efficient Storage Format for Vectorizing Sparse Triangular Solvers on Structured Grids. Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong.
- Many-Body Electronic Correlation Energy using Krylov Subspace Linear Solvers. Shikhar Shah, Boqin Zhang, Hua Huang, John E. Pask, Phanish Suryanarayana, Edmond Chow.
- Enabling 13K-Atom Excited-State GW Calculations via Low-Rank Approximations and HPC on the New Sunway Supercomputer. Wentiao Wu, Zhengbang Zhou, Qingcai Jiang, Junwei Feng, Xinming Qin, Huanhuan Ma, Zhenwei Cao, Junshi Chen, Sheng Chen, Xinyong Meng, Bingkun Hou, Yuanfan Xiong, Linhao Wang, Yixuan Sun, Hong An, Jinlong Yang, Wei Hu.
- Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine. Barry Sly-Delgado, Ben Tovar, Jin Zhou, Douglas Thain.
- Towards Exascale Simulations of Nanoelectronic Devices in the GW Approximation. Leonard Deuschle, Alexander Maeder, Vincent Maillou, Nicolas Vetsch, Anders Winka, Jiang Cao, Alexandros Nikolaos Ziogas, Mathieu Luisier.
- LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming. Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler.
- A Workflow Roofline Model for End-to-End Workflow Performance Analysis. Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams.
- Learning Generalizable Program and Architecture Representations for Performance Modeling. Lingda Li, Thomas Flynn, Adolfy Hoisie.
- LexiQL: Quantum Natural Language Processing on NISQ-era Machines. Daniel Silver, Aditya Ranjan, Rakesh Achutha, Tirthak Patel, Devesh Tiwari.
- Optimizing Quantum Fourier Transformation (QFT) Kernels for Modern NISQ and FT Architectures. Yuwei Jin, Xiangyu Gao, Minghao Guo, Henry Chen, Fei Hua, Chi Zhang, Eddy Z. Zhang.
- On the Efficacy of Surface Codes in Compensating for Radiation Events in Superconducting Devices. Marzio Vallero, Gioele Casagranda, Flavio Vella, Paolo Rech.
- Revisiting Computation for Research: Practices and Trends. Jeremiah Giordani, Ziyang Xu, Ella Colby, August Ning, Bhargav Reddy Godala, Ishita Chaturvedi, Shaowei Zhu, Yebin Chon, Greg Chan, Zujun Tan, Galen Collier, Jonathan D. Halverson, Enrico Armenio Deiana, Jasper Liang, Federico Sossai, Yian Su, Atmn Patel, Bangyen Pham, Nathan Greiner, Simone Campanoni, David I. August.
- Understanding Data Movement Patterns in HPC: A NERSC Case Study. Anna Giannakou, Damian Hazen, Bjoern Enders, Lavanya Ramakrishnan, Nicholas J. Wright.
- HPAC-ML: A Programming Model for Embedding ML Surrogates in Scientific Applications. Zane Fink, Konstantinos Parasyris, Praneet Rathi, Giorgis Georgakoudis, Harshitha Menon, Peer-Timo Bremer.
- Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints. Jason Ludmir, Tirthak Patel.
- MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction. Yidong Chen, Chen Zhang, Rongchao Dong, Haoyuan Zhang, Yonghua Zhang, Zhonghua Lu, Jidong Zhai.
- Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity. Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang.
- Adaptive Patching for High-resolution Image Segmentation with Transformers. Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Masaharu Munetomo, Mohamed Wahib.
- TorchGT: A Holistic System for Large-Scale Graph Transformer Training. Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang.
- Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin Transformer. Xun Wang, Zeyang Zhu, Xiangyu Meng, Tao Song.
- Surpassing Sycamore: Achieving Energetic Superiority Through System-Level Circuit Simulation. Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhilin Pei, Xingcheng Zhang, Wanli Ouyang.
- Realizing Quantum Kernel Models at Scale with Matrix Product State Simulation. Mekena Metcalf, Pablo Andrés-Martínez, Nathan Fitzpatrick.
- Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs. Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia.
- Unlocking High Performance with Low-Bit NPUs and CPUs for Highly Optimized HPL-MxP on Cloud Brain II. Weicheng Xue, Kai Yang, Yongxiang Liu, Dengdong Fan, Pengxiang Xu, YongHong Tian.
- Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning. Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou.
- A Probabilistic Approach To Selecting Build Configurations in Package Managers. Daniel Nichols, Harshitha Menon, Todd Gamblin, Abhinav Bhatele.
- A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization. Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukic, Axel Huebl, Zhe Wang, James P. Ahrens, Dingwen Tao.
- Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest. Xuan Wu, Qian Gong, Jieyang Chen, Qing Liu, Norbert Podhorszki, Xin Liang, Scott Klasky.
- CARP: Range Query-Optimized Indexing for Streaming Data. Ankush Jain, Charles D. Cranor, Qing Zheng, Bradley W. Settlemyer, George Amvrosiadis, Gary A. Grider.
- Optimizing Distributed ML Communication with Fused Computation-Collective Operations. Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann.
- Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression. Hao Feng, Boyuan Zhang, Fanjiang Ye, Min-Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao.
- APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes. Yuanxin Wei, Jiangsu Du, Jiazhi Jiang, Xiao Shi, XianWei Zhang, Dan Huang, Nong Xiao, Yutong Lu.
- Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory Arrays. Manasa Kaniselvan, Alexander Maeder, Marko Mladenovic, Mathieu Luisier, Alexandros Nikolaos Ziogas.
- Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning. Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman.
- Designing a GPU-Accelerated Communication Layer for Efficient Fluid-Structure Interaction Computations on Heterogeneous Systems. Aristotle X. Martin, Geng Liu, Bálint Joó, Runxin Wu, Mohammed Shihab Kabir, Erik W. Draeger, Amanda Randles.
- Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link. Dong Xu, Yuan Feng, Kwangsik Shin, Daewoo Kim, Hyeran Jeon, Dong Li.
- COAXIAL: A CXL-Centric Memory System for Scalable Servers. Albert Cho, Anish Saxena, Moinuddin Qureshi, Alexandros Daglis.
- Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration. Yinxiao Feng, Kaisheng Ma.
- Fast and Efficient Scaling for Microservices with SurgeGuard. Anyesha Ghosh, Neeraja J. Yadwadkar, Mattan Erez.
- Realizing Joint Extreme-Scale Simulations on Multiple Supercomputers - Two Superfacility Case Studies. Theresa Pollinger, Alexander Van Craen, Philipp Offenhäuser, Dirk Pflüger.
- AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis. Xiang Fu, Weiping Zhang, Shiman Meng, Xin Huang, Wubiao Xu, Luanzheng Guo, Kento Sato.
- Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUs. Zhe Pan, Shuibing He, Xu Li, Xuechen Zhang, Yanlong Yin, Rui Wang, Lidan Shou, Mingli Song, Xian-He Sun, Gang Chen.
- Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku. Junya Arai, Masahiro Nakao, Yuto Inoue, Kanto Teranishi, Koji Ueno, Keiichiro Yamamura, Mitsuhisa Sato, Katsuki Fujisawa.
- Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization. Shubhendra Pal Singhal, Souvadra Hati, Jeffrey Young, Vivek Sarkar, Akihiro Hayashi, Richard W. Vuduc.
- Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler.
- hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur.
- UNR: Unified Notifiable RMA Library for HPC. Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu.
- EXO: Accelerating Storage Paravirtualization with eBPF. Shi Qiu, Li Wang, Yiming Zhang.
- CoRD: Combining Raid and Delta for Fast Partial Updates in Erasure-Coded Storage Clusters. Hai Zhou, Dan Feng, Yuchong Hu, Wei Wang, Huadong Huang.
- MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads. Luke Logan, Anthony Kougkas, Xian-He Sun.