- Intelligent Sampling of Extreme-Scale Turbulence Datasets for Accurate and Efficient Spatiotemporal Model Training. Wesley Brewer, Murali Meena Gopalakrishnan, Matthias Maiterth, Aditya Kashi, Jong Youl Choi, Pei Zhang, Stephen Nichols, Riccardo Balin, Miles Couchman, Stephen de Bruyn Kops, P.-K. Yeung, Daniel Dotson, Rohini Uma-Vaideswaran, Sarp Oral, Feiyi Wang. 1-10
- Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations. Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. Sadayappan, Karol Kowalski. 11-19
- InferA: A Smart Assistant for Cosmological Ensemble Data. Justin Z. Tam, Pascal Grosset, Divya Banesh, Nesar Ramachandra, Terece L. Turton, James P. Ahrens. 20-28
- Inverse Design for Generating Initial Conditions in Scientific Simulations. Leslie A. Horace, Christin Whitton, Vanessa Job, William M. Jones, Nathan DeBardeleben. 29-36
- Applying Surrogate Modeling to Decouple Data Collection and Analysis from Simulation for Accelerated In-Situ Analysis. Kewei Yan, Yonghong Yan 0001. 37-44
- Compute4Biology: Taking Stock of High Performance Computing Needs for Foundation Models in Biological Sciences. Pratik Dutta, Tirthankar Ghosal. 45-51
- FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access. Aditya Tanikanti, Benoît Côté, Yanfei Guo, Le Chen, Nicholaus Saint, Ryan Chard, Ken Raffenetti, Rajeev Thakur, Thomas D. Uram, Ian T. Foster, Michael E. Papka, Venkatram Vishwanath. 52-60
- ROSE: RADICAL Orchestrator for Surrogate Exploration. Aymen Alsaadi, Tianle Wang 0001, Andrew Park, Pradeep Bajracharya, Linwei Wang, Fanbo Sun, Sudip Seal, Vikram Jadhao, Geoffrey C. Fox, Shantenu Jha. 61-70
- InferCT: An Efficient and Generalizable Framework to Enable 3D Machine Learning for Computed Tomography. Austin Yunker, Weijian Zheng, Rajkumar Kettimuthu. 71-77
- LangChain-Parsl: Connect Large Language Model Agents to High Performance Computing Resource. Heng Ma, Alexander Brace, Carlo Siebenschuh, Ian T. Foster, Arvind Ramanathan. 78-85
- A Query Engine for Scientific Data Exploration using Theory, Simulation, and Artificial Intelligence Models. Abhishek Dwaraki, Sreenivas Rangan Sukumar, Christopher D. Rickett, Clarete Riana Crasta, Harumi Kuno, Michael Neal, Pankaj Pandey, Ryan Yates, Karlon West, Amar Gopal Chittiboyina, Ikhlas A. Khan, John L. Byrne, Sekwon Lee, David Emberson. 86-95
- Classification of Three-dimensional Electron Diffraction Data with a Large Language Model. Kazuyuki Yasuda, Masahito Kumagai, Masayuki Sato 0001, Kazuhiko Komatsu, Hiroaki Kobayashi. 96-103
- Experience Deploying Containerized GenAI Services at an HPC Center. Angel M. Beltre, Jeff Ogden, Kevin Pedretti. 104-113
- Engine-Agnostic Model Hot-Swapping for Cost-Effective LLM Inference. Radostin Stoyanov, Viktória Spisaková, Adrian Reber, Wesley Armour, Marcin Copik, Rodrigo Bruno. 114-125
- Seamless end-to-end containerized HPC environments. Brandon Cook 0001, Shane Canon, Adam Lavely, Daniel Margala. 126-134
- Usability Evaluation of Cloud for HPC Applications. Vanessa V. Sochat, Daniel Milroy, Abhik Sarkar, Aniruddha Marathe, Tapasya Patki. 135-150
- An elastic job scheduler for HPC applications on the cloud. Aditya Bhosale, Kavitha Chandrasekar, Laxmikant V. Kalé, Sara Kokkila Schumacher. 151-162
- Evaluating HPK for Running Cloud-Native Workloads on Slurm Clusters. Antony Chazapis, Lefteris Vassilakis, Giannis Petsis, Manolis Marazakis, Angelos Bilas. 163-171
- Towards Enabling Hostile Multi-tenancy in Kubernetes. Ali Kanso, Slava Oks, Mostafa Elzeiny, Gurpreet Virdi. 172-178
- Using Code Coverage to Assess Feature Gaps in MPI Correctness Tool Classification Tests. Alexander Hück, Simon Schwitanski, Tim Jammer, Joachim Jenke, Yussur Mustafa Oraji, Christian H. Bischof. 179-187
- Coupling Static and Dynamic MPI Correctness Tools to Optimize Accuracy and Overhead. Yussur Mustafa Oraji, Simon Schwitanski, Semih Burak, Christian H. Bischof, Matthias S. Müller. 188-197
- Data Race Detection through Vibe Translation. Jan Hückelheim, Vimarsh Sathia, Siyuan Brant Qian. 198-206
- Differential Testing for Sequential to Parallel Transformations. Jobayer Ahmmed, Quazi Ishtiaque Mahmud, Junhyung Shim, Liyi Li, Ali Jannesari, Myra B. Cohen. 207-216
- Towards an Automated Workflow for Floating-Point Analysis of GPU Kernels. Esteban Miguel Rangel, S. John Pennycook. 217-224
- LLM4FP: LLM-Based Program Generation for Triggering Floating-Point Inconsistencies Across Compilers. Yutong Wang, Cindy Rubio-González. 225-234
- Exploring Reduced Precision for Deep Learning Activation Functions. Epifanio Sarinana, Christoph Lauter, Shirley Moore. 235-243
- Extending MPI Correctness Benchmarking to the Fortran Language. Yussur Mustafa Oraji, Alexander Hück, Christian H. Bischof. 244-248
- Design and Implementation of a Custom Hardware Accelerator for SZx Compression in Chipyard. Connor Bohannon, Kazutomo Yoshii, Sheng Di, Franck Cappello, Antonino Miceli. 249-258
- ASCRIBE-XR: Extended Reality for Visualization of Scientific Images. Ronald Pandolfi, Julian Todd, Jeffrey J. Donatelli, Daniela Ushizima. 259-268
- Characterizing the Performance of Parallel Data-Compression Algorithms across Compilers and GPUs. Brandon Alexander Burtchell, Martin Burtscher. 269-278
- Data Management System Analysis for Distributed Computing Workloads. Kuan-Chieh Hsu, Sairam Sri Vatsavai, Ozgur O. Kilic, Sankha Dutta, Yihui Ren 0001, David K. Park, Tatiana Korchuganova, Joseph Boudreau, Tasnuva Chowdhury, Shengyu Feng, Raees Ahmad Khan, Jaehyung Kim 0001, Norbert Podhorszki, Scott Klasky, Tadashi Maeno, Paul Nilsson, Verena Ingrid Martinez Outschoorn, Frédéric Suter, Wei Yang, Yiming Yang 0002, Shinjae Yoo, Alexei Klimentov, Adolfy Hoisie. 279-289
- Integrating Distributed SQL Query Engines with Object-Based Computational Storage. Junghyun Ryu, Soon Hwang, Junhyeok Park 0002, Seonghoon Ahn, Jeoungahn Park, Jeongjin Lee, Jinna Yang, Soonyeal Yang, Jungki Noh, Qing Zheng, Woosuk Chung, Hoshik Kim, Youngjae Kim 0001. 290-299
- On the Compressibility of Floating-Point Data in Posit and IEEE-754 Representation. Andrew Rodriguez, Martin Burtscher. 300-306
- Building n-Dimensional Trees for Resolution-Based Progressive Compression. Brandon Alexander Burtchell, Martin Burtscher. 307-313
- Lightweight CNN-Based Artifact Reduction for Scientific Error-bounded Lossy Compression. Zizhe Jian, Pu Jiao, Bohan Zhang, Sheng Di, Xin Liang 0001, Guanpeng Li, Huangliang Dai, Zizhong Chen, Franck Cappello. 314-323
- Benchmarking Cutting-Edge Scientific Error-Bounded Lossy Compressors on Correlation-Based Rate-Distortion. Ziwei Qiu, Jinyang Liu 0003, Kai Zhao 0008, Robert Underwood, Sheng Di. 324-331
- FZModules: A Heterogeneous Computing Framework for Customizable Scientific Data Compression Pipelines. Skyler Ruiter, Jiannan Tian, Fengguang Song. 332-338
- Compression Error Sensitivity Analysis for Different Experts in MoE Model Inference. Songkai Ma, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu 0001, Xiaoyi Lu, Dan Wang. 339-348
- Evaluating Accuracy and Performance Tradeoffs in GPU Accelerated Single Cell RNA-seq Analysis. Cory Gardner, Seyun Jeong, Oam Khatavkar, Aiden Moon, Qinglei Cao, Tae-Hyuk Ahn. 349-358
- Hybrid GPU Programming Education with Python and C++: Preferences, Performance and Common Python Pitfalls. Lena Oden. 359-366
- An Interactive Agentic HPC Tutor for Lesson Planning, Teaching, and Assessment. Erik Pautsch, Mengjiao Han, Joseph A. Insley, Janet Knowles, Victor A. Mateevitsi, Silvio Rizzi 0001, George K. Thiruvathukal. 367-375
- Teaching Task-Based Parallel Programming with a Runtime Systems-Aware Perspective. Vivek Kumar. 376-383
- GPU Programming for AI Workflow Development on AWS SageMaker: An Instructional Approach. Sriram Srinivasan, Hamdan Alabsi, Rand Obeidat, Nithisha Ponnala, Azene Zenebe. 384-392
- The Cost of Teaching Operational ML. Fraida Fund, Kate Keahey, Cody Hammock, Marc Richardson, Mark Powers, Michael Sherman. 393-400
- A Model for Teaching Machine Learning, Deep Learning, and Research Computing to Domain Scientists on HPC Resources. William J. Allen, Kelsey M. Beavers, Erik S. Ferlanti, Lorenzo Concia, Joshua Urrutia, Ernesto A. B. F. Lima, John M. Fonner, Felix Zuo, H. E. Duplechin Seymour, Ari B. Kahn, Joe Stubbs, Anagha Jamthe, Stephanie N. Baker, Tabish Khan, James P. Carson. 401-408
- Peachy Parallel Assignments (EduHPC 2025). Clara J. Almeida, Elizabeth Shoop, Diego García-Álvarez, Arturo González-Escribano, David Guerrero-Pantoja, Cameron Maloney, Maria Pantoja, Silvio Rizzi 0001, David P. Bunde. 409-415
- Microcredentials for Open Hardware and HPC Workforce Development: The Openchip Approach with RISC-V Ecosystem. Kateryna Bondar, Xavier Aragonès, Jordi Carrabina, Elly De Pelecijn, Antonio Miguel Espinosa, José Ignacio Gómez, Francesc Guim 0001, Tomàs Margalef, Juan Carlos Moure, Luis Piñuel, Patrick Reynaert, Ivan Rodero. 416-423
- From Soil to Software: Experience from a STEM Workshop on Smart Plant Care and Teachable Machines. Anita Esmaeilian, Kishwar Ahmed. 424-432
- "Offloading" Undergraduate Research to the Graphics Processing Unit for Acceleration. Bryant M. Wyatt, Mason Bane. 433-438
- MPPI - Type safe C++ Datatypes for MPI. Mike Söhner, Christoph Niethammer. 439-448
- Accelerating Intra-Node GPU Communication: A Performance Model for Multi-Path Transfers. Amir Hossein Sojoodi, Mohammad Akbari, Hamed Sharifian, Ali Farazdaghi, Ryan E. Grant, Ahmad Afsahi. 449-460
- Large-Message All-to-All Communication at Frontier Scale. James Buford White. 461-467
- MPI Collectives with Programmable Smart Switches. Thomas Erbesdobler, Amir Raoofy, Ehab Saleh, Josef Weidendorfer. 468-478
- Scaling All-to-All Operations Across Emerging Many-Core Supercomputers. Shannon Gayle Kinkead, Jackson Wesley, Whit Schonbein, David Debonis, Matthew G. F. Dosanjh, Amanda Bienz. 479-488
- On the integration of lightweight tasks with MPI using the C++26 std::execution 'Senders' API. John Biddiscombe, Mikael Simberg, Auriane Reverdell, Raffaele Solcà, Alberto Invernizzi, Rocco Meli, Joseph Schuchart. 489-499
- Enabling Unstructured Sparse Fine-Tuning and Inference for Foundation Models on Wafer-Scale Engine. Haoyu Zheng, Yifan Zeng, Linghao Song, Murali Emani, Wenqian Dong. 500-507
- WAGES: Workload-Aware GPU Sharing System for Energy-Efficient Serverless LLM Serving. Tianyu Wang, Gourav Rattihalli, Aditya Dhakal, Xulong Tang, Dejan S. Milojicic. 508-515
- OmniFed: A Modular Framework for Configurable Federated Learning from Edge to HPC. Sahil Tyagi, Andrei Cozma, Olivera Kotevska, Feiyi Wang. 516-523
- Enhancing ChatPORT with CUDA-to-SYCL Kernel Translation Capability. Zheming Jin, Swaroop Pophale, Keita Teranishi. 524-533
- Evaluation of Test-Time Compute Constraints on Safety and Skill Large Reasoning Models. Adarsha Balaji, Le Chen, Rajeev Thakur, Franck Cappello, Sandeep Madireddy. 534-539
- Batch Tiling on Attention: Efficient Mixture of Experts Training on Wafer-Scale Processors. Daria Soboleva, Étienne Goffinet, Hui Zeng, Sangamesh Ragate, Elif Albuz, Natalia Vassilieva. 540-544
- Automated MCQA Benchmarking at Scale: Evaluating Reasoning Traces as Retrieval Sources for Domain Adaptation of Small Language Models. Ozan Gökdemir, Neil Getty, Robert Underwood, Sandeep Madireddy, Franck Cappello, Arvind Ramanathan, Ian T. Foster, Rick L. Stevens. 545-552
- Agentic AI vs ML-based Autotuning: A Comparative Study for Loop Reordering Optimization. Miguel Romero Rosas, Rudolf Eigenmann, Khaled Ibrahim. 553-559
- GridMind: LLMs-Powered Agents for Power System Analysis and Operations. Hongwei Jin, Kibaek Kim, Jonghwan Kwon. 560-568
- Frameworks for Large Language Model Serving in HPC Environments. Rohan Marwaha, Qinren Zhou, Kastan Day, Asmita Dabholkar, Volodymyr V. Kindratenko. 569-574
- Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant. Seth Ockerman, Amal Gueroudji, Song Young Oh, Robert Underwood, Nicholas Chia, Kyle Chard, Robert B. Ross, Shivaram Venkataraman. 575-581
- EQSIM Agent: A Conversational AI for Interactive Exploration of Large-scale Earthquake Simulation Data. Houjun Tang, David McCallen. 582-587
- Beyond End-to-End: Understanding the Limits of LLMs in Scientific Problem Solving. Youyuan Liu, Sheng Di, Neil Getty, Tanwi Mallick, Robert Underwood, Sian Jin. 588-593
- BioR5: A Three-Layer Architecture for Biological Reasoning in Scientific AI. Peng Ding, Thomas S. Brettin, Rick Stevens. 594-601
- ChatEED: An agentic retrieval assistant for accelerator operators. Aaron Zachary Reed, Claudio Bisegni, Sandesh Shrestha, Michelle Huang, Daniel Ratner. 602-606
- LABMATE: Language Model Based Multi-Agent System to Accelerate Catalysis Experiments. Anurag Acharya 0002, Anshu Kiran Sharma, Derek Parker, Timothy Vega, Rizwan A. Ashraf, Natalie M. Isenberg, Jan Strube 0001, Robert Rallo. 607-615
- Programmer productivity and performance on AMD's AI Engines: Offloading Fortran intrinsics via MLIR a case-study. Nick Brown 0002, Gabriel Rodriguez-Canal. 616-623
- A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines. Jonathan Strobl, Leonardo Solis-Vasquez, Yannick Lavan, Andreas Koch 0001. 624-632
- SNAcc: An Open-Source Framework for Streaming-based Network-to-Storage Accelerators. David Volz, Torben Kalkhof, Andreas Koch 0001. 633-641
- Connected-Component Labeling Using HLS for High-Energy Particle Physics Instruments. Nick Song, Marion Sudvarg, Roger D. Chamberlain. 642-650
- Aurora Acceptance: A Collaborative Exascale Test Harness. Brian Homerding, Longfei Gao, Ben Lenard, Eric Pershey, Kevin Harms, Doug Waldron, Susan Coghlan, Bill Allcock, Ti Leggett, Balazs Gerofi, Tom Musta. 651-661
- Experiences Integrating Database Support into the OLCF Test Harness. Nick Hagerty, Daniel Dietz. 662-668
- Testing and Benchmarking Emerging Supercomputers via the MFC Flow Solver. Benjamin Wilfong, Anand Radhakrishnan, Henry Le Berre, Tanush Prathi, Stephen Abbott, Spencer H. Bryngelson. 669-677
- Seeking Cost-Optimal Infrastructure Size for Distributed Filesystems: A Ceph Case Study. Niccolò Tosato, Isac Pasianotto, Ruggero Lot, Stefano Cozzini. 678-687
- A Modular, Responsive, and Accessible HPC Dashboard Built upon Open OnDemand. Richie Tan, Guangzhen Jin. 688-696
- Open Composer: A Web-Based Application for Generating and Managing Batch Jobs on HPC Clusters. Masahiro Nakao, Keiji Yamamoto. 697-704
- Is it an HPC Workflow Assistant? Is it a Framework? It's Drona Workflow Engine. Andrii Kryvenko, Duy Pham, Marinus Pennings, Honggao Liu. 705-714
- Generating Frequently Asked Questions from Technical Support Tickets using Large Language Models. Christina Joslin, David Burns, Fnu Ashish, Elham Sarbijan. 715-726
- AskHPC: A ChatBot for High Performance Computing User Support. Akhilesh Bondapalli, Huihuo Zheng, Oluwaseun T. Ajayi, Murat Keçeli, Haritha Siddabathuni Som, J. Taylor Childers, Lisa Childers, Yasaman Ghadar, Michael E. Papka, Venkatram Vishwanath, Rong Ge 0002. 727-739
- ModuLair: Streamlining Python Virtual Environment Management for HPC. Surada Suwansathit, Ananya Adiki, Gabriel Floreslovo, Marinus Pennings, Honggao Liu. 740-749
- Dori: User Centered HPC for Data Intensive Computing. Georg Rath. 750-756
- eIM: GPU-Accelerated Efficient Influence Maximization in Large-Scale Social Networks. Jacob Doney, Xin Huang, Chul-Ho Lee. 757-765
- Generating Permutations at Scale. Oded Green, Joe Eaton, Alok Tripathy, Corey Nolet, Justin Luitjens. 766-774
- An Optimized Generalized Multi-Color Point Implicit Solver for Intel GPUs using OneAPI ESIMD. Joseph Wassell, Mohammad Zubair, Aaron Walden, Gabriel Nastac, Eric J. Nielsen, Timothée Ewart. 775-783
- Profiling Application-Specific Properties of Irregular Graph Algorithms on GPUs. Bishal Sharma, Martin Burtscher. 784-792
- Architecting Tensor Core-Based Reductions for Irregular Molecular Docking Kernels. Leonardo Solis-Vasquez, Andreas F. Tillack, Diogo Santos-Martins, Andreas Koch 0001, Stefano Forli. 793-803
- Performance-Portable Symbolic Factorization through Common Graph Operations. Oguz Selvitopi, Xiaoye S. Li, Aydin Buluç. 804-812
- Comparing Graph Algorithm Styles on NVIDIA and AMD GPUs. Avery Vanausdal, Martin Burtscher. 813-817
- Benchmarking the Cerebras Wafer Scale Engine-2 Architecture. Takaaki Miyajima, Leon Fukuoka. 818-822
- How effective is matrix reordering for improving performance of sparse matrix-vector multiplication? Omid Asudeh, Sina Mahdipour Saravani, Fabrice Rastello, Gerald Sabin, Ponnuswamy Sadayappan. 823-827
- Rapid Quantum Network Simulation Design with a Path to Scalable Execution. Aaron Welch, Joel Dawson, Mariam Kiran. 828-833
- To Stream or Not to Stream: Towards A Quantitative Model for Remote HPC Processing Decisions. Flavio Castro, Weijian Zheng, Joaquin Chung 0001, Ian T. Foster, Raj Kettimuthu. 834-840
- From Path-Aware to Application-Aware Source Routing using Traffic Classes. Shashwitha Puttaswamy, Mariam Kiran. 841-847
- Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads. Sankalpa Timilsina, Susmit Shannigrahi. 848-853
- LLM-based Optimization Algorithm Selection for High-Performance Networks Orchestration. Anestis Dalgkitsis, Cyril Shih-Huan Hsu, Chrysa Papagianni, Paola Grosso, Cees de Laat. 854-859
- Optimizing Network Resilience Using Domain-Specific Hardware Accelerator for Dynamic Programming. Ali Mazloum, Sergio Elizalde, Samia Choueiri, Elie F. Kfoury, Jose Gomez, Ali AlSabeh, Jorge Crichigno. 860-865
- Complex Parsing for In-Network Acceleration of High-Energy Physics Experiments. Bjoern Sagstad, Nishanth Shyamkumar, Sophia Chen, Wesley Robert Ketchum, Roland Sipos, James B. Kowalkowski, Michael Wang 0003, Nik Sultana. 866-874
- SENSE in Practice: Quantifying the End-to-End Benefits of Intent-Based Bandwidth Reservation for Exascale Science Workflows. Inder Monga, Mazahir Hussain, Justas Balcas, Aashay Arora, Diego Davila, Buseung Cho, Xi Yang 0001, Cees de Laat. 875-885
- Scaling LLM Training Using RDMA over Converged Ethernet. Alex Batlle Casellas, Adrián Pérez Diéguez, Aleix Torres-Camps, Harris Teague, Arnau Padrés Masdemont, Jordi Ros-Giralt. 886-896
- Network Replay and Consistency Across Testbeds. Alexander Wolosewicz, Vinod Yegneswaran, Ashish Gehani, Nik Sultana. 897-907
- eCounter: Inline Per-IP Network Monitoring at Millisecond Resolution via eBPF. Xinxin Mei, Jie Chen, Amitoj Singh, Ilya Baldin, David Lawrence. 908-918
- Implementing Network-level QoS at HPC Datacenters to Enable Distributed Scientific Workflows. Anna Giannakou, Jonathan Skone, Vinay Sawal, Ronal Kumar, Stephen Simms, Nicholas J. Wright, Lavanya Ramakrishnan. 919-928
- StreamHub: High-performance Managed SciStream as a Service. Seena Vazifedunn, Flavio Castro, Rajkumar Kettimuthu, Ian T. Foster, Kyle Chard. 929-938
- Modular Architecture for High-Performance and Low Overhead Data Transfers. Rasman Mubtasim Swargo, Engin Arslan, Md. Arifuzzaman. 939-948
- From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures. Anjus George, Michael J. Brim, Christopher Zimmer 0001, David M. Rogers 0001, Sarp Oral, Zach Mayes. 949-959
- SmartNIC Data Exchange Framework. Zackary Savoie, Anthony Sicoie, Ryan Eric Grant. 960-967
- Error Analysis of Globally Distributed Workflow Management System. Sankha Dutta, Ozgur O. Kilic, Tatiana Korchuganova, Paul Nilsson, Sairam Sri Vatsavai, Kuan-Chieh Hsu, David K. Park, Joseph Boudreau, Tasnuva Chowdhury, Shengyu Feng, Raees Khan, Jaehyung Kim 0001, Scott Klasky, Tadashi Maeno, Verena Ingrid Martinez Outschoorn, Norbert Podhorszki, Yihui Ren 0001, Frédéric Suter, Wei Yang, Yiming Yang 0002, Shinjae Yoo, Alexei Klimentov, Adolfy Hoisie. 968-976
- MPI Communication Performance on AMD MI300A: Microbenchmarks and Applications. Goutham Kalikrishna Reddy Kuncham, Siyuan Zhang, Shoaib Mohammad, Chen-Chun Chen, Dhabaleswar K. Panda 0001. 977-984
- In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns. Harikrishna Tummalapalli, Riccardo Balin, Christine M. Simpson, Andrew Park, Aymen Alsaadi, Andrew E. Shao, Wesley Brewer, Shantenu Jha. 985-996
- From Exploration to Explanation: ML-Driven Causal Discovery for Datacenter Reliability at Scale. Pavana Prakash, Rolando P. Hong Enriquez, Sergey Serebryakov, David Grant, Wesley Brewer, Dejan S. Milojicic. 997-1002
- A Data-Size Adaptive Approach to I/O of Poorly Load Balanced In Situ Data Extracts. Caitlin Ross, Scott Wittenburg, Greg Eisenhauer, Corey Wetterer-Nelson. 1003-1008
- Lessons Learned: Template-Heavy C++ in Production HPC Runtime Systems. Arne Hendricks. 1009-1016
- ASaP: Automatic Software Prefetching for Sparse Tensor Computations in MLIR. Konstantinos Sotiropoulos, Jonas Skeppstedt, Per Stenström. 1017-1027
- Implementing OpenMP Offload Support in the AMD Next Generation Fortran Compiler. Dominik Adamski, Sergio Afonso, Akash Banerjee, Pranav Bhandarkar, Kareem Ergawy, Andrew Gozillon, Michael Klemm, Jan Leyonberg, Dan Palermo. 1028-1038
- CIRE: LLVM Analysis for Floating-Point Rounding Error Affected by Precision and Optimizations. Tanmay Tirpankar, Cayden Lund, Ganesh Gopalakrishnan. 1039-1047
- Lowering and Runtime Support for Fortran's Multi-Image Parallel Features using LLVM Flang, PRIF, and Caffeine. Dan Bonachea, Katherine Rasmussen, Damian W. I. Rouson, Jean-Didier Pailleux, Etienne Renault, Brad Richardson. 1048-1056
- Scabbard: LLVM Instrumentation-aided Race Checking in CPU/GPU Unified Memory for AMD GPUs. Andrew Osterhout, Ignacio Laguna, Ganesh Gopalakrishnan. 1057-1065
- Dynamic Thread Coarsening for CPU and GPU OpenMP Code. Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert. 1066-1074
- OpenSHMEM MLIR: A Dialect for Compile-Time Optimization of One-Sided Communications. Michael Beebe, Benjamin Michalowicz, Andrew McNamara, Yash Kumar, Dhabaleswar K. Panda 0001, Yong Chen 0001, Wendy Poole, Steve Poole 0001. 1075-1087
- Distributed Sparse Tensor Computations in MLIR. Miheer Vaidya, Shreya Singh, Devanshu Mantri, Michael Shannon Eydenberg, Brian Michael Kelley, Sivasankaran Rajamanickam, Atanas Rountev, P. Sadayappan. 1088-1095
- An MLIR pipeline for offloading Fortran to FPGAs via OpenMP. Gabriel Rodriguez-Canal, David Katz, Nick Brown 0002. 1096-1103
- Umpire: Portable Memory Management for High-Performance Computing Applications. Kristi Belcher, David Beckingsale. 1104-1109
- The MALL is Open: Exploring Shared Caches and Latency in AMD CDNA™ 3 GPUs. Andrew Tee, Nicholas Curtis, Noah Wolfe, Daniel Wong. 1110-1116
- CaRDS: Compiler-aided Remote Data Structures. Brian R. Tauro, Ian Dougherty, Kyle C. Hale. 1117-1125
- Hardware-Software Co-Design of Iterative Filter-Update Numerical Methods Using Processing-In-Memory. Eric Tang, Tianyun Zhang, William Bradford, Farzana Ahmed Siddique, James C. Hoe, Kevin Skadron, Franz Franchetti. 1126-1132
- Mixed-Precision Performance Portability of FFT-Based GPU-Accelerated Algorithms for Block-Triangular Toeplitz Matrices. Sreeram Venkat, Kasia Swirydowicz, Noah Wolfe, Omar Ghattas. 1133-1146
- Performance portable batched linear algebra kernels for transport sweeps using Kokkos. Gabriel Suau, Thierry Gautier, Ansar Calloo, Rémi Baron, Romain Le Tellier. 1147-1158
- Extending RAJA Parallel Programming Abstractions with Just-In-Time Optimization. John Bowen, Konstantinos Parasyris, David Beckingsale, Tal Ben-Nun, Thomas Stitt, Giorgis Georgakoudis. 1159-1171
- Bridging Performance Portability and Scalability for Plasma Simulations on Heterogeneous Systems. Nigel Phillip Tan, Scott V. Luedtke, Michela Taufer, Brian J. Albright. 1172-1182
- Energy-aware performance portability with OpenMP dynamic variants. Ayman Bourramouss el Maach, Adrian Munera, Sara Royuela. 1183-1195
- Preserving CUDA Syntax for SYCL Portability: A Thin C++ Abstraction without Kernel Migration. Esteban Miguel Rangel, Humza Qureshi. 1196-1204
- Roofline Analysis of Tightly-Coupled CPU-GPU Superchips: A Study on MI300A and GH200. Oscar Antepara, Leonid Oliker, Samuel Williams 0001. 1205-1216
- LAMMPS-KOKKOS: Performance Portable Molecular Dynamics Across Exascale Architectures. Anders Johansson, Evan Weinberg, Christian Trott, Megan J. McCarthy, Stan G. Moore. 1217-1232
- Development of a performance portable distributed FFT interface on top of the Kokkos ecosystem. Yuuichi Asahi, Trévis Morvany, Thomas Padioleau, Julien Bigot. 1233-1242
- KVMSR+UDWeave: Extreme-Scaling with Fine-grained Parallelism on the UpDown Graph Supercomputer. Alexander Fell, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Wenyi Wang, Jose M. Monsalve Diaz, Andronicus Samsundar Rajasukumar, Jiya Su, Ruiqi Xu 0001, Rajat Khandelwal, Tianchi Zhang 0005, David F. Gleich, Yanjing Li, Hank Hoffmann, Andrew A. Chien. 1243-1262
- Comparing Distributed-Memory Programming Frameworks with Radix Sort. Michael P. Ferguson, Ryan D. Friese, Shreyas Khandekar, Matt Drozt. 1263-1275
- Slicing Is All You Need: Towards A Universal One-Sided Algorithm for Distributed Matrix Multiplication. Benjamin Brock, Renato Golin. 1276-1288
- DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP. Baodi Shan, Mauricio Araya-Polo, Barbara M. Chapman. 1289-1301
- Weak Scaling of NVSHMEM Applied To Hashed Distributed Structured Data. Andrew Davis, Hans Johansen, Xinfeng Gao, Stephen M. Guzik. 1302-1313
- Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM. Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll. 1314-1329
- Enhancing HPX with FleCSI: Automatic Detection of Implicit Task Dependencies. S. Davis Herring 0001, Maxim Moraru, Scott Pakin, Julien Loiseau, Richard Berger, Philipp V. F. Edelmann, Ben Bergen 0002. 1330-1340
- Stackless vs. Stackful Coroutines: A Comparative Study for RDMA-based Asynchronous Many-Task (AMT) Runtimes. Mia Reitz, Jonas Posner. 1341-1350
- KDRSolvers: Scalable, Flexible, Task-Oriented Krylov Solvers. David Kai Zhang, Rohan Yadav, Alex Aiken, Fredrik Kjolstad, Sean Treichler. 1351-1365
- LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models. Minqiu Sun, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai 0001. 1366-1374
- SlimIO: Lightweight I/O Path Design for Write Isolation in FDP-backed In-Memory Databases. Sangyun Lee, Sungjin Byeon, Soon Hwang, Jaewan Park, Jooyoung Hwang, Junyoung Han, Javier González, Awais Khan 0002, Youngjae Kim. 1375-1384
- Parallel Data Object Creation: Scalable Metadata Management in Parallel I/O Library. Youjia Li, Robert Latham, Robert B. Ross, Ankit Agrawal 0001, Alok N. Choudhary, Wei-keng Liao. 1385-1395
- SmartIO: A Lightweight End-to-End Workflow for Runtime I/O Optimization of HPC Systems. Hammad Bin Ather, Chen Wang 0004, Hariharan Devarajan, Hank Childs, Kathryn M. Mohror. 1396-1405
- RL4Sys: A Lightweight System-driven RL Framework for Drop-in Integration in System Optimization. Jiaxin Dong, Md. Hasanur Rashid, Helen Xu 0001, Dong Dai 0001. 1406-1414
- Quantifying AWS S3 I/O Performance Boundaries Using the Roofline Model. Meng Tang, Zhaobin Zhu, Luanzheng Guo, James G. Bandy, Tim Carlson, Sarah Neuwirth, Anthony Kougkas, Xian-He Sun, Nathan R. Tallent. 1415-1423
- Secure In-Storage Execution of VTK Workloads on Modern Parallel NFS Data Servers. Qing Zheng, Brian Atkinson, Jason Lee 0004, Daoce Wang, Gary Grider. 1424-1432
- Modelling Load Imbalance In Shared Memory Multicore Systems. Johannes Langguth, James D. Trotter, Xing Cai. 1433-1441
- A Peak Performance Model for All-to-all on Hierarchical Systems and Its Applications. Rohini Uma-Vaideswaran, Joshua Romero, Daniel L. Dotson, David Appelhans, P.-K. Yeung. 1442-1451
- Determining Levels of Detail for Simulators of Parallel and Distributed Computing Systems via Automated Calibration. Jesse McDonald, Yick Ching Wong, Kshitij Mehta, Frédéric Suter, Rafael Ferreira da Silva, Loic Pottier, Ewa Deelman, Henri Casanova. 1452-1463
- Beyond Guess and Check: Quantifying the Fidelity of Proxy Applications. Si Chen, Simon Garcia De Gonzalo, Omar Aaziz, Jeanine E. Cook, Avani Wildani. 1464-1477
- CGSim: A Simulation Framework for Large Scale Distributed Computing Environment. Sairam Sri Vatsavai, Raees Khan Ahmed, Kuan-Chieh Hsu, Ozgur O. Kilic, Yihui Ren 0001, David K. Park, Paul Nilsson, Tania Korchuganova, Sankha Dutta, Joseph Boudreau, Tasnuva Chowdhury, Shengyu Feng, Fatih Furkan Akman, Adolfy Hoisie, Scott Klasky, Tadashi Maeno, Verena Ingrid Martinez Outschoorn, Norbert Podhorszki, Frédéric Suter, John Rembrandt Steele, Wei Yang, Yiming Yang 0002, Shinjae Yoo, Alexei Klimentov. 1478-1483
- Implications of Full-System Modeling for Superconducting Architectures. Kunal Pai, Mahyar Samani, Anusheel Nand, Jason Lowe-Power. 1484-1490
- PerfAnalyzer: Revealing Performance Trends using Version Oriented Visual Analysis of Scientific Software. Sayef Azad Sakin, James P. Ahrens. 1491-1495
- Experiences of Porting Structured and Unstructured Stencil Applications to FPGA using SYCL. Zadok Storkey, Steven Wright, Ian Gray. 1496-1501
- MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models. Krishna Teja Chitty-Venkata, Sylvia Howland, Golara Azar, Daria Soboleva, Natalia Vassilieva, Siddhisanket Raskar, Murali Emani, Venkatram Vishwanath. 1502-1511
- Pretraining LLMs at Scale: Tuning Strategies and Performance Portability. Adrián Pérez Diéguez, Alex Batlle Casellas, Aleix Torres-Camps, Harris Teague, Jordi Ros-Giralt. 1512-1523
- Characterizing the Impact of GPU Power Management on an Exascale System. Mariana Toledo Costa, Antigoni Georgiadou, James B. White, Bruno Villasenor Alvarez, Jordà Polo, Woong Shin, Philippe Olivier Alexandre Navaux, Bronson Messer, Arthur Francisco Lorenzon. 1524-1533
- A GPU FFT Wrapper to Co-optimize Floating-Point Precision and Library Selection via Predictive Error Modeling. Julius Lehner, Eishi Arima, Martin Schulz 0001. 1534-1543
- ILAN: The Interference- and Locality-Aware NUMA Scheduler. Edvin Mellberg, Axel Carlsson, Jing Chen, Miquel Pericàs. 1544-1553
- On the Performance and Scalability of Cloud Supercomputers: Insights from Eagle and Reindeer. Amirreza Rastegari, Prabhat Ram, Michael F. Ringenburg. 1554-1563
- A Cache Interaction Graph for Data Locality Optimization. Chao Jin 0001, David Abramson 0001, Mo'ath Qutaish, Mark Endrei, Bronis R. de Supinski. 1564-1573
- MT4G: A Tool for Reliable Auto-Discovery of NVIDIA and AMD GPU Compute and Memory Topologies. Stepan Vanecek, Manuel Walter Mußbacher, Dominik Größler, Urvij Saroliya, Martin Schulz 0001. 1574-1586
- Fantastic Hardware Counters and How to Find Them: Automating the Detection of Noise-Resilient Performance Counters in HPC. Ahmad Tarraf, Alexander Geiß, Lukas Fuchs, Felix Wolf 0001. 1587-1600
- Extending THAPI with CXI Hardware Counter Sampling for High Resolution NIC Telemetry. Nathan Nichols, Thomas Applencourt. 1601-1610
- Scalable, High-Fidelity Monitoring of Application Communication Patterns in Vernier. Jered Dominguez-Trujillo, Derek Schafer, Riley Shipley, Ryan J. Marshall, Nicholas H. Bacon, Maxim Moraru, Galen M. Shipman, Anthony Skjellum, Patrick G. Bridges. 1611-1623
- PEAK: Cost-Adaptive Profiling in a Heartbeat. Yuheng Chen, Junjie Li 0003, Chun-Yaung Lu, Yinzhi Wang. 1624-1633
- Fast on-demand Memory Mapping for Shared Memory and Disaggregated Systems. Yuang Yan, Ryan E. Grant. 1634-1642
- TEGRA - Scaling Up Graph Processing with Disaggregated Computing. William Shaddix, Mahyar Samani, Jason Lowe-Power, Venkatesh Akella. 1643-1649
- An RDMA-First Object Storage System with SmartNIC Offload. Yu Zhu, Aditya Dhakal, Pedro Bruel, Gourav Rattihalli, Yunming Xiao, Johann Lombardi, Dejan S. Milojicic. 1650-1657
- DoCeph: DPU-Offloaded Messaging in Ceph for Reduced Host CPU Utilization. Kyuli Park, Sungmin Yoon, Farid Talibli, Sungyong Park, Jae-Hyuck Kwak, Kimoon Jeong, Awais Khan 0002, Youngjae Kim 0001. 1658-1666
- IzhiRISC-V - a RISC-V-based Processor with Custom ISA Extension for Spiking Neuron Networks Processing with Izhikevich NeuronsWiktor Jan Szczerek, Artur Podobas. 1667-1675 [doi]
- RISC-V Vectorization Coverage for HPC: A TSVC-Based AnalysisHung-Ming Lai, Pei-Hung Lin, Maya B. Gokhale, Ivy Peng, Hiren D. Patel, Jenq Kuen Lee. 1676-1683 [doi]
- A RISC-V Vector Extension for Multi-word ArithmeticYunhao Lan, Larry Tang 0003, Naifeng Zhang, Youngjin Eum, James C. Hoe, Franz Franchetti. 1684-1693 [doi]
- Enabling the syscall_intercept library for RISC-VPetar Andric, Ramon Nou, Aaron Call, Guillem Senabre. 1694-1702 [doi]
- Is RISC-V ready for High Performance Computing? An evaluation of the Sophon SG2044Nick Brown 0002. 1703-1711 [doi]
- Bridging Simulation and Silicon: A Study of RISC-V Hardware and FireSim SimulationAtanu Barai, Kamalavasan Kamalakkannan, Patrick Diehl, Maxim Moraru, Jered Dominguez-Trujillo, Howard Pritchard, Nandakishore Santhi, Farzad Fatollahi-Fard, Galen M. Shipman. 1712-1722 [doi]
- Simulating Hybrid Analog + RISC-V Systems for HPC ApplicationsCameron Durbin, Jacob Flores, Thomas Weatherly, Ben Feinberg. 1723-1728 [doi]
- Accelerating Gravitational N-Body Simulations Using the RISC-V-Based Tenstorrent WormholeJenny Lynn Almerol, Elisabetta Boella, Mario Spera, Daniele Gregori. 1729-1735 [doi]
- Assessing a RISC-V Accelerator for Cross-Section Lookup in ChipyardAndrew Ledbetter, Kazutomo Yoshii, John R. Tramm. 1736-1742 [doi]
- Dyninst on the RISC-V: Binary Instrumentation in Support of Performance, Debugging, and Other ToolsCheng-Hsun Angus He, Ronak Chauhan, James A. Kupsch, Hsuan-Heng Wu, Barton P. Miller. 1743-1750 [doi]
- Extending the C++ Execution Control Library to Support Dynamic Parallel Runtime SystemsIan Henriksen, Jan Ciesko, Stephen L. Olivier. 1751-1761 [doi]
- Assessing Page Reclamation Mechanisms for LinuxShaochang Liu, Jie Ren. 1762-1769 [doi]
- Reproducible Performance Evaluation of OpenMP and SYCL Workloads under Noise InjectionChristoffer Persson, Mathias Pretot, Minyu Cui, Miquel Pericàs. 1770-1778 [doi]
- Numerical Properties and Scalability of s-Step Preconditioned Conjugate Gradient MethodsViktoria Mayer, Wilfried N. Gansterer. 1779-1789 [doi]
- Efficient Embedding Initialization via Dominant Eigenvector ProjectionsQuentin R. Petit, Chong Li 0003, Nahid Emad, Jack J. Dongarra. 1790-1799 [doi]
- Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-based Matrix InversionAnton Lebedev, Won Kyung Lee, Soumyadip Ghosh, Olha I. Yaman, Vassilis Kalantzis, Yingdong Lu, Tomasz Nowicki, Shashanka Ubaru, Lior Horesh, Vassil Alexandrov 0001. 1800-1807 [doi]
- A High Performance GPU CountSketch Implementation and Its Application to Multisketching and Least Squares ProblemsAndrew J. Higgins 0002, Erik G. Boman, Ichitaro Yamazaki. 1808-1815 [doi]
- Post-Variational Quantum Neural Networks on a Hybrid HPC-QC SystemMaxence Vandromme, Miwako Tsuji. 1816-1823 [doi]
- High-Performance and Power-Efficient Emulation of Matrix Multiplication using INT8 Matrix EnginesYuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura. 1824-1831 [doi]
- Scalable Hydrodynamics on multiple Field-Programmable Gate Arrays (FPGAs)François-Xavier Mordant, Charles Prouveur, Pascal Tremblin, Nicolas Gac. 1832-1841 [doi]
- First Practical Experiences Integrating Quantum Computers with HPC Resources: A Case Study With a 20-qubit Superconducting Quantum ComputerEric Mansfield, Stefan Seegerer, Panu T. Vesanen, Jorge Echavarria, Muhammad Nufail Farooqi, Burak Mete, Laura Brandon Schulz. 1842-1850 [doi]
- A Simulation Framework for Workload Management in Hybrid Quantum-HPC Cloud SystemWaylon Luo, Cheng-Chang Lu, Tong Zhan, Qiang Guan. 1851-1859 [doi]
- An HPC-Inspired Blueprint for a Technology-Agnostic Quantum Middle LayerStefano Markidis, Gilbert Netzer, Luca Pennati, Ivy Peng. 1860-1867 [doi]
- Tackling the Challenges of Adding Pulse-level Support to a Heterogeneous HPCQC Software Stack: MQSS PulseJorge Echavarria, Muhammad Nufail Farooqi, Amit Devra, Santana Lujan, Léo Van Damme, Hossam Ahmed, Martín Letras, Ercüment Kaya, Adrian Vetter, Max Werninghaus, Martin Knudsen, Felix Rohde, Albert Frisch, Eric Mansfield, Rakhim Davletkaliyev, Vladimir Kukushkin, Noora Färkkilä, Janne Mäntylä, Nikolas Pomplun, Andreas Spörl, Lukas Burgholzer, Yannick Stade, Robert Wille, Laura Brandon Schulz, Martin Schulz 0001. 1868-1878 [doi]
- Towards a user-centric HPC-QC environmentAleksander Wennersteen, Matthieu Moreau, Aurelien Nober, Mourad Beji. 1879-1887 [doi]
- Scaling Hybrid Quantum-HPC Applications with the Quantum FrameworkSrikar Chundury, Amir Shehata, Seongmin Kim, Muralikrishnan Gopalakrishnan Meena, Chao Lu, Kalyana C. Gottiparthi, Eduardo Antonio Coello Pérez, Frank Mueller 0001, In-Saeng Suh. 1888-1897 [doi]
- Orchestrating Quantum-HPC Workflows with Distributed Quantum Circuit CuttingMar Tejedor, Berta Casas, Javier Conejero, Alba Cervera-Lierta, Rosa M. Badia. 1898-1906 [doi]
- Towards Supporting QIR: Steps for Adopting the Quantum Intermediate RepresentationYannick Stade, Lukas Burgholzer, Robert Wille. 1907-1915 [doi]
- A Practical Quantum Solver for Multidimensional Partial Differential EquationsManu Chaudhary, Kareem El-Araby, Md. Alvir Islam Nobel, S. M. Ishraq Ul Islam, Manish Singh, Sunday Ogundele, Kieran F. Egan, Sneha Thomas, Vincent Vordtriede, Devon Bontrager, Serom Kim, Esam El-Araby. 1916-1925 [doi]
- Securing HDF5 Plugins with Digital SignaturesGlenn Song, Michael Scot Breitenfeld, Suren Byna. 1926-1933 [doi]
- CASSE: Targeted Threat Modeling for Data Management LibrariesKeegan Sanchez, Suren Byna, Zhiqiang Lin 0001, David Mattson. 1934-1942 [doi]
- Dynamic Factor Graphs for Attack PreemptionPhuong M. Cao, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer. 1943-1950 [doi]
- Evaluating Trusted Execution Environment Performance for Genome Sequence Alignment: An AMD SEV Case StudyRobert Keßler, Lech Nieroda, Simon Volpert, Moritz Gräf, Viktor Achter, Laslo Hunhold, Stefan Wesner. 1951-1958 [doi]
- HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and CoolingMatthias Maiterth, Wesley H. Brewer, Jaya S. Kuruvella, Arunavo Dey, Tanzima Z. Islam, Rashadul Kabir, Kevin Menear, Dmitry Duplyakin, Tapasya Patki, Terry R. Jones, Feiyi Wang. 1959-1969 [doi]
- Run-time Energy-Efficiency Optimization for AI and HPC WorkloadsGabriel Hautreux, Abdoulaye Gamatié, Gilles Sassatelli. 1970-1979 [doi]
- Improving Supercomputer Usage with Aging AwarenessRobin Boëzennec, Fanny Dufossé, Guillaume Pallez, Alix Tremodeux. 1980-1989 [doi]
- Optimizing Microgrid Composition for Sustainable Data CentersJulius Irion, Philipp Wiesner, Jonathan Bader, Odej Kao. 1990-1996 [doi]
- Energy-Aware HPC Scheduling with LLM-Based Power PredictionKevin Menear, Alex Wilkinson, Tim Dykes, Utz-Uwe Haus, Dmitry Duplyakin. 1997-2006 [doi]
- Bridging the Gap: User-Centric Energy Monitoring for Policy-Driven Application Optimization in HPC Data CentersWoong Shin, Karl W. Schulz, Arthur Francisco Lorenzon, Matthias Maiterth, Bruno Villasenor Alvarez, Jordà Polo, Aditya Kashi, Hao Lu, Nicholson Koukpaizan, Antigoni Georgiadou, Matthew R. Norman, Wael R. Elwasif, Michael A. Matheson, Feiyi Wang, Nicholas Frontiere, Sarp Oral, Thomas Beck, Bronson Messer. 2007-2016 [doi]
- Molten Chloride Small Modular Reactor Performance Characteristics for Data Center OperationMatthew Anderson, Daniel Yankura, Matthew Sgambati, Mauricio Tano Retamales. 2017-2021 [doi]
- EMLIO: Minimizing I/O Latency and Energy Consumption for Large-Scale AI TrainingHasibul Jamil, Md. S. Q. Zulkar Nine, Tevfik Kosar. 2022-2031 [doi]
- Modeling the Carbon Footprint of HPC: The Top 500 and EasyCVarsha Rao, Andrew A. Chien. 2032-2040 [doi]
- EAS-Sim: A Framework and its Methodology for the Co-Design of Multi-Objective, Energy-Aware Schedulers for AI ClustersRoblex Nana Tchakoute, Claude Tadonki. 2041-2050 [doi]
- Porting a Fortran plasma simulation to Exascale on AMD GPUs using both OpenMP and KokkosEtienne Malaboeuf, Kevin Obrejan, Mathieu Peybernes, Julien Bigot, Emily Bourne, Virginie Grandgirard. 2051-2067 [doi]
- Bridging FPGA and GPU over PCIe: A Low-Latency Communication Path using AVX-512Michele Martinelli, Carlotta Chiarini, Andrea Biagioni, Paolo Cretaro, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pierpaolo Perticaroli, Francesco Simula, Luca Pontisso, Cristian Rossi, Piero Vicini. 2068-2076 [doi]
- Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & NvidiaKaan Olgu, Tobias Kenter, José L. Núñez-Yáñez, Simon McIntosh-Smith, Tom Deakin. 2077-2087 [doi]
- Scalable Neural Network Training: Distributed Data-Parallel ApproachesFernando Vázquez-Novoa, Pedro López 0001, José Flich, Rosa M. Badia. 2088-2099 [doi]
- Reduction-Aware Directive-Based Programming via Multi-Dimensional HomomorphismsRichard Schulze, Sergei Gorlatch, Ari Rasch. 2100-2113 [doi]
- Mojo: MLIR-based Performance-Portable HPC Science Kernels on GPUs for the Python EcosystemWilliam F. Godoy, Tatiana Melnichenko, Pedro Valero-Lara, Wael R. Elwasif, Philip W. Fackler, Rafael Ferreira da Silva, Keita Teranishi, Jeffrey S. Vetter. 2114-2128 [doi]
- A Study of Performance Portability of Low-bit Fused Matrix-Vector Multiplication Kernels in SYCLZheming Jin. 2129-2136 [doi]
- Physical System Study on Balancing Interactive and Batch Job Performance through Oversubscribing SchedulingShohei Minami, Toshio Endo, Akihiro Nomura 0002, Hiroki Ohtsuji, Jun Kato 0007, Masahiro Miwa, Eiji Yoshida. 2137-2145 [doi]
- Implementing support for Interactive and AI workloads in a traditional HPC environmentJay McGlothlin, Christopher D. Carothers. 2146-2150 [doi]
- Evaluating HPC Scheduling Strategies for Urgent WorkloadsKetan Maheshwari, Anderson Borch, Jordan Webb, Brian Etz, Ross Miller, Frédéric Suter, Sarp Oral, Rafael Ferreira da Silva. 2151-2160 [doi]
- Modeling and Optimizing Real-Time Telescope Interaction for Multi-wavelength Observation of Gamma-ray BurstsYe Htet, Marion Sudvarg, Honghao Yang, Jeremy Buhler, Roger D. Chamberlain, James H. Buckley. 2161-2168 [doi]
- Adapting Classic Scheduling Heuristics for Online Execution under UncertaintyJason Chamorro, Gabriel Twigg-Ho, Jared Coleman, Tainã Coleman, Bhaskar Krishnamachari, Mohammadali Khodabandehlou. 2169-2180 [doi]
- A Workflow for Error Analysis for Drug Response Prediction via Statistical Standardization and Distribution AnalysisJake Gwinn, Justin M. Wozniak, Rajeev Jain, Yitan Zhu, Alexander Partin, Thomas S. Brettin, Rick Stevens. 2181-2189 [doi]
- Bridging Speed and Optimality in Job Scheduling: A Hybrid Ant Colony Optimization Approach for Distributed SystemsHongwei Jin, Pawel Zuk, Krishnan Raghavan, Prachi Jadhav, Aiden Hamade, Ewa Deelman, Prasanna Balaprakash. 2190-2200 [doi]
- CAMEO: A Co-design Architecture for Multi-objective Energy System OptimizationRounak Meyur, Sam Donald, Tonya J. Martin, Thiagarajan Ramachandran, Sumit Purohit. 2201-2212 [doi]
- DAGonStore: Reliable Data Management for Workflows on the Computing Continuum with DynoStore and DAGonStarDante D. Sánchez-Gallegos, José Luis González Compeán, Jesus Carretero, Raffaele Montella. 2213-2224 [doi]
- Do Large Language Models Speak Scientific Workflows?Orcun Yildiz, Tom Peterka. 2225-2233 [doi]
- Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job SchedulingPrachi Jadhav, Hongwei Jin, Ewa Deelman, Prasanna Balaprakash. 2234-2244 [doi]
- Integrating and Characterizing HPC Task Runtime Systems for hybrid AI-HPC workloadsAndré Merzky, Mikhail Titov, Matteo Turilli, Shantenu Jha. 2245-2256 [doi]
- LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation MethodologyRenan Souza 0001, Timothy Poteet, Brian Etz, Daniel Rosendo, Amal Gueroudji, Woong Shin, Prasanna Balaprakash, Rafael Ferreira da Silva. 2257-2268 [doi]
- Overcoming Dynamic I/O Boundaries: a Double-Sided Streaming Methodology with dispel4py and CAPIOMarco Edoardo Santimaria, Rosa Filgueira, Doriana Medic, Iacopo Colonnelli, Marco Aldinucci. 2269-2280 [doi]
- RESILIO : A Scalable and Composable Architecture for Tomographic Reconstruction WorkflowsAmal Gueroudji, Matthieu Dorier, Philip H. Carns, Parth Patel, Tekin Bicer, Robert Latham, Robert B. Ross, Kyle Chard, Ian T. Foster. 2281-2292 [doi]
- State Machine Orchestration of an HPC Workflow in CloudVanessa V. Sochat, Loïc Pottier, Daniel Milroy. 2293-2304 [doi]
- The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous ScienceWoong Shin, Renan Souza, Daniel Rosendo, Frédéric Suter, Feiyi Wang, Prasanna Balaprakash, Rafael Ferreira da Silva. 2305-2316 [doi]
- xGFabric: Coupling Sensor Networks and HPC Facilities with Private 5G Wireless Networks for Real-Time Digital AgricultureLiubov Kurafeeva, Alan Subedi, Ryan Hartung, Michael Fay, Avhishek Biswas, Shantenu Jha, Ozgur O. Kilic, Chandra Krintz, André Merzky, Douglas Thain, Mehmet C. Vuran, Rich Wolski. 2317-2327 [doi]
- Accelerating Advanced Light Source Science Through Multi-Facility HPC WorkflowsDavid Abramov, Samuel S. Welborn, Ryan Chard, Kuldeep Chawla, Xiaoya Chong, Elizabeth Clark, Bjoern Enders, Alexander Hexemer, Jason Jed, Wiebke Koepp, Harinarayan Krishnan, Seij De Leon, Dilworth Parkinson, David Perlmutter, Raja Vyshnavi Sriramoju, Thomas Uram, Lee Lisheng Yang, Dylan McReynolds. 2328-2335 [doi]
- Streaming X-ray Detector Data to Remote Facilities Using EJFATSinisa Veseli, John Hammonds, Steven Henke, Madeline Miller, Hannah Parraga, Ilya Baldin, Derek Howard, Yatish Kumar, Nicholas Schwarz. 2336-2346 [doi]
- Adapting scientific streaming inference workflows for a deterministic tensor processing unitSamantha Fowler, Kazutomo Yoshii, Antonino Miceli, Senthil Gnanasekaran, Tao Zhou, Nicholas Contini. 2347-2353 [doi]
- AI Agents for Enabling Autonomous Experiments at ORNL's HPC and Manufacturing User FacilitiesDaniel Rosendo, Stephen Dewitt, Renan Souza 0001, Phillipe Austria, Tirthankar Ghosal, Marshall T. McDonnell, Ross Miller, Tyler J. Skluzacek, James Haley, Bruno Turcksin, Jesse McGaha, Benjamin Mintz, Feiyi Wang, Mallikarjun Shankar, Sarp Oral, Rafael Ferreira da Silva. 2354-2361 [doi]
- X-ray Ptychography at the Edge: Towards Real-Time Feedback for High-Speed NanoimagingZirui Gao, Seher Karakuzu, Dmitri Gavrilov, Daniel B. Allan, Adam Thompson, Denis Leshchev, Hanfei Yan. 2362-2368 [doi]