- Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank Approximations. Chao Chen, Per-Gunnar Martinsson. 1-15
- GUFI: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users. Dominic Manno, Jason Lee, Prajwal Challa, Qing Zheng, David Bonnie, Gary Grider, Bradley W. Settlemyer. 1-14
- Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers. Ahmad Abdelfattah, Pieter Ghysels, Wajih Boukaram, Stanimire Tomov, Xiaoye Sherry Li, Jack J. Dongarra. 1-14
- DayDream: Executing Dynamic Scientific Workflows on Serverless Platforms with Hot Starts. Rohan Basu Roy, Tirthak Patel, Devesh Tiwari. 1-18
- Deinsum: Practically I/O Optimal Multi-Linear Algebra. Alexandros Nikolaos Ziogas, Grzegorz Kwasniewski, Tal Ben-Nun, Timo Schneider, Torsten Hoefler. 1-15
- MetaWBC: POSIX-Compliant Metadata Write-Back Caching for Distributed File Systems. Yingjin Qian, Wen Cheng, Lingfang Zeng, Marc-André Vef, Oleg Drokin, Andreas Dilger, Shuichi Ihara, Wusheng Zhang, Yang Wang 0006, André Brinkmann. 1-20
- Mapping Out the HPC Dependency Chaos. Farid Zakaria, Thomas R. W. Scogland, Todd Gamblin, Carlos Maltzahn. 1-12
- Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization. Olivier Beaumont, Philippe Duchon, Lionel Eyraud-Dubois, Julien Langou, Mathieu Vérité. 1-15
- Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications. Qinglei Cao, Sameh Abdulah, Rabab Alomairy, Yu Pei, Pratik Nag, George Bosilca, Jack J. Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, Ying Sun 0002. 1-12
- WholeGraph: A Fast Graph Neural Network Training Framework with Multi-GPU Distributed Shared Memory Architecture. Dongxu Yang, Junhong Liu, Jiaxing Qi, Junjie Lai. 1-14
- SFS: Smart OS Scheduling for Serverless Functions. Yuqi Fu, Li Liu 0045, Haoliang Wang, Yue Cheng 0001, Songqing Chen. 1-16
- GraphFly: Efficient Asynchronous Streaming Graphs Processing via Dependency-Flow. Dan Chen, Chuangyi Gui, Yi Zhang, Hai Jin 0001, Long Zheng 0003, Yu Huang 0013, Xiaofei Liao. 1-14
- Scalable Deep Learning-Based Microarchitecture Simulation on GPUs. Santosh Pandey, Lingda Li, Thomas Flynn, Adolfy Hoisie, Hang Liu 0001. 1-15
- Canary: Fault-Tolerant FaaS for Stateful Time-Sensitive Applications. Moiz Arif, Kevin Assogba, M. Mustafa Rafique. 1-16
- Out of Hypervisor (OoH): Efficient Dirty Page Tracking in Userspace Using Hardware Virtualization Features. Stella Bitchebe, Alain Tchana. 1-14
- Using Answer Set Programming for HPC Dependency Solving. Todd Gamblin, Massimiliano Culpo, Gregory Becker, Sergei Shudler. 1-15
- W-Cycle SVD: A Multilevel Algorithm for Batched SVD on GPUs. Junmin Xiao, Yunfei Pang, Qing Xue, Chaoyang Shui, Ke Meng, Hui Ma, Mingyi Li, Xiaoyang Zhang, Guangming Tan. 1-16
- AD for an Array Language with Nested Parallelism. Robert Schenck, Ola Rønning, Troels Henriksen, Cosmin E. Oancea. 1-15
- Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5. Sian Jin, Dingwen Tao, Houjun Tang, Sheng Di, Suren Byna, Zarija Lukic, Franck Cappello. 1-15
- TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming Models. Paul Caheny, Lluc Alvarez, Marc Casas, Miquel Moretó. 1-15
- Positive-Phase Temperature Scaling for Quantum-Assisted Boltzmann Machine Training. Jose P. Pinilla, Steven J. E. Wilton. 1-12
- HyLo: A Hybrid Low-Rank Natural Gradient Descent Method. Baorun Mu, Saeed Soori, Bugra Can, Mert Gürbüzbalaban, Maryam Mehri Dehnavi. 1-16
- EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table. Zheng Wang, Yuke Wang, Boyuan Feng, Dheevatsa Mudigere, Bharath Muthiah, Yufei Ding. 1-14
- Efficient Quantized Sparse Matrix Operations on Tensor Cores. Shigang Li 0002, Kazuki Osawa, Torsten Hoefler. 1-15
- Finding Inputs that Trigger Floating-Point Exceptions in GPUs via Bayesian Optimization. Ignacio Laguna, Ganesh Gopalakrishnan. 1-14
- From Correctable Memory Errors to Uncorrectable Memory Errors: What Error Bits Tell. Cong Li, Yu Zhang, Jialei Wang, Hang Chen, Xian Liu, Tai Huang, Liang Peng, Shen Zhou, Lixin Wang, Shijian Ge. 1-14
- Productive Performance Engineering for Weather and Climate Modeling with Python. Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas C. Schulthess, Torsten Hoefler. 1-14
- ReSemble: Reinforced Ensemble Framework for Data Prefetching. Pengmiao Zhang, Rajgopal Kannan, Ajitesh Srivastava, Anant V. Nori, Viktor K. Prasanna. 1-14
- STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training. Xiaoyang Sun, Wei Wang, Shenghao Qiu, Renyu Yang, Songfang Huang, Jie Xu 0007, Zheng Wang 0001. 1-17
- AI for Quantum Mechanics: High Performance Quantum Many-Body Simulations via Deep Learning. Xuncheng Zhao, Mingfan Li, Qian Xiao, Junshi Chen, Fei Wang, Li Shen, Meijia Zhao, Wenhao Wu, Hong An, Lixin He, Xiao Liang. 1-15
- Climbing the Summit and Pushing the Frontier of Mixed Precision Benchmarks at Extreme Scale. Hao Lu, Michael A. Matheson, Vladyslav Oles, J. Austin Ellis, Wayne Joubert, Feiyi Wang. 1-15
- Scalable Distributed High-Order Stencil Computations. Mathias Jacquelin, Mauricio Araya-Polo, Jie Meng. 1-13
- Extreme-Scale Many-against-Many Protein Similarity Search. Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz G. Awan, Georgios A. Pavlopoulos, Ariful Azad, Nikos Kyrpides, Leonid Oliker, Katherine A. Yelick, Aydin Buluç. 1-12
- Extreme Scale Earthquake Simulation with Uncertainty Quantification. Tsuyoshi Ichimura, Kohei Fujita, Ryota Kusakabe, Kentaro Koyama, Sota Murakami, Yuma Kikuchi, Takane Hori, Muneo Hori, Hikaru Inoue, Takafumi Nose, Takahiro Kawashima, Maddegedara Lalith. 1-11
- SPATL: Salient Parameter Aggregation and Transfer Learning for Heterogeneous Federated Learning. Sixing Yu, Phuong Nguyen, Waqwoya Abebe, Wei Qian, Ali Anwar 0001, Ali Jannesari 0001. 1-14
- SERVIZ: A Shared In Situ Visualization Service. Srinivasan Ramesh, Hank Childs, Allen D. Malony. 1-14
- Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation. William S. Moses, Sri Hari Krishna Narayanan, Ludger Paehler, Valentin Churavy, Michel Schanen, Jan Hückelheim, Johannes Doerfert, Paul D. Hovland. 1-18
- Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way. Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, Aydin Buluç, Katherine A. Yelick, John D. Owens. 1-16
- LightSeq2: Accelerated Training for Transformer-Based Models on GPUs. Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li 0005. 1-14
- Memory Optimizations in an Array Language. Philip Munksgaard, Troels Henriksen, Ponnuswamy Sadayappan, Cosmin E. Oancea. 1-15
- Parla: A Python Orchestration System for Heterogeneous Architectures. Hochan Lee, William Ruys, Ian Henriksen, Arthur Peters, Yineng Yan, Sean Stephens, Bozhi You, Henrique Fingler, Martin Burtscher, Milos Gligoric, Karl Schulz, Keshav Pingali, Christopher J. Rossbach, Mattan Erez, George Biros. 1-15
- Graph Neural Networks Based Memory Inefficiency Detection Using Selective Sampling. Pengcheng Li 0001, Yixin Guo, Yingwei Luo, Xiaolin Wang 0001, Zhenlin Wang, Xu Liu. 1-14
- Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction. Xiao Wang, Aristeidis Tsaris, Debangshu Mukherjee, Mohamed Wahib, Peng Chen, Mark Oxley, Olga Ovchinnikova, Jacob D. Hinkle. 1-13
- STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations. Yihua Wei, Peng Jiang 0004. 1-13
- HammingMesh: A Network Topology for Large-Scale Deep Learning. Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li 0002, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. 1-18
- Blaze: Fast Graph Processing on Fast SSDs. Juno Kim, Steven Swanson. 1-15
- DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. Reza Yazdani Aminabadi, Samyam Rajbhandari, Ammar Ahmad Awan, Cheng Li 0001, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Minjia Zhang, Jeff Rasley, Yuxiong He. 1-15
- A GPU-Accelerated AMR Solver for Gravitational Wave Propagation. Milinda Fernando, David Neilsen, Eric W. Hirschmann, Yosef Zlochower, Hari Sundar, Omar Ghattas, George Biros. 1-15
- vGraph: Memory-Efficient Multicore Graph Processing for Traversal-Centric Algorithms. Menghan Jia, Yiming Zhang 0003, Xinbiao Gan, Dongsheng Li 0001, Erci Xu, Ruibo Wang, Kai Lu. 1-14
- Building Blocks for Network-Accelerated Distributed File Systems. Salvatore Di Girolamo, Daniele De Sensi, Konstantin Taranov, Milos Malesevic, Maciej Besta, Timo Schneider, Severin Kistler, Torsten Hoefler. 1-14
- Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores. Yuanwei Wang, Huanqi Cao, Zixuan Ma, Wanwang Yin, Wenguang Chen. 1-15
- Vectorizing Sparse Matrix Computations with Partially-Strided Codelets. Kazem Cheshmi, Zachary Cetinic, Maryam Mehri Dehnavi. 1-15
- SeqDLM: A Sequencer-Based Distributed Lock Manager for Efficient Shared File Access in a Parallel File System. Qi Chen, Shaonan Ma, Kang Chen, Teng Ma, Xin Liu, Dexun Chen, Yongwei Wu, Zuoning Chen. 1-14
- Dynamic Quality Metric Oriented Error Bounded Lossy Compression for Scientific Datasets. Jinyang Liu, Sheng Di, Kai Zhao 0008, Xin Liang 0001, Zizhong Chen, Franck Cappello. 1-15
- SpDISTAL: Compiling Distributed Sparse Tensor Computations. Rohan Yadav, Alex Aiken, Fredrik Kjolstad. 1-15
- P-Massive: A Real-Time Search Engine for a Multi-Terabyte Mass Spectrometry Database. Narangerelt Batsoyol, Benjamin S. Pullman, Mingxun Wang 0001, Nuno Bandeira, Steven Swanson. 1-15
- HGL: Accelerating Heterogeneous GNN Training with Holistic Representation and Optimization. Yuntao Gui, Yidi Wu, Han Yang 0002, Tatiana Jin, Boyang Li 0016, Qihui Zhou, James Cheng, Fan Yu. 1-15
- A Taxonomy of Error Sources in HPC I/O Machine Learning Models. Mihailo Isakov, Mikaela Currier, Eliakin Del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip H. Carns, Robert B. Ross, Glenn K. Lockwood, Michel A. Kinsy. 1-14
- Exaflops Biomedical Knowledge Graph Analytics. Ramakrishnan Kannan, Piyush Sao, Hao Lu, Jakub Kurzak, Gundolf Schenk, Yongmei Shi, Seung-Hwan Lim, Sharat Israni, Vijay Thakkar, Guojing Cong, Robert M. Patton, Sergio E. Baranzini, Richard W. Vuduc, Thomas E. Potok. 1-11
- CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs. Qingxiao Sun, Yi Liu 0013, Hailong Yang, Ruizhe Zhang 0012, Ming Dun, Mingzhen Li, Xiaoyan Liu, Wencong Xiao, Yong Li, Zhongzhi Luan, Depei Qian. 1-15
- Approximate Computing Through the Lens of Uncertainty Quantification. Konstantinos Parasyris, James Diffenderfer, Harshitha Menon, Ignacio Laguna, Jackson Vanover, Ryan Vogt, Daniel Osei-Kuffuor. 1-14
- Pushing the Frontier in the Design of Laser-Based Electron Accelerators with Groundbreaking Mesh-Refined Particle-In-Cell Simulations on Exascale-Class Supercomputers. Luca Fedeli, Axel Huebl, France Boillod-Cerneux, Thomas Clark, Kevin Gott, Conrad Hillairet, Stephan Jaure, Adrien Leblanc, Rémi Lehe, Andrew Myers 0001, Christelle Piechurski, Mitsuhisa Sato, Neïl Zaïm, Weiqun Zhang, Jean-Luc Vay, Henri Vincenti. 1-12
- ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tetek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler. 1-17
- Scaling Correlated Fragment Molecular Orbital Calculations on Summit. Giuseppe M. J. Barca, Calum Snowdon, Jorge L. Galvez Vallejo, Fazeleh Kazemian, Alistair P. Rendell, Mark S. Gordon. 1-14
- UniQ: A Unified Programming Model for Efficient Quantum Circuit Simulation. Chen Zhang, Haojie Wang, Zixuan Ma, Lei Xie, Zeyu Song, Jidong Zhai. 1-16
- 2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT. Wei Hu, Hong An, Zhuoqiang Guo, Qingcai Jiang, Xinming Qin, Junshi Chen, Weile Jia, Chao Yang, Zhaolong Luo, Jielan Li, Wentiao Wu, Guangming Tan, Dongning Jia, Qinglin Lu, Fangfang Liu, Min Tian, Fang Li, Yeqi Huang, Liyi Wang, Sha Liu, Jinlong Yang. 1-13
- Study of Workload Interference with Intelligent Routing on Dragonfly. Yao Kang, Xin Wang, Zhiling Lan. 1-14
- Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems. Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair, Shivaram Venkataraman. 1-15
- AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices. Zhen Du, Jiajia Li 0001, YinShan Wang, Xueqi Li, Guangming Tan, Ninghui Sun. 1-15
- Accelerating Elliptic Curve Digital Signature Algorithms on GPUs. Zonghao Feng, Qipeng Xie, Qiong Luo 0001, Yujie Chen, Haoxuan Li, Huizhong Li, Qiang Yan. 1-13
- Predicting Reuse Interval for Optimized Web Caching: An LSTM-Based Machine Learning Approach. Pengcheng Li, Yixin Guo, Yongbin Gu. 1-15
- Charter: Identifying the Most-Critical Gate Operations in Quantum Circuits via Amplified Gate Reversibility. Tirthak Patel, Daniel Silver, Devesh Tiwari. 1-16
- Mitigating Silent Data Corruptions in HPC Applications across Multiple Program Inputs. Yafan Huang, Shengjian Guo, Sheng Di, Guanpeng Li, Franck Cappello. 1-14
- Using Unused: Non-Invasive Dynamic FaaS Infrastructure with HPC-Whisk. Bartlomiej Przybylski, Maciej Pawlik, Pawel Zuk, Bartlomiej Lagosz, Maciej Malawski, Krzysztof Rzadca. 1-15
- Optimization of Full-Core Reactor Simulations on Summit. Misun Min, Yu-Hsiang Lan, Paul F. Fischer, Elia Merzari, Stefan Kerkemeier, Malachi Phillips, Thilina Rathnayake, April Novak, Derek Gaston, Noel Chalmers, Tim Warburton. 1-11
- Lessons Learned on MPI+Threads Communication. Rohit Zambre, Aparna Chandramowlishwaran. 1-16
- QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services. Kaihua Fu, Jiuchen Shi, Quan Chen 0002, Ningxin Zheng, Wei Zhang 0149, Deze Zeng, Minyi Guo. 1-14
- Boosting Performance Optimization with Interactive Data Movement Visualization. Philipp Schaad, Tal Ben-Nun, Torsten Hoefler. 1-16
- LabStor: A Modular and Extensible Platform for Developing High-Performance, Customized I/O Stacks in Userspace. Luke Logan, Jaime Cernuda Garcia, Jay F. Lofstead, Xian-He Sun, Anthony Kougkas. 1-15
- Scalable Linear Time Dense Direct Solver for 3-D Problems without Trailing Sub-Matrix Dependencies. Qianxiang Ma, Sameer Deshmukh, Rio Yokota. 1-12
- Combining Hard and Soft Constraints in Quantum Constraint-Satisfaction Systems. Ellis Wilson, Frank Mueller 0001, Scott Pakin. 1-14
- CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication. Hua Huang, Edmond Chow. 1-15
- VSGM: View-Based GPU-Accelerated Subgraph Matching on Large Graphs. Guanxian Jiang, Qihui Zhou, Tatiana Jin, Boyang Li 0016, Yunjian Zhao, Yichao Li, James Cheng. 1-15
- Large-Scale Simulation of Quantum Computational Chemistry on a New Sunway Supercomputer. Honghui Shang, Li Shen 0001, Yi Fan, Zhiqian Xu 0005, Chu Guo, Jie Liu, Wenhao Zhou, Huan Ma, Rongfen Lin, Yuling Yang, Fang Li, Zhuoya Wang, Yunquan Zhang, Zhenyu Li. 1-14
- PolarFly: A Cost-Effective and Flexible Low-Diameter Topology. Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, Fabrizio Petrini. 1-15
- Optimizing Random Access to Hierarchically-Compressed Data on GPU. Feng Zhang 0007, Yihua Hu, Haipeng Ding, Zhiming Yao, Zhewei Wei, Xiao Zhang, Xiaoyong Du 0001. 1-15
- Towards Scalable Resource Management for Supercomputers. Yiqin Dai, Yong Dong, Kai Lu, Ruibo Wang, Wei Zhang, Juan Chen, Mingtian Shao, Zheng Wang. 1-15