- tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores. Bin-Rui Li, Shenggan Cheng, James Lin. 1-11
- MiniMod: A Modular Miniapplication Benchmarking Framework for HPC. W. Pepper Marts, Matthew G. F. Dosanjh, Scott Levy, Whit Schonbein, Ryan E. Grant, Patrick G. Bridges. 12-22
- Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum. Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Matthieu Simonin, Jean-Christophe Lombardo, Alexis Joly, Patrick Valduriez. 23-34
- WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows. Bing Xie, Qiang Cao, Mayuresh Kunjir, Linli Wan, Jeffrey S. Chase, Anirban Mandal, Mats Rynge. 35-46
- HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems. Zihan Jiang, Wanling Gao, Fei Tang, Lei Wang 0004, Xingwang Xiong, Chunjie Luo, Chuanxin Lan, Hongxiao Li, Jianfeng Zhan. 47-58
- RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning. Wenyan Chen, Chengzhi Lu, Kejiang Ye, Yang Wang 0006, Cheng-Zhong Xu 0001. 59-69
- READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling. Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux. 70-81
- Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer. Hongyuan Liu 0002, Bogdan Nicolae, Sheng Di, Franck Cappello, Adwait Jog. 82-93
- SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters. Jing Cao, Zongwei Zhu, Xuehai Zhou. 94-102
- 2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters. LiZhi Zhang, Zhiquan Lai, Shengwei Li, Yu Tang, Feng Liu, Dongsheng Li. 103-113
- HFlow: A Dynamic and Elastic Multi-Layered I/O Forwarder. Jaime Cernuda Garcia, Hariharan Devarajan, Luke Logan, Keith Bateman, Neeraj Rajesh, Jie Ye, Anthony Kougkas, Xian-He Sun. 114-124
- Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage. Peng Xu, Nannan Zhao, Jiguang Wan, Wei Liu, Shuning Chen, Yuanhui Zhou, Hadeel Albahar, Hanyang Liu, Liu Tang, Changsheng Xie. 125-134
- Virtual Log-Structured Storage for High-Performance Streaming. Ovidiu-Cristian Marcu, Alexandru Costan, Bogdan Nicolae, Gabriel Antoniu. 135-145
- RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows. Pradeep Subedi, Philip E. Davis, Manish Parashar. 146-156
- Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling. Hanchen Guo, Zhehan Lin, Yunfei Gu, Chentao Wu, Li Jiang, Jie Li 0002, Guangtao Xue, Minyi Guo. 157-168
- Streamlining distributed Deep Learning I/O with ad hoc file systems. Frederic Schimmelpfennig, Marc-André Vef, Reza Salkhordeh, Alberto Miranda, Ramon Nou, André Brinkmann. 169-180
- Accelerating GPU Message Communication for Autonomous Navigation Systems. Hao Wu, Jiangming Jin, Jidong Zhai, Yifan Gong 0003, Wei Liu. 181-191
- csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs. Qingxiao Sun, Yi Liu 0013, Hailong Yang, Zhonghui Jiang, Xiaoyan Liu, Ming Dun, Zhongzhi Luan, Depei Qian. 192-203
- Octo-Tiger's New Hydro Module and Performance Using HPX+CUDA on ORNL's Summit. Patrick Diehl, Gregor Daiß, Dominic Marcello, Kevin A. Huck, Sagiv Shiber, Hartmut Kaiser, Juhan Frank, Geoffrey C. Clayton, Dirk Pflüger. 204-214
- Pipelined Preconditioned s-step Conjugate Gradient Methods for Distributed Memory Systems. Manasi Tiwari, Sathish Vadhiyar. 215-225
- Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs. Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck. 226-237
- Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance. Jonathan Lifflander, Nicole Lemaster Slattengren, Philippe P. Pébaÿ, Phil Miller, Francesco Rizzi, Matthew T. Bettencourt. 238-249
- Distributed Work Stealing at Scale via Matchmaking. Hrushit Parikh, Vinit Deodhar, Ada Gavrilovska, Santosh Pande. 250-260
- Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts. Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, Odej Kao. 261-270
- CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs. Ping Chen, Shuibing He, Xuechen Zhang, Shuaiben Chen, Peiyi Hong, Yanlong Yin, Xian-He Sun, Gang Chen. 271-282
- Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs. Jiannan Tian, Sheng Di, Xiaodong Yu, Cody Rivera, Kai Zhao, Sian Jin, Yunhe Feng, Xin Liang 0001, Dingwen Tao, Franck Cappello. 283-293
- Exploring Autoencoder-based Error-bounded Compression for Scientific Data. Jinyang Liu, Sheng Di, Kai Zhao, Sian Jin, Dingwen Tao, Xin Liang 0001, Zizhong Chen, Franck Cappello. 294-306
- cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. Xiaodong Yu, Sheng Di, Ali Murat Gok, Dingwen Tao, Franck Cappello. 307-319
- DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data. Jialing Zhang, Jiaxi Chen, Xiaoyan Zhuo, Aekyeung Moon, Seung Woo Son 0001. 320-331
- O(1) Communication for Distributed SGD through Two-Level Gradient Averaging. Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury, Kathryn Mohror. 332-343
- Distributed Computation of Persistent Homology from Partitioned Big Data. Nicholas O. Malott, Rishi R. Verma, Rohit P. Singh, Philip A. Wilsey. 344-354
- FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures. Dalin Wang, Feng Zhang 0007, Weitao Wan, Hourun Li, Xiaoyong Du 0001. 355-365
- Packet Forwarding Cache of Commodity Switches for Parallel Computers. Shoichi Hirasawa, Hayato Yamaki, Michihiro Koibuchi. 366-376
- Two-Chains: High Performance Framework for Function Injection and Execution. Megan Grodowitz, Luis E. Peña, Curtis Dunham, Dong Zhong, Pavel Shamis, Steve Poole. 377-387
- HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems. Wei Liu, Haikun Liu, Xiaofei Liao, Hai Jin 0001, Yu Zhang 0027. 388-397
- Modeling the Linux page cache for accurate simulation of data-intensive applications. Hoang-Dung Do, Valérie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard. 398-408
- Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights. Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang 0007, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao. 409-420
- Understanding the Effects of DRAM Correctable Error Logging at Scale. Kurt B. Ferreira, Scott Levy, Victor Kuhns, Nathan DeBardeleben, Sean Blanchard. 421-432
- Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing. Kun Suo, Junggab Son, Dazhao Cheng, Wei Chen 0038, Sabur Baidya. 433-443
- Reusability First: Toward FAIR Workflows. Matthew Wolf, Jeremy Logan, Kshitij Mehta, Daniel A. Jacobson, Mikaela Cashman, Angelica M. Walker, Greg Eisenhauer, Patrick M. Widener, Ashley Clif. 444-455
- Thinking More about RDMA Memory Semantics. Teng Ma, Kang Chen, Shaonan Ma, Zhuo Song, Yongwei Wu. 456-467
- Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer. Tapasya Patki, Adam Bertsch, Ian Karlin, Dong H. Ahn, Brian Van Essen, Barry Rountree, Bronis R. de Supinski, Nathan Besaw. 468-480
- Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation. Arnab Das, Tanmay Tirpankar, Ganesh Gopalakrishnan, Sriram Krishnamoorthy. 481-491
- Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. Elvis Rojas, Diego Pérez, Jon C. Calhoun, Leonardo Bautista-Gomez, Terry Jones, Esteban Meneses. 492-503
- On-the-Fly, Robust Translation of MPI Libraries. Edgar A. León, Marc Joos, Nathan Hanford, Adrien Cotte, Tony Delforge, François Diakhaté, Vincent Ducrot, Ian Karlin, Marc Pérache. 504-515
- Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication. Kaiming Ouyang, Min-Si, Atsushi Hori, Zizhong Chen, Pavan Balaji. 516-527
- Combining One-Sided Communications with Task-Based Programming Models. Kevin Sala, Sandra Macià, Vicenç Beltran 0001. 528-541
- Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures. Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang 0001. 542-552
- Evaluation of SPEC CPU and SPEC OMP on the A64FX. Yuetsu Kodama, Masaaki Kondo, Mitsuhisa Sato. 553-561
- Energy Efficiency Aspects of the AMD Zen 2 Architecture. Robert Schöne, Thomas Ilsche, Mario Bielert, Markus Velten, Markus Schmidl, Daniel Hackenberg. 562-571
- Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures. Julita Corbalán, Oriol Vidal, Lluis Alonso, Jordi Aneas. 572-581
- FIRESTARTER 2: Dynamic Code Generation for Processor Stress Tests. Robert Schöne, Markus Schmidl, Mario Bielert, Daniel Hackenberg. 582-590
- Cooling the Data Center: Design of a Mechanical Controls Owner Project Requirements (OPR) Template. Stefan A. Robila, David Grant, Chris DePrater, Vali Sorell, Terry L. Rodgers, David Martinez, Shlomo Novotny. 591-595
- A Conceptual Framework for HPC Operational Data Analytics. Alessio Netti, Woong Shin, Michael Ott 0001, Torsten Wilde, Natalie J. Bates. 596-603
- An Execution Fingerprint Dictionary for HPC Application Recognition. Thomas Jakobsche, Nicolas Lachiche, Aurélien Cavelan, Florina M. Ciorba. 604-608
- An Integrated Job Monitor, Analyzer and Predictor. Ashish Pal, Preeti Malakar. 609-617
- Backfilling HPC Jobs with a Multimodal-Aware Predictor. Kenneth Lamar, Alexander V. Goponenko, Christina L. Peterson, Benjamin A. Allan, Jim M. Brandt, Damian Dechev. 618-622
- Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages. Louise Harding, Fabien Wernli, Frédéric Suter. 623-631
- The Challenge of Disproportionate Importance of Temporal Features in Predicting HPC Power Consumption. Chengcheng Li, Ahmad M. Karimi, Woong Shin, Hairong Qi 0001, Feiyi Wang. 632-636
- Dynamic and Adaptive Monitoring and Analysis for Many-task Ensemble Computing. Shantenu Jha, Allen D. Malony. 637-641
- A Scalability Study of Data Exchange in HPC Multi-component Workflows. Jie Yin, Atsushi Hori, Balazs Gerofi, Yutaka Ishikawa. 642-648
- The Case for Storage Optimization Decoupling in Deep Learning Frameworks. Ricardo Macedo, Cláudia Correia, Marco Dantas, Cláudia Brito, Weijia Xu, Yusuke Tanimura, Jason Haga, João Paulo 0001. 649-656
- MONARCH: Hierarchical Storage Management for Deep Learning Frameworks. Marco Dantas, Diogo Leitão, Cláudia Correia, Ricardo Macedo, Weijia Xu, João Paulo 0001. 657-663
- pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. Luke Logan, Jay F. Lofstead, Scott Levy, Patrick M. Widener, Xian-He Sun, Anthony Kougkas. 664-670
- Parallel I/O Evaluation Techniques and Emerging HPC Workloads: A Perspective. Sarah Neuwirth, Arnab Kumar Paul. 671-679
- Special function neural network (SFNN) models. Yuzhen Liu, Oana Marin. 680-685
- AMR-Net: Convolutional Neural Networks for Multi-resolution Steady Flow Prediction. Yuuichi Asahi, Sora Hatayama, Takashi Shimokawabe, Naoyuki Onodera, Yuta Hasegawa, Yasuhiro Idomura. 686-691
- A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. Xavier Aguilar, Stefano Markidis. 692-697
- Hybrid workflow of Simulation and Deep Learning on HPC: A Case Study for Material Behavior Determination. Li Zhong, Dennis Hoppe, Naweiluo Zhou, Oleksandr Shcherbakov. 698-704
- Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain. Martin Svedin, Artur Podobas, Steven Wei Der Chien, Stefano Markidis. 705-710
- A64FX performance: experience on Ookami. Md Abdullah Shahneous Bari, Barbara M. Chapman, Anthony Curtis, Robert J. Harrison, Eva Siegmann, Nikolay A. Simakov, Matthew D. Jones. 711-718
- Early Evaluation of Fugaku A64FX Architecture Using Climate Workloads. Sarat Sreepathi, Mark Taylor. 719-727
- Performance Evaluation and Analysis of A64FX many-core Processor for the Fiber Miniapp Suite. Miwako Tsuji, Mitsuhisa Sato. 728-735
- A64FX - Your Compiler You Must Decide! Jens Domke. 736-740
- Cluster of emerging technology: evaluation of a production HPC system based on A64FX. Fabio Banchelli, Kilian Peiro, Guillem Ramirez-Gargallo, Joan Vinyals, David Vicente, Marta Garcia-Gasulla, Filippo Mantovani. 741-750
- Sequences of Sparse Matrix-Vector Multiplication on Fugaku's A64FX processors. Jérôme Gurhem, Maxence Vandromme, Miwako Tsuji, Serge G. Petiton, Mitsuhisa Sato. 751-758
- From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics. Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jerónimo Castrillón. 759-766
- Accelerating advection for atmospheric modelling on Xilinx and Intel FPGAs. Nick Brown. 767-774
- Optimisation of an FPGA Credit Default Swap engine by embracing dataflow techniques. Nick Brown 0002, Mark Klaisoongnoen, Oliver Thomson Brown. 775-778
- TIGRA: A Tightly Integrated Generic RISC-V Accelerator Interface. Brad Green, Dillon Todd, Jon C. Calhoun, Melissa C. Smith. 779-782
- HBM2 Memory System for HPC Applications on an FPGA. Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku. 783-786
- A memory bandwidth improvement with memory space partitioning for single-precision floating-point FFT on Stratix 10 FPGA. Takaaki Miyajima, Kentaro Sano. 787-790
- An FPGA-based storage control with load balancing. Naoya Umezu, Yoshiki Yamaguchi, Taisuke Boku. 791-794
- CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers. Yuting Li, Yun Xu, Xuehai Zhou. 795-796
- A Dynamic Power Capping Library for HPC Applications. Sahil Sharma, Zhiling Lan, Xingfu Wu, Valerie Taylor 0001. 797-798
- SDIS: A PB-level seismic data index system with ML methods. Shaoheng Luo, Lei Wang, Yufeng Liu, Changhai Zhao, Xudong Zhang. 799-800
- Malleability Implementation in a MPI Iterative Method. Iker Martín-Álvarez, José Ignacio Aliaga, María Isabel Castillo, Rafael Mayo 0002, Sergio Iserte. 801-802
- Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites. Chen Zou, Andrew A. Chien, Robert W. Gardner, Ilija Vukotic. 803-804
- NUMA-aware I/O System Call Steering. Chan-Gyu Lee, Hyun-Wook Jin. 805-806
- A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective. Michela Taufer, Ewa Deelman, Rafael Ferreira da Silva, Trilce Estrada, M. Hall, Miron Livny. 807-808
- A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet. Menuka Warushavithana, Saptashwa Mitra, Mazdak Arabi, F. Jay Breidt, Sangmi Lee Pallickara, Shrideep Pallickara. 809-810
- Exploring Node Connection Modes in Multi-Rail Fat-tree. Yuyang Wang, Fei Lei, Dezun Dong. 811-812
- RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips. Changhong Wang, Dezun Dong, Zicong Wang, Xiaoyun Zhang, Zhenyu Zhao. 813-814
- A Generative Approach to Visualizing Satellite Data. Saptashwa Mitra, Daniel Rammer, Shrideep Pallickara, Sangmi Lee Pallickara. 815-816
- Load Balancing Policies for Nested Fork-Join. Lukas Reitz. 817-818
- Supporting Elastic Compaction of LSM-tree with a FaaS Cluster. Xiaoliang Wang, Jianchuan Li, Peiquan Jin, Kuankuan Guo, Yuanjin Lin, Ming Zhao. 819-820
- Automatic Parallelisation of Structured Mesh Computations with SYCL. Gábor Dániel Balogh, István Z. Reguly. 821-822
- Halcyon: Unified HPC Center Operations. Kevin D. Colby, Shawn Rice. 823-824
- CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization. Keshi Ge, Yiming Zhang 0003, Yongquan Fu, Zhiquan Lai, Xiaoge Deng, Dongsheng Li 0001. 825-826
- Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments. Sarah Neuwirth. 827-828
- Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation. Trokon Johnson, Herman Lam. 829-830