- Message from the 2024 General Co-chairs. Gagan Agrawal, Alba Cristina Melo. [doi]
- Message from the 2024 Workshops Chair and Vice-chair. Ananth Kalyanaraman, Suren Byna. [doi]
- Scalable dual-instruction multiple-data processing on an efficient systolic-array architecture. Yuxi Tan, Riadh Ben Abdelhamid, Bingjie Guo, QiXiang Gao, Masaru Nishimura, Yoshiki Yamaguchi. 1 [doi]
- Message from the HCW 2024 Technical Program Committee Co-Chairs. Dhabaleswar K. Panda 0001, Hari Subramoni. 1 [doi]
- HCW 2024 Preface and Committee List. Anne C. Elster, Jan Christian Meyer. 1 [doi]
- Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators. Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka, Sanjif Shanmugavelu, Darshan Gandhi, Hengyu Zhao, Dun Ma, Kiran Ranganath, Rick Weisner, Jiunn-Yeu Chen, Yuting Yang, Natalia Vassilieva, Bin C. Zhang, Sylvia Howland, Alexander Tsyplikhin. 1-10 [doi]
- HPC Systems with Reconfigurable Optical Networks: Performance and Energy Consumption Exploration. Xianwei Cheng, Che-Yu Liu, Roberto Proietti, S. J. Ben Yoo. 1 [doi]
- Message from the HCW 2024 Steering Committee Co-Chairs. Behrooz A. Shirazi, Kamesh Madduri. 2 [doi]
- Message from the HCW 2024 General Co-Chairs. Anne C. Elster, Jan Christian Meyer. 3 [doi]
- Message from the HCW 2024 Technical Program Committee Co-Chairs. Dhabaleswar K. Panda 0001, Hari Subramoni. 4 [doi]
- HCW 2024 Keynote: Hetero: Where we've been, Where we are, and What Next? Yale N. Patt. 5 [doi]
- Hetero: Where we've been, Where we are, and What Next? Yale N. Patt. 5 [doi]
- Performance Portability of the Chapel Language on Heterogeneous Architectures. Josh Milthorpe, Xianghao Wang, Ahmad Azizi. 6-13 [doi]
- Towards Dynamic Autotuning of SpMV in CUSP Library. Miroslav Demek, Jiri Filipovic. 14-22 [doi]
- A Runtime Manager Integrated Emulation Environment for Heterogeneous SoC Design with RISC-V Cores. H. Umut Suluhan, Serhan Gener, Alexander Fusco, Joshua Mack, Ismet Dagli, Mehmet E. Belviranli, Çagatay Edemen, Ali Akoglu. 23-30 [doi]
- Dynamic Tasks Scheduling with Multiple Priorities on Heterogeneous Computing Systems. Hayfa Tayeb, Bérenger Bramas, Mathieu Faverge, Abdou Guermouche. 31-40 [doi]
- PSyGS Gen: A Generator of Domain-Specific Architectures to Accelerate Sparse Linear System Resolution. Niccolo Nicolosi, Francesco Renato Negri, Francesco Pesce, Francesco Peverelli, Davide Conficconi, Marco Domenico Santambrogio. 41-47 [doi]
- IRIS: Exploring Performance Scaling of the Intelligent Runtime System and its Dynamic Scheduling Policies. Beau Johnston, Narasinga Rao Miniskar, Aaron R. Young, Mohammad Alaul Haque Monil, Seyong Lee, Jeffrey S. Vetter. 58-67 [doi]
- Heterogeneous Hyperthreading. Mingxuan He, Fangping Liu, Sang Wook Stephen Do. 68-78 [doi]
- 31st Reconfigurable Architectures Workshop (RAW 2024). Jürgen Becker 0001, Zhenman Fang, Viktor K. Prasanna, Marco D. Santambrogio, Ramachandran Vaidyanathan. 79 [doi]
- RAW 2024 Committees. Marco Domenico Santambrogio. 80-81 [doi]
- RAW 2024 Monday Keynote. Deming Chen. 82 [doi]
- RAW 2024 Invited Talk-1: Auto-Generating Diverse Heterogeneous Designs. Wayne Luk. 83 [doi]
- RAW 2024 Invited Talk-2: Digital In-Memory Computing to Accelerate Deep Learning Inference on the Edge. Stefania Perri. 84 [doi]
- RAW 2024 Invited Talk-3: Self-aware Reliable and Reconfigurable Computing Systems - An Overview. Diana Göhringer. 85 [doi]
- RAW 2024 Invited Talk-4: Reconfigurable Computing: Quo Vadis? Dirk Stroobandt. 86 [doi]
- RAW 2024 Invited Talk-5. Masato Motomura. 87 [doi]
- RAW 2024 Invited Talk-6: Reconfigurable Architectures for High-Performance Computing. Kentaro Sano. 88 [doi]
- RAW 2024 Invited Talk-7. Wei Zhang. 89 [doi]
- RAW 2024 Invited Talk-8: Practical Reconfigurable Computing for Next-Generation Edge Applications. Hayden Kwok-Hay So. 90 [doi]
- RAW 2024 Invited Talk-9: Riallto: An Open-Source Exploratory Framework for Ryzen AI™. Andrew Schmidt. 91 [doi]
- FPGA Acceleration of DL-Based Real-Time DC Series Arc Fault Detection. Yufei Mao, Roland Weiss, Yi Zhang, Yu Li, Marc Rothmann, Mario Porrmann. 92-98 [doi]
- An Accurate Union Find Decoder for Quantum Error Correction on the Toric Code. Federico Valentino, Beatrice Branchini, Davide Conficconi, Donatella Sciuto, Marco D. Santambrogio. 99-105 [doi]
- Towards the Acceleration of the Sparse Blossom Algorithm for Quantum Error Correction. Marco Venere, Valentino Guerrini, Beatrice Branchini, Davide Conficconi, Donatella Sciuto, Marco D. Santambrogio. 106-110 [doi]
- Exploring Large Language Models for Verilog Hardware Design Generation. Erik H. D'Hollander, Ewout Danneels, Karel-Brecht Decorte, Senne Loobuyck, Arne Vanheule, Ian Van Kets, Dirk Stroobandt. 111-115 [doi]
- Auto-Generating Diverse Heterogeneous Designs. Jessica Vandebon, Jose G. F. Coutinho, Wayne Luk. 116-123 [doi]
- Self-Aware Reliable and Reconfigurable Computing Systems - An Overview. Diana Göhringer, Ariel Podlubne, Fabian Vargas 0001, Milos Krstic. 124-129 [doi]
- Digital In-Memory Computing to Accelerate Deep Learning Inference on the Edge. Stefania Perri, Cristian Zambelli, Daniele Ielmini, Cristina Silvano. 130-133 [doi]
- A Fast Scalable Hardware Priority Queue and Optimizations for Multi-Pushes. Samuel Collinson, Allan Bai, Oliver Sinnen. 134-140 [doi]
- FPGA-based Implementation for Industrial Motion Control System. Claudio Rubattu, Antonio Ledda, Francesco Ratto, Chaitanya Jugade, Dip Goswami, Francesca Palumbo. 141-147 [doi]
- An FPGA-Based Accelerator for Graph Embedding using Sequential Training Algorithm. Kazuki Sunaga, Keisuke Sugiura, Hiroki Matsutani. 148-154 [doi]
- TaPaS Co-AIE: An Open-Source Framework for Streaming-Based Heterogeneous Acceleration Using AMD AI Engines. Carsten Heinz, Torben Kalkhof, Yannick Lavan, Andreas Koch 0001. 155-161 [doi]
- An Architectural Template for FPGA Overlays Targeting Data Flow Applications. Anna Drewes, Vitalii Burtsev, Bala Gurumurthy, Martin Wilhelm, David Broneske, Gunter Saake, Thilo Pionteck. 162-168 [doi]
- Performance Evaluation of VirtIO Device Drivers for Host-FPGA PCIe Communication. Sahan Bandara, Ahmed Sanaullah, Zaid Tahir, Ulrich Drepper, Martin C. Herbordt. 169-176 [doi]
- Accelerating TinyML Inference on Microcontrollers Through Approximate Kernels. Giorgos Armeniakos, Georgios Mentzos, Dimitrios Soudris. 177 [doi]
- A Case for Low Bitwidth Floating Point Arithmetic on FPGA for Transformer Based DNN Inference. Jiajun Wu 0007, Mo Song, Jingmin Zhao, Hayden Kwok-Hay So. 178-185 [doi]
- Balancing Intra-Die and Inter-Die Placement Optimization in 2.5D FPGA Architectures. Raveena Raikar, Dirk Stroobandt. 187 [doi]
- ConvMap: Boosting Convolution Throughput on FPGAs with Efficient Resource Mapping. Shubhayu Das, Nanditha P. Rao, Sharad Sinha. 189 [doi]
- A Reconfigurable Architecture of a Scalable, Ultrafast, Ultrasound, Delay-and-Sum Beamformer. Vasilis Kypriotis, Georgios Smaragdos, Pieter Kruizinga, Dimitrios Soudris, Christos Strydis. 190 [doi]
- ML-Based Real-Time Control at the Edge: An Approach Using hls4ml. Rui Shi, Seda Ogrenci, J. M. Arnold, J. R. Berlioz, Pierrick Hanlet, Kyle J. Hazelwood, M. A. Ibrahim, Han Liu, V. P. Nagaslaev, Aakaash Narayanan, D. J. Nicklaus, Jovan Mitrevski, Gauri Pradhan, A. L. Saewert, B. A. Schupbach, Kiyomi Seiya, Mattson Thieme, R. M. Thurman-Keup, N. V. Tran. 191 [doi]
- Network Adapter for Secure Networks-on-Chip. Julian Haase, Nico Volkens, Diana Göhringer. 192 [doi]
- Multi-Core Multi-Rule VeBPF Firewall for Secure FPGA IoT Device Deployments. Zaid Tahir, Sahan Bandara, Martin C. Herbordt. 193 [doi]
- POCA: A PYNQ Offloaded Cryptographic Accelerator on Embedded FPGA-Based Systems. Roberto A. Bertolini, Filippo Carloni, Davide Conficconi, Marco Domenico Santambrogio. 194 [doi]
- APDCM 2024 Preface and Committee List. Jacir Luiz Bordim, Koji Nakano. 195-196 [doi]
- APDCM 2024 Keynote Talk. Hiroki Ohtsuji. 197 [doi]
- Application of Network Calculus Models to Heterogeneous Streaming Applications. Clayton J. Faber, Roger D. Chamberlain. 198-201 [doi]
- Data-Driven Locality-Aware Batch Scheduling. Maxime Gonthier, Elisabeth Larsson, Loris Marchal, Carl Nettelblad, Samuel Thibault. 202-211 [doi]
- Combining Lossy Compression with Multi-Level Caching for Data Staging over Network. Rei Aoyagi, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa. 212-221 [doi]
- A Scalable Secure Fault Tolerant Aggregation for P2P Federated Learning. Yujiro Yahata, Keisuke Sugiura, Hiroki Matsutani. 222-231 [doi]
- Accelerating BFT Database with Transaction Reconstruction. Aoi Kida, Hideyuki Kawashima. 232-241 [doi]
- Optimizing Aria Concurrency Control Protocol with Early Dependency Resolution. Yusuke Miyazaki 0006, Takashi Hoshino 0002, Hideyuki Kawashima. 242-249 [doi]
- Shared-Memory Parallel Algorithms for Community Detection in Dynamic Graphs. Subhajit Sahu, Kishore Kothapalli, Dip Sankar Banerjee. 250-259 [doi]
- A Probabilistic Model for Asynchronous Iterative Methods. Pratik Nayak, Hartwig Anzt. 260-269 [doi]
- The Logarithmic Random Bidding for the Parallel Roulette Wheel Selection with Precise Probabilities. Koji Nakano. 270-272 [doi]
- Introduction to Computational Quantum Chemistry for Computer Scientists. Yasuaki Ito, Satoki Tsuji, Haruto Fujii, Kanta Suzuki, Nobuya Yokogawa, Koji Nakano, Akihiko Kasagi. 273-282 [doi]
- AsHES 2024 Preface and Committee List. Shintaro Iwasaki. 283-284 [doi]
- Block-based GPU Programming with Triton. Philippe Tillet. 285 [doi]
- Performance Versus Maintainability: A Case Study of Scream on Frontier. James B. White. 286-292 [doi]
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels. Ali TehraniJamsaz, Alok Mishra 0002, Akash Dutta, Abid M. Malik, Barbara M. Chapman, Ali Jannesari. 293-300 [doi]
- Alternative Quadrant Representations with Morton Index and AVX2 Vectorization for AMR Algorithms within the p4est Software Library. Mikhail Kirilin, Carsten Burstedde. 301-310 [doi]
- Avoiding Training in the Platform-Aware Optimization Process for Faster DNN Latency Reduction. Raúl Marichal, Ernesto Dufrechou, Pablo Ezzatti. 311-320 [doi]
- A Comparative Study on Simulation Frameworks for AI Accelerator Evaluation. Christoffer Åleskog, Håkan Grahn, Anton Borg. 321-328 [doi]
- Extending the SYCL Joint Matrix for Binarized Neural Networks. Zheming Jin. 329-333 [doi]
- Message from the EduPar-24 Workshop Chairs. Sushil K. Prasad. 334 [doi]
- EduPar-24 Workshop Organization. Sushil K. Prasad. 335-336 [doi]
- EduPar 2024 Keynote Speaker. Charles E. Leiserson. 337 [doi]
- Helping Faculty Teach Software Performance Engineering. John D. Owens, Bruce Hoppe. 338-341 [doi]
- Parallel Optimization for Robotics: An Undergraduate Introduction to GPU Parallel Programming and Numerical Optimization Research. Brian Plancher. 342-345 [doi]
- Teaching Parallel Algorithms Using the Binary-Forking Model. Guy E. Blelloch, Yan Gu 0001, Yihan Sun 0001. 346-351 [doi]
- Peachy Parallel Assignments (EduPar 2024). Alina Lazar, Ethan Scheelk, Elizabeth Shoop, David P. Bunde. 352-356 [doi]
- Codeless PDC Modules for Early Computing Curriculum. Chris Bourke, Justin W. Firestone. 357-364 [doi]
- Visualizing PRAM Algorithm for Mergesort. Cade Wiley, Grey Ballard. 365-368 [doi]
- Integrating Interactive Performance Analysis in Jupyter Notebooks for Parallel Programming Education. Lena Oden, Klaus Nölp, Philipp Brauner. 369-376 [doi]
- Interactive Textbooks for Parallel and Distributed Computing Across the Undergraduate CS Curriculum. Elizabeth Shoop, Richard A. Brown, Suzanne J. Matthews, Joel C. Adams. 377-384 [doi]
- Teaching Performance Metrics in Parallel Computing Courses. Sandino Vargas Perez. 385-390 [doi]
- Speedcode: Software Performance Engineering Education via the Coding of Didactic Exercises. Tim Kaler, Xuhao Chen 0001, Brian Wheatman, Dorothy Curtis, Bruce Hoppe, Tao B. Schardl, Charles E. Leiserson. 391-394 [doi]
- ESSA 2024 Message and Committees. François Tessier, Weikuan Yu. 395-396 [doi]
- The Impact of Asynchronous I/O in Checkpoint-Restart Workloads. Hariharan Devarajan, Adam Moody, Donglai Dai, Cameron Stanavige, Elsa Gonsiorowski, Marty McFadden, Olaf Faaland, Gregory Kosinovsky, Kathryn M. Mohror. 397-405 [doi]
- Benchmarking Variables for Checkpointing in HPC Applications. Xiang Fu, Xin Huang, Wubiao Xu, Weiping Zhang, Shiman Meng, Luanzheng Guo, Kento Sato. 406-413 [doi]
- Extending the Mochi Methodology to Enable Dynamic HPC Data Services. Matthieu Dorier, Philip H. Carns, Robert B. Ross, Shane Snyder, Robert Latham, Amal Gueroudji, George Amvrosiadis, Chuck Cranor, Jérome Soumagne. 414-422 [doi]
- Adaptive Per-File Lossless Compression of Floating-Point Data. Andrew Rodriguez, Noushin Azami, Martin Burtscher. 423-430 [doi]
- Optimizing Forward Wavefield Storage Leveraging High-Speed Storage Media. João Speglich, Navjot Kukreja, George Bisbas, Átila Saraiva, Jan Hückelheim, Fabio Luporini, John Washbourne. 431-438 [doi]
- The Art of Sparsity: Mastering High-Dimensional Tensor Storage. Bin Dong 0002, Kesheng Wu, Suren Byna. 439-446 [doi]
- GrAPL 2024 Preface and Committees. Nesreen K. Ahmed, Manoj Kumar. 447-448 [doi]
- To Tile or not to Tile, That is the Question. Altan Haan, Doru-Thom Popovici, Koushik Sen, Costin Iancu, Alvin Cheung. 449-458 [doi]
- Teaching Network Traffic Matrices in an Interactive Game Environment. Chasen Milner, Hayden Jananthan, Jeremy Kepner, Vijay Gadepally, Michael Jones 0001, Peter Michaleas, Ritesh Patel, Sandeep Pisharody, Gabriel Wachman, Alex Pentland. 459-467 [doi]
- Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference. Hao Xu, Shuang Song 0007, Ze Mao. 468-477 [doi]
- The GraphBLAS 3.0 Project. Raye Kimmerer, Timothy G. Mattson, Scott McMillan, Benjamin Brock, Erik Welch, Michel Pelletier, José E. Moreira. 478-481 [doi]
- Edge-Parallel Graph Encoder Embedding. Ariel Lubonja, Cencheng Shen, Carey E. Priebe, Randal C. Burns. 482-485 [doi]
- Multi-Level GNN Preconditioner for Solving Large Scale Problems. Matthieu Nastorg, Jean-Marc Gratien, Thibault Faney, Michele Alessandro Bucci, Guillaume Charpiat, Marc Schoenauer. 486-495 [doi]
- STGraph: A Framework for Temporal Graph Neural Networks. Joel Mathew Cherian, Nithin Puthalath Manoj, Kevin Jude Concessao, Unnikrishnan Cheramangalath. 496-505 [doi]
- GraphBinMatch: Graph-Based Similarity Learning for Cross-Language Binary and Source Code Matching. Ali TehraniJamsaz, Hanze Chen, Ali Jannesari. 506-515 [doi]
- GraphBLAS.jl v0.1: An Update on GraphBLAS in Julia. Raye Kimmerer. 516-519 [doi]
- ECG: Expressing Locality and Prefetching for Optimal Caching in Graph Structures. Abdullah T. Mughrabi, Morteza Baradaran, Ahmed Samara, Kevin Skadron. 520-525 [doi]
- Unlocking the Potential: Performance Portability of Graph Algorithms on Kokkos Framework. Shaikh Arifuzzaman, Hasan S. Arikan, Md Abdul Motaleb Faysal, Maximilian H. Bremer, John Shalf, Doru Popovici. 526-529 [doi]
- Shared-Memory Parallel Edmonds Blossom Algorithm for Maximum Cardinality Matching in General Graphs. Gregory Schwing, Daniel Grosu, Loren Schwiebert. 530-539 [doi]
- HiCOMB 2024 Preface and Committees. Alba Cristina Magalhaes Alves de Melo, Ananth Kalyanaraman. 540 [doi]
- Re-visiting the Third Pillar of Science for Synergistic (Bio)Computing. Wu Feng. 541 [doi]
- Lessons Learned Designing Irregular Genomic Algorithms on Parallel Systems and Architectures. Giulia Guidi. 542 [doi]
- Empirical Study of Molecular Dynamics Workflow Data Movement: DYAD vs. Traditional I/O Systems. Ian Lumsden, Hariharan Devarajan, Jack Marquez, Stephanie Brink, David Böhme, Olga Pearce, Jae-Seung Yeom, Michela Taufer. 543-553 [doi]
- ZSMILES: An Approach for Efficient SMILES Storage for Random Access in Virtual Screening. Gianmarco Accordi, Davide Gadioli, Giorgio Seguini, Andrea Rosario Beccari, Gianluca Palermo. 554-560 [doi]
- Further Optimizations and Analysis of Smith-Waterman with Vector Extensions. Reza Sajjadinasab, Hamed Rastaghi, Hafsah Shahzad, Sanjay Arora, Ulrich Drepper, Martin C. Herbordt. 561-570 [doi]
- High Performance Binding Affinity Prediction with a Transformer-Based Surrogate Model. Archit Vasan, Ozan Gökdemir, Alexander Brace, Arvind Ramanathan, Thomas S. Brettin, Rick Stevens, Venkatram Vishwanath. 571-580 [doi]
- PAISE 2024 Preface and Committees. Pete Beckman. 581-583 [doi]
- FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference. Matthew Jackson, Bo Ji, Dimitrios S. Nikolopoulos. 584-591 [doi]
- A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge. Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil K. Prasad. 592-599 [doi]
- PCM Enabled Low-Power Photonic Accelerator for Inference and Training on Edge Devices. Juliana Curry, Ahmed Louri, Avinash Karanth, Razvan C. Bunescu. 600-607 [doi]
- Towards Accelerating k-NN with MPI and Near-Memory Processing. HooYoung Ahn, Seonyoung Kim, Yoo-Mi Park, Woojong Han, Nick Contini, Bharath Ramesh 0005, Mustafa Abduljabbar, Dhabaleswar K. Panda 0001. 608-615 [doi]
- CGRA4HPC 2024 Welcome Message and Committee List. Artur Podobas. 616-617 [doi]
- An Architecture-Agnostic Dataflow Mapping Framework on CGRA. Jiangnan Li, Yazhou Yan, Jingyuan Li, Shaoyang Sun, Boyin Jin, Wenbo Yin, Lingli Wang. 618-625 [doi]
- TransMap: An Efficient CGRA Mapping Framework via Transformer and Deep Reinforcement Learning. Jingyuan Li, Yuan Dai, Yihan Hu, Jiangnan Li, Wenbo Yin, Jun Tao 0001, Lingli Wang. 626-633 [doi]
- Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. Chihyo Ahn, Shinnung Jeong, Liam Paul Cooper, Nicholas Parnenzini, Hyesoon Kim. 634-641 [doi]
- CGRA-ME 2.0: A Research Framework for Next-Generation CGRA Architectures and CAD. Omar Ragheb, Stephen Wicklund, Matthew Walker, Rami Beidas, Adham Ragab, Tianyi Yu, Jason Helge Anderson. 642-649 [doi]
- A Scalable Mapping Method for Elastic CGRAs. Makoto Saito, Takuya Kojima, Hideki Takase, Hiroshi Nakamura. 650-657 [doi]
- GIM (Ghost In the Machine): A Coarse-Grained Reconfigurable Compute-In-Memory Platform for Exploring Machine-Learning Architectures. Maya Borowicz, James Ding, Winnie Fan, Zhongqi Gao, Davis Jackson, Ares Lu, Sophia Rohlfsen, Ray Simar. 658-663 [doi]
- HIPS 2024 Preface and Committees. Seyong Lee, Lena Oden. 664-665 [doi]
- Architecture and Programming of Analog In-Memory-Computing Accelerators for Deep Neural Networks. HsinYu Sidney Tsai. 666 [doi]
- eCC++: A Compiler Construction Framework for Embedded Domain-Specific Languages. Marc González Tallada, Joel E. Denny, Pedro Valero-Lara, Seyong Lee, Keita Teranishi, Jeffrey S. Vetter. 667-677 [doi]
- Comprehensive Study for Just-In-Time Pack Functions in Open MPI. Yicheng Li, Joseph Schuchart, George Bosilca. 678-685 [doi]
- Dynamic Resource Management for Elastic Scientific Workflows using PMIx. Rajat Bhattarai, Howard Pritchard, Sheikh Ghafoor 0001. 686-695 [doi]
- GrOUT: Transparent Scale-Out to Overcome UVM's Oversubscription Slowdowns. Ian Di Dio Lavore, Davide Maffi, Marco Arnaboldi, Arnaud Delamare, Daniele Bonetta, Marco D. Santambrogio. 696-705 [doi]
- Towards Fine-Grained Parallelism in Parallel and Distributed Python Libraries. Jamison Kerney, Ioan Raicu, John Raicu, Kyle Chard. 706-715 [doi]
- Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events. Daniel Barry, Anthony Danalis, Jack J. Dongarra. 716-725 [doi]
- Performance Analysis of the NVIDIA HPC SDK and AMD AOCC Compilers in an HPC Cluster Using Pooled, Robust and Relative Metrics. Yectli A. Huerta. 726-737 [doi]
- 9th IEEE International Workshop on Automatic Performance Tuning (iWAPT 2024). Pedro Valero-Lara. 738-739 [doi]
- iWAPT 2024 Keynote Talk: What Happens to a Dream Deferred? Chasing Automatic Offloading in Fortran 2023. Damian W. I. Rouson. 740 [doi]
- An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes. Gregory Bolet, Giorgis Georgakoudis, Konstantinos Parasyris, Kirk W. Cameron, David Beckingsale, Todd Gamblin. 741-750 [doi]
- Communication-Computation Overlapping for Parallel Multigrid Methods. Kengo Nakajima. 751-760 [doi]
- PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI. Mingzhe Han, Goutham Kalikrishna Reddy Kuncham, Benjamin Michalowicz, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. 761-770 [doi]
- Application-Agnostic Auto-Tuning of Open MPI Collectives Using Bayesian Optimization. Emmanuel Jeannot, Pierre Lemarinier, Guillaume Mercier, Sophie Robert-Hayek, Richard Sartori. 771-781 [doi]
- XAI-Based Feature Importance Analysis on Loop Optimization. Toshinobu Katayama, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa. 782-791 [doi]
- Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality. Adrián Pérez Diéguez, Min Choi, Mahmut Okyay, Mauro Del Ben, Bryan M. Wong, Khaled Z. Ibrahim. 792-801 [doi]
- 27th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2024). Dalibor Klusácek, Julita Corbalán, Gonzalo P. Rodrigo. 802 [doi]
- ParSocial 2024 Welcome and Committee List. Hien Nguyen, Jeremy E. Thompson. 803-804 [doi]
- RIMR: Reverse Influence Maximization Rank. Jay Vap, Peter M. Kogge. 805-814 [doi]
- Distributed Multi-GPU Community Detection on Exascale Computing Platforms. Naw Safrin Sattar, Hao Lu 0001, Feiyi Wang, Mahantesh Halappanavar. 815-824 [doi]
- Lock-free Computation of PageRank in Dynamic Graphs. Subhajit Sahu, Kishore Kothapalli, Hemalatha Eedi, Sathya Peri. 825-834 [doi]
- Proposal for a Flexible Benchmark for Agent Based Models. Elizabeth R. Koning, William Gropp. 835-838 [doi]
- Socio-Behavioral Influences in Epidemic Modeling: Towards a Unified Framework. Suresh Subramanian, Vairavan Murugappan, Eunice E. Santos. 839-842 [doi]
- Towards Improved Uncertainty Quantification of Stochastic Epidemic Models Using Sequential Monte Carlo. Arindam Fadikar, Abby Stevens, Nicholson T. Collier, Kok Ben Toh, Olga Morozova, Anna Hotton, Jared Clark, David Higdon, Jonathan Ozik. 843-852 [doi]
- Revolutionizing Personal Recommendations via Federated Contrastive Transformer Learning. Youcef Djenouri, Fabio Augusto de Alcantara Andrade, Gautam Srivastava 0001, Ahmed Nabil Belbachir. 853-856 [doi]
- High-Speed Transcript Collection on Multimedia Platforms: Advancing Social Media Research through Parallel Processing. Mert Can Cakmak, Nitin Agarwal. 857-860 [doi]
- Parallelizing Accelerographic Records Processing. Ronaldo Canizales, Luis Mixco, Jedidiah McClurg. 861-869 [doi]
- GPU-Accelerated Tree-Search in Chapel Versus CUDA and HIP. Guillaume Helbecque, Ezhilmathi Krishnasamy, Nouredine Melab, Pascal Bouvry. 872-879 [doi]
- Parallel Maximum Cardinality Matching for General Graphs on GPUs. Gregory Schwing, Daniel Grosu, Loren Schwiebert. 880-889 [doi]
- GPU-LSolve: An Efficient GPU-Based Laplacian Solver for Million-Scale Graphs. Sumiaya Dabeer, Amitabha Bagchi, Rahul Narain. 890-899 [doi]
- KOptim: Kubernetes Optimization Framework. Tarek Menouer, Christophe Cérin, Patrice Darmon. 900-908 [doi]
- Electric Drive Assignment Strategies Optimization for Plugin Hybrid Urban Buses on Tailored Emissions Mapping. José Miguel Aragón-Jurado, Marina Díaz-Jiménez, Bernabé Dorronsoro, Pablo Pavón-Domínguez, Marcin Seredynski, Patricia Ruiz. 909-918 [doi]
- DUST: Resource-Aware Telemetry Offloading with A Distributed Hardware-Agnostic Approach. Mehrnaz Sharifian, Diman Zad Tootaghaj, Chen-Nee Chuah, Puneet Sharma. 919-928 [doi]
- Understanding Multi-Dimensional Efficiency of Fine-Tuning Large Language Models Using SpeedUp, MemoryUp, and EnergyUp. Dayuan Chen, Noe Soto, Jonas F. Tuttle, Ziliang Zong. 929-937 [doi]
- Compiler-Driven SWAR Parallelism for High-Performance Bitboard Algorithms. Florian Fey, Sergei Gorlatch. 938-946 [doi]
- Multiobjective Based Strategy for Neural Architecture Search for Segmentation Task. Abass Sana, Kaoutar Senhaji, Amir Nakib. 947-955 [doi]
- A Mathematical Model and a Convergence Result for Totally Asynchronous Federated Learning. Didier El Baz, Jia Luo 0001, Hao Mo, Lei Shi. 956-963 [doi]
- State-Space Search to Find Energy-Aware Pareto-Efficient Optimal Task Schedules. Yasith Udagedara, Andrea Raith, Oliver Sinnen. 964-973 [doi]
- Message from the PDSEC-24 Workshop Chairs. Sabine Roller, George Bosilca, Raphaël Couturier, Neda Ebrahimi Pour, Jean-Claude Charr, Thomas Rauber, Gudula Rünger, Laurence T. Yang. 974-975 [doi]
- Multi-Criteria Mesh Partitioning for an Explicit Temporal Adaptive Task-Distributed Finite-Volume Solver. Alice Lasserre, Jean Marie Couteyen Carpaye, Abdou Guermouche, Raymond Namyst. 976-985 [doi]
- Graph Computing on Long Vector Architectures (Yes, It Works!). Pablo Vizcaino, Jesús Labarta, Filippo Mantovani. 986-995 [doi]
- Integration of Modern HPC Performance Tools in Vlasiator for Exascale Analysis and Optimization. Camille Coti, Yann Pfau-Kempf, Markus Battarbee, Urs Ganse, Sameer Shende, Kevin A. Huck, Jordi Rodriquez, Leo Kotipalo, Jennifer Faj, Jeremy J. Williams, Ivy Peng, Allen D. Malony, Stefano Markidis, Minna Palmroth. 996-1005 [doi]
- Performance Portability of Generated Cardiac Simulation Kernels Through Automatic Dimensioning and Load Balancing on Heterogeneous Nodes. Vincent Alba, Olivier Aumage, Denis Barthou, Raphaël Colin, Marie Christine Counilh, Stéphane Genaud, Amina Guermouche, Vincent Loechner, Arun Thangamani. 1006-1015 [doi]
- A Parallel Workflow for Polar Sea-Ice Classification Using Auto-Labeling of Sentinel-2 Imagery. Jurdana Masuma Iqrah, Wei Wang 0149, Hongjie Xie, Sushil K. Prasad. 1016-1025 [doi]
- Automated Calibration of Parallel and Distributed Computing Simulators: A Case Study. Jesse McDonald, Maximilian Horzela, Frédéric Suter, Henri Casanova. 1026-1035 [doi]
- Pretraining Billion-Scale Geospatial Foundational Models on Frontier. Aristeidis Tsaris, Philipe Ambrozio Dias, Abhishek Potnis, Junqi Yin, Feiyi Wang, Dalton D. Lunga. 1036-1046 [doi]
- Scaling Ensembles of Data-Intensive Quantum Chemical Calculations for Millions of Molecules. Kshitij Mehta, Massimiliano Lupo Pasini, Stephan Irle, Pilsun Yoo, Frédéric Suter, Dmitry Ganyushin, Scott Klasky. 1047-1056 [doi]
- Accelerating Quantum Light-Matter Dynamics on Graphics Processing Units. Taufeq Mohammed Razakh, Thomas Linker, Ye Luo, Rajiv K. Kalia, Ken-ichi Nomura, Priya Vashishta, Aiichiro Nakano. 1057-1066 [doi]
- Q-CASA 2024 Preface and Committee List. Ashfaq Khokhar, Mary Eshaghian-Wilner, Robert Basili. 1067 [doi]
- Quantifying Performance of Wire-Based Quantum Circuit Cutting with Entanglements. Shiplu Sarker, Wenyang Qian, Soham Pal, Robert Basili, Mary Eshaghian-Wilner, Ashfaq Khokhar, Glenn R. Luecke, James P. Vary. 1068-1077 [doi]
- Parallel Quantum Circuit Extraction from MBQC-Patterns. Marcel Quanz, Korbinian Staudacher, Karl Fürlinger. 1078-1087 [doi]
- Hybrid Classical-Quantum Simulation of MaxCut using QAOA-in-QAOA. Aniello Esposito, Tamuz Danzig. 1088-1094 [doi]
- A Delay-Efficient Implementation of Quantum Carry Select Adders. Annalisa Massini, Federico Mingardi. 1095-1104 [doi]
- Quantum Circuit Mapping Using Binary Integer Nonlinear Programming. Aaron Orenstein, Vipin Chaudhary. 1105-1114 [doi]
- Measurement-Based Quantum Approximate Optimization. Tobias Stollenwerk, Stuart Hadfield. 1115-1127 [doi]
- Image Compression and Reconstruction Based on Quantum Network. J. Xun, Qin Liu, Shan Huang, Andi Chen, W. Shengjun. 1128-1135 [doi]
- Cutting a Wire with Non-Maximally Entangled States. Marvin Bechtold, Johanna Barzen, Frank Leymann, Alexander Mandl. 1136-1145 [doi]
- A Deep Dive into Task-Based Parallelism in Python. William Ruys, Hochan Lee, Bozhi You, Shreya Talati, Jaeyoung Park, James Almgren-Bell, Yineng Yan, Milinda Fernando, George Biros, Mattan Erez, Martin Burtscher, Christopher J. Rossbach, Keshav Pingali, Milos Gligoric 0001. 1147-1149 [doi]
- A New Exact State Reconstruction Strategy for Conjugate Gradient Methods with Arbitrary Preconditioners. Viktoria Mayer, Wilfried N. Gansterer. 1150-1152 [doi]
- A Stochastic Composite Model to Understand the Impact of Rare, Colossal Interference in HPC Systems. Muna Tageldin, Majeed M. Hayat, Jered Dominguez-Trujillo, Patrick G. Bridges. 1153-1155 [doi]
- Accelerating Native Transaction Processing in LSM-Based Persistent Key-Value Stores. Jin Xue, Zili Shao. 1156-1158 [doi]
- AdCoalescer: An Adaptive Coalescer to Reduce the Inter-Module Traffic in MCM-GPUs. Xu Zhang, Guangda Zhang, Lu Wang, Xia Zhao. 1159-1160 [doi]
- An SR-IOV SSD Optimized for QoS-Sensitive IaaS Cloud Storage. Xiang Chen, Ru Ying, Haocong Ma, Yao Wang, Xianjun Meng, Guangjun Xie, Yonghui Zhan, Fenyong Yuan, Ying Yang, Ying Yang, Tao Lu, Jinqiang Wang, You Zhou, Fei Wu 0005. 1161-1163 [doi]
- Asynchrony and Failure Masking via Pseudo-Local Process Recovery in MPI Applications. Matthew Whitlock, Hemanth Kolla, Aurelien Bouteiller, Jackson R. Mayo, Nicolas M. Morales, Keita Teranishi, George Bosilca. 1164-1166 [doi]
- EDDIS: Accelerating Distributed Data-Parallel DNN Training for Heterogeneous GPU Cluster. Shinyoung Ahn, HooYoung Ahn, Hyeonseong Choi, JaeHyun Lee. 1167-1168 [doi]
- Efficient Multi-Processor Scheduling in Increasingly Realistic Models (Brief Summary). Pál András Papp, Georg Anegg, Aikaterini Karanasiou, Albert-Jan Nicholas Yzelman. 1169-1171 [doi]
- Energy-Aware Decentralized Learning with Intermittent Model Training. Martijn de Vos, Akash Dhasade, Paolo Dini, Elia Guerra, Anne-Marie Kermarrec, Marco Miozzo, Rafael Pires 0001, Rishi Sharma 0001. 1172-1174 [doi]
- Enhancing Energy Efficiency with Multi-Site Scheduling Strategies. Alok Kamatar, Valérie Hayot-Sasson, Yadu N. Babuji, André Bauer 0001, Gourav Rattihalli, Ninad Hogade, Dejan S. Milojicic, Kyle Chard, Ian T. Foster. 1175-1177 [doi]
- Evaluation of Programming Models and Performance for Stencil Computation on GPGPUs. Baodi Shan, Mauricio Araya-Polo. 1178-1180 [doi]
- Exploiting Tensor Cores in Sparse Matrix-Multivector Multiplication via Block-Sparsity-Aware Clustering. Eunji Lee, Yoonsang Han, Gordon Euhyun Moon. 1181-1183 [doi]
- FedClust: Optimizing Federated Learning on Non-IID Data Through Weight-Driven Client Clustering. Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan 0001, Li Chen 0019, Nian-Feng Tzeng. 1184-1186 [doi]
- FedSZ: Leveraging Error-Bounded Lossy Compression for Federated Learning Communications. Grant Wilkins, Sheng Di, Jon C. Calhoun, Zilinghan Li, Kibaek Kim, Robert Underwood, Richard Mortier, Franck Cappello. 1187-1188 [doi]
- Integration Framework for Online Thread Throttling with Thread and Page Mapping on NUMA Systems. Janaina Schwarzrock, Arthur Francisco Lorenzon, Samuel Xavier de Souza, Antonio Carlos Schneider Beck. 1189-1192 [doi]
- MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training. Jonghyun Bae, Jong Youl Choi, Massimiliano Lupo Pasini, Kshitij Mehta, Khaled Z. Ibrahim. 1193-1195 [doi]
- Proactive, Accuracy-aware Straggler Mitigation in Machine Learning Clusters. Suraiya Tairin, Haiying Shen, Anand Iyer. 1196-1198 [doi]
- Scalable Node Embedding Algorithms Using Distributed Sparse Matrix Operations. Isuru Ranawaka, Ariful Azad. 1199-1201 [doi]
- Scheduling and Allocation of Disaggregated Memory Resources in HPC Systems. Jie Li 0057, George Michelogiannakis, Brandon Cook 0001, John Shalf, Yong Chen 0001. 1202-1203 [doi]
- Shared-Memory Parallel Dynamic Louvain Algorithm for Community Detection. Subhajit Sahu, Kishore Kothapalli, Dip Sankar Banerjee. 1204-1205 [doi]
- System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang 0006, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He. 1206-1208 [doi]
- Toward Self-Adjusting $k$-Ary Search Tree Networks. Evgeniy Feder, Anton Paramonov, Pavel Mavrin, Iosif Salem, Stefan Schmid 0001, Vitaly Aksenov. 1209-1211 [doi]
- Understanding Different Transport Coexistence in Datacenter Networks. Dinghuang Hu, Dezun Dong. 1212-1213 [doi]
- IPDPS 2024 PhD Forum. Sanmukh Kuppannagari, Tanwi Mallick. 1214-1233 [doi]