- Performance Characterization of Parallel Combination Generators on CPU and GPU Systems. Brian Donnelly, Michael Gowanlock. 4-14
- Slaying a Life: Optimizing GPU-accelerated Game of Life Stencil. Matyás Brabec, Jirí Klepl, Martin Krulis. 15-24
- Science per Dollar: Modeling Emerging Node Architectures for Accelerator-centric Computing. Jordan M. Abt, Ali Farazdaghi, Elizabeth Reid 0002, Curtis Shorts, Tooraj Taraz, Zachary Silva, Ethan Shama, Scott Levy, Whit Schonbein, Matthew G. F. Dosanjh, Amirreza Barati Sedeh, Ryan E. Grant. 25-34
- Heterogeneity-Aware Software Performance Characterization via Graph Machine Learning. Ronaldo Canizales, Jedidiah McClurg. 35-44
- Apple vs. Oranges: Evaluating the Apple Silicon M-Series SoCs for HPC Performance and Efficiency. Paul Hübner, Andong Hu, Ivy Peng, Stefano Markidis. 45-54
- ARTEMIS: Adaptive Real-Time Task Execution & Management in Heterogeneous Systems. Tom Springer, Peiyi Zhao, Robert Alexander, Thomas Jordan. 55-58
- Exploring NCCL Tuning Strategies for Distributed Deep Learning. Majid Salimi Beni, Ruben Laso, Biagio Cosenza, Siegfried Benkner, Sascha Hunold. 59-62
- Collaborative Bandwidth-Efficient Intra-Node Allreduce. Amir Hossein Sojoodi, Ali Farazdaghi, Hamed Sharifian, Ryan E. Grant, Ahmad Afsahi. 63-67
- nbshmem: Enabling GPU-Initiated Multi-GPU Communication in Python. Calvin Bombis, Lena Oden. 68-77
- Towards an Efficient Containerized Cloud Gaming Platform. Adrien Gegout, Djob Mvondo, Davide Frey, Pascal Monchon. 78-86
- PRNGine: Massively Parallel Pseudo-Random Number Generation and Probability Distribution Approximations on AMD AI Engines. Mohamed Bouaziz 0002, Suhaib A. Fahmy. 91-98
- Benchmarking Floating Point Performance of Massively Parallel Dataflow Overlays on AMD Versal Compute Primitives. Mohamed Bouaziz 0002, Suhaib A. Fahmy. 99-103
- Enabling Manual-Controllable Compilation for Dataflow CGRAs. Felix Böseler, Jörg Walter 0001, Verena Klös. 104-111
- A Decoupled Coarse-Grained Reconfigurable Architecture by Introducing Data Flow Management Unit. Hisako Ito, Takuya Kojima, Hideki Takase, Hiroshi Nakamura. 112-119
- RAAP-CGRA: Placement for CGRAs with Restricted Routing Architectures. Anh Nguyen, Sebastian Czyrny, Takahide Yoshikawa, Jason Helge Anderson. 120-126
- Serverless IoT Framework. Isaac David Núñez Araya, Michael Gerndt, Shajulin Benedict. 130-139
- Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle. Michele Martone, Julia Lawall. 140-148
- SpMM-Bench: Performance Characterization of Sparse Formats for Sparse-Dense Matrix Multiplication. Patrick J. Flynn, Xinyao Yi, Erik Saule, Gokcen Kestor, Yonghong Yan 0001. 149-158
- The Case for ABI Interoperability in a Fault Tolerant MPI. Yao Xu, Grace Nansamba, Anthony Skjellum, Gene Cooperman. 159-167
- Exploring Communication Anomalies in Chapel. Raneem Abu Yosef, Bokyeong Yoon, Martin Kong. 168-177
- Implementing Directive-Based Deferred Execution for Effective Network Aggregation. Aaron Welch, Oscar R. Hernandez, Stephen W. Poole, Wendy Poole. 178-186
- Data Transfer Schemes in the High-Level Communication Library LAIK. Josef Weidendorfer, Lukas Neef, Robert Hubinger, Amir Raoofy. 187-196
- SYCL for HPC: Adapting to Diverse CPU Architecture. Ashish Bisht, Aniket P. Garade, Deepika H. V, Haribabu P, S. A. Kumar, S. D. Sudarsan. 197-204
- LibraryX-ASIC: A First Look. Sanil Rao, Larry Tang 0003, Franz Franchetti. 205-208
- Gen-AI in a Bottle: Experiments with LLMs to Generate HPC Kernels. Upasana Sridhar, Elliott Binder, Tze Meng Low. 215-224
- Predicting Performance Variability. Mohammed Baydoun, Mohammad Sonji, Pedro Bruel, Dejan S. Milojicic, Eitan Frachtenberg, Izzat El Hajj. 225-234
- A Blockchain-Enabled Framework for Storage and Retrieval of Social Data. Aishwarya Parab, Prakhar Pradhan, Yogesh Simmhan, Arnab K. Paul. 237-240
- Monitoring Digital Wildfires: a Large-Scale Dataset of COVID-19 Conspiracy Tweets Created via Fast NLP Inference using the Graphcore IPU. Rohullah Akbari, Daniel Thilo Schroeder, Petra Filkuková, Johannes Langguth. 241-250
- Ocularone-Bench: Benchmarking DNN Models on GPUs to Assist the Visually Impaired. Suman Raj, Bhavani A Madhabhavi, Kautuk Astu, Arnav A Rajesh, Pratham M, Yogesh Simmhan. 251-254
- Predicting the Predictor: Linear Metamodeling for Evolving User Response Prediction. Roman Wiatr, Renata G. Slota. 255-264
- Towards community-based influence spread prediction (CIP) for edge changes in large-scale dynamic social networks. Pranav Pamidighantam, Vairavan Murugappan, Suresh Subramanian, Eunice E. Santos. 265-274
- Rapid Random Packing of Poly-disperse Spheres using Adam Stochastic Optimization. Mykhailo Novikov, Xavier Besseron. 277-286
- A Compressed QUBO Format for Traveling Salesperson Problems. Chu-Yuan Huang, Kazuhiko Komatsu, Makoto Onoda, Masahito Kumagai, Masayuki Sato 0001, Hiroaki Kobayashi. 287-296
- Reusable Object-Oriented Parallelization of Branch-and-Bound Algorithms. Byron DeVries, Christian Trefftz. 297-306
- Parallel Fractal Decomposition Optimization Algorithms on Heterogeneous Architectures. Mahmoud El Mehdi El Khadiri, El-Ghazali Talbi. 307-315
- Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization. Leszek Sliwko, Jolanta Mizera-Pietraszko. 316-325
- Dynamic configuration of Kubernetes containers resources with SLA classes. Tarek Menouer, Patrice Darmon, Christophe Cérin, Jonathan Rivalan. 326-332
- Enhancing Generalization in Video Anomaly Detection through Multimodal Data Mixing. Bohdan Ivaniuk-Skulskyi, Nadiya Shvai, Amir Nakib, El-Ghazali Talbi. 333-342
- Efficient Privacy-Preserving Convolutional Neural Networks with CKKS-RNS for Encrypted Image Classification. Andrei Tchernykh, Marianne Salgado-Ramos, Bernardo Pulido-Gaytan, Horacio González-Vélez, Esteban Mosckos, Mikhail G. Babenko. 343-352
- A Header-Based C++ Library for Computing Hessian on GPU using Automatic Differentiation. Desh Ranjan, Mohammad Zubair. 355-364
- Assembly of FETI dual operator using CUDA. Jakub Homola, Radim Vavrík, Ondrej Meca, Tomás Brzobohatý, Lubomír Ríha. 365-374
- Block Epsilon-Circulant Preconditioning with GPU-Accelerated Spatial Solvers for Linear Time-Dependent PDEs. Ryo Yoda, Matthias Bolten. 375-384
- A Simple Tiled Approach to Teaching Parallel Computing. Peter E. Strazdins. 385-394
- ν-LPA: Fast GPU-based Label Propagation Algorithm (LPA) for Community Detection. Subhajit Sahu, Mahen N, Kishore Kothapalli. 395-404
- Embracing Load Imbalance for Energy Optimizations: a Case-Study. Jelle van Dijk, Gábor Závodszky, Ana Lucia Varbanescu, Andy D. Pimentel. 405-412
- Scalable Higher Resolution Polar Sea Ice Classification and Freeboard Calculation from ICESat-2 ATL03 Data. Jurdana Masuma Iqrah, Young Hyun Koo, Wei Wang 0149, Hongjie Xie, Sushil K. Prasad. 413-422
- SkePU-DNN: Algorithmic Skeleton Programming for Deep Learning on Heterogeneous Systems. Sehrish Qummar, August Ernstsson, Christoph W. Kessler, Oleg Sysoev. 423-432
- Adaptive Sketching Based Construction of H2 Matrices on GPUs. Wajih Halim Boukaram, Yang Liu 0179, Pieter Ghysels, Xiaoye Sherry Li. 433-442
- In-Situ Auto-Regressive Surrogate Modeling for Feature Extraction Using Ascent. Kewei Yan, Yonghong Yan 0001. 443-452
- Using Checkpoint Alteration to Gauge Fault Sensitivity of HPC Scientific Applications. Elvis Rojas, Luis Carlos N. Todd, Esteban Meneses. 453-462
- Thalassa: Transforming Symbolic PDEs into Tensor-Based Solvers Running on ML Accelerators. Michail Boulasikis, Flavius Gruian, Robert-Zoltán Szász. 463-472
- A Parallel and Highly-Portable HPC Poisson Solver: Preconditioned Bi-CGSTAB with alpaka. Luca Pennati, Måns I. Andersson, Klaus Steiniger, René Widera, Tapish Narwal, Michael Bussmann, Stefano Markidis. 473-483
- QuIDS: A Large-Scale Distributed Framework for Quantum Irregular Dynamics Simulations. Joseph Touzet, Oguz Kaya, Pablo Arrighi, Amélia Durbec. 491-500
- A mixed-precision quantum-classical algorithm for solving linear systems. Océane Koska, Marc Baboulin, Arnaud Gazda. 501-508
- Outlier Detection and other applications of Quantum Matrix Multiplication. Giacomo Antonioli, Alessandro Berti 0002, Alessandro Poggiali, Anna Bernasconi 0001, Gianna M. Del Corso. 509-518
- Gate Efficient Composition of Hamiltonian Simulation and Block-Encoding with its Application on HUBO, Chemistry and Finite Difference Method. Robin Ollive, Stéphane Louise. 519-528
- *Ashfaq A. Khokhar. 529
- Parallel Processing for Distributed Machine Learning: A Taxonomy of Techniques and Associated Security Risks. Abdulfatah Bahbouh, Ishfaq Ahmad, Hansheng Lei, Saif ul Islam. 533-542
- Efficient Intra-node Hierarchical Parallelisms And Dynamic Load Balancing Strategies On Heterogeneous Systems. Dikshant Pratap Singh, Mathialakan Thavappiragasam, Brice Videau. 543-552
- Extending Microservices Performance Optimization Through Horizontal Pod Autoscaling: A Comprehensive Study. Fernando H. L. Buzato, Alfredo Goldman. 553-562
- Enhancing Productivity and Performance of HClib-Actor with Efficient Task Termination. Youssef Elmougy, Nirjhar Deb, Akihiro Hayashi, Vivek Sarkar. 563-567
- Dynatune: Dynamic Tuning of Raft Election Parameters Using Network Measurement. Kohya Shiozaki, Junya Nakamura 0001. 568-577
- Pairbot: Enhancing Computational Capabilities by Pairing of Autonomous Mobile Robots. Yonghwan Kim 0001, Yoshiaki Katayama, Koichi Wada 0001. 578-587
- Towards a Fast and Generalizable Neural Inference Scheme for Tabular Data. Victor Parque. 588-596
- Performance Modeling of Non-Uniform Heterogeneous Platforms. Steven D. Harris, Roger D. Chamberlain, Christopher D. Gill. 597-607
- RL-assisted Annealing for QUBO on a Multi-GPU System. Reo Gakumi, Ryota Yasudo. 608-617
- CUBO-to-QUBO Conversion: Reducing Cubic Formulations to Quadratic Formulations. Shunsuke Tsukiyama, Xiaotian Li, Koji Nakano, Victor Parque, Yasuaki Ito, Takumi Kato, Yuya Kawamata, Kaiki Ii. 618-625
- QUBO++: A C++ Library for Developing and Solving QUBO Problems. Koji Nakano, Shunsuke Tsukiyama, Xiaotian Li, Yasuaki Ito, Victor Parque, Takumi Kato, Yuya Kawamata, Kaiki Ii. 626-637
- Teaching Accelerated Computing with Hands-on Experience. Isil Öz, Chelsea Cropper. 642-649
- Crash Course on Quantum Computing for Engineering Students. Leonel Sousa. 650-657
- A Visual Unplugged Activity to Introduce PDC. Mary L. Smith, Srishti Srivastava, David P. Bunde, April Renee Crockett, Michael C. Gerten, Peter Maher, Jaime Spacco, Xiaoyuan Suo, Jiayin Wang, Michelle Zhu. 658-665
- SFS: A Simple File System for Teaching Parallelism in Computer Systems. Brian P. Railing, Lukas Kebuladze, Nathan Deyak, Zachary Weinberg. 666-672
- Assessing Parallel and Distributed Computing Knowledge Through a Card Game. Srishti Srivastava, Mary L. Smith. 673-679
- High-Performance Computing for Graph AI: A Top-Down Perspective. Yuede Ji. 680-683
- Visualizing MPI Collective Communication. Christopher Atala, Meredith Morrison, Grey Ballard. 684-687
- Experience using AI in MPI Test Suite Development: Implications for Educators. Callie Stewart, Gerald C. Gannod. 688-691
- Peachy Parallel Assignments (EduPar 2025). H. Martin Bücker, Johannes Schoder, Xiaoyuan Suo, David P. Bunde. 692-696
- EduPar 2025 Posters. Sandra Catalán, Rocío Carratalá-Sáez, Vicente Lopez-Oliva, Katerina Michalickova, Shubbhi Taneja. 697-700
- Checkpointing Optimisation to Prepare Future Exascale Plasma Turbulence Simulations. Méline Trochon, Julien Bigot, Virginie Grandgirard, Dorian Midou. 703-711
- SCORPIO: A Parallel I/O library for Exascale Earth System Models. Jayesh Krishna, Danqing Wu, Robert L. Jacob, Dmitry Ganyushin. 712-721
- Streamlining HDF5's AI Workloads Benchmarking. Dlyaver Djebarov, Radita Liem, Sarah Neuwirth, Jean Luca Bez, Suren Byna. 722-730
- IOPS: I/O Performance Evaluation Suite. Mahamat Abdraman, Francieli Boito, Luan Teylo. 731-736
- NAPEH: An Asynchronous and NUMA-Aware KV Store Based on Non-Volatile Memory Architectures. Yili Ma, Shengquan Yin, Jing Xing, Haoquan Long, Zheng Wei, Guangming Tan, Dingwen Tao. 737-743
- FGI: Fast GNN Inference on Multi-Core Systems. Binglin Ji, Chenfeng Zhao, Roger D. Chamberlain. 748-757
- Divide, Conquer, and Match: A Distributed and Asynchronous Approach for Subgraph Isomorphism. Youssef Elmougy, Akihiro Hayashi, Vivek Sarkar. 758-761
- Enhanced Soups for Graph Neural Networks. Joseph Zuber, Aishwarya Sarkar, Joseph Jennings, Ali Jannesari. 762-771
- Scaling Graph Neural Networks for Particle Track Reconstruction. Alok Tripathy, Alina Lazar, Xiangyang Ju, Paolo Calafiura, Katherine A. Yelick, Aydin Buluç. 772-776
- RaNT-Graph: A Scalable Approach to Sampling Billions of Walks or Paths from Weighted Graphs. Lance G. Fletcher, Trevor Steil, Roger Pearce. 777-786
- Serverless Graph Analytics on Multi-Instance GPU. Mohammad Sonji, Mohammed Baydoun, Aditya Dhakal, Gourav Rattihalli, Dejan S. Milojicic, Izzat El Hajj. 787-794
- On the Landscape of Graph Clustering at Scale. Saikat Dey, Sonal Jha, Frank Wanye, Wu-chun Feng. 795-804
- Accelerating Triangle Counting with Real Processing-in-Memory Systems. Lorenzo Asquini, Manos Frouzakis, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu, Francesco Silvestri 0001. 805-814
- Improving energy efficiency of HPC applications using unbalanced GPU power capping. Albert d'Aviau de Piolant, Hayfa Tayeb, Bérenger Bramas, Mathieu Faverge, Abdou Guermouche, Amina Guermouche. 820-829
- Methodology for GPU Frequency Switching Latency Measurement. Daniel Velicka, Ondrej Vysocky, Lubomir Riha. 830-839
- LM-Offload: Performance Model-Guided Generative Inference of Large Language Models with Parallelism Control. Jianbo Wu, Jie Ren 0015, Shuangyan Yang, Konstantinos Parasyris, Giorgis Georgakoudis, Ignacio Laguna, Dong Li 0001. 840-849
- Millions of Matrix-Multiplications: GEMM Variations on Aurora. Colleen Bertoni, Thomas Applencourt, Longfei Gao, Ti Leggett. 850-856
- Leveraging Interaction Between Memory Footprint and Parallelism Degree for efficient GPU Portings. Mickaël Boichot, Adrien Roussel, Elisabeth Brunet, Patrick Carribault. 857-865
- HaaS - A Platform for Password Cracking in Distributed Heterogeneous Systems. Carlos Lima, Rui Alves, José Rufino. 866-875
- Static task mapping for heterogeneous systems based on series-parallel decompositions. Martin Wilhelm, Thilo Pionteck. 876-885
- On the Usability and Energy Efficiency of High-Level Synthesis for FPGA-based Network-Attached Accelerators. Steffen Christgau, Dylan Everingham, Max Lübke, Marco De Lucia, Danny Puhan, Niklas Schelten, Bettina Schnor, Hannes Signer, Johannes Spazier, Benno Stabernack, Fritjof Steinert, Serhii Yahdzhyiev. 886-895
- Scheduling Strategies for Partially-Replicable Task Chains on Two Types of Resources. Diane Orhan, Yacine Idouar, Laércio Lima Pilla, Adrien Cassagne, Denis Barthou, Christophe Jégo. 896-905
- Heterogeneous Memory Pool Tuning. Filip Vaverka, Ondrej Vysocky, Lubomir Riha. 906-912
- On the Singularity of SYCL. Ami Marowka. 913-922
- Proactive Endpoint Congestion Avoidance in UCC. Ferrol Aderholdt, Aamir Shafi, Manjunath Gorentla Venkata. 923-930
- Development and Deployment of a Genomic Cancer Data Extraction Pipeline on the Cloud. Eleni Adam, Terry Stilwell, Desh Ranjan, Harold Riethman. 935-938
- Protein database search using Processing-in-Memory architecture. Charly Airault, Charles Deltel, Florestan De Moor, Erwan Drezen, Meven Mognol, Dominique Lavenier. 939-948
- ® Deep Learning Processor Unit for Accelerating Selective Sweep Detection. Nikolaos Alachiotis 0001, Matthijs Leon Souilljee. 949-958
- Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications. André Merzky, Mikhail Titov, Matteo Turilli, Ozgur O. Kilic, Tianle Wang 0001, Shantenu Jha. 962-969
- Evaluating Expansion Memory for Optimizer State Offloading for Large Transformer Models. Moiz Arif, Avinash Maurya, Sudharshan Vazhkudai, Bogdan Nicolae. 970-977
- Is In-Context Learning Feasible for HPC Performance Autotuning? Thomas Randall, Akhilesh Bondapalli, Rong Ge 0002, Prasanna Balaprakash. 978-985
- Exploration of LLM Lossless Compression on Scientific Data. Max H. Faykus, Luanzheng Guo, Rizwan A. Ashraf, Jan Strube 0001, Jon C. Calhoun, Nathan R. Tallent. 986-990
- Breaking Down LLM Inference: A preliminary performance analysis of sparsified transformers. Ioanna Tasou, Petros Anastasiadis, Panagiotis Mpakos, Dimitrios Galanopoulos, Nectarios Koziris, Georgios I. Goumas. 991-995
- Imperfect Recognition: A Study of OCR Limitations in the Context of Scientific Documents. Chinmay Sahasrabudhe, Yang Ho, Nick Winovich, Sivasankaran Rajamanickam. 996-1002
- Towards Orchestrating Agentic Applications as FaaS Workflows. Shiva Sai Krishna Anand Tokal, Vaibhav Jha, Anand Eswaran, Praveen Jayachandran, Yogesh Simmhan. 1003-1010
- Adaptive Protein Design Protocols and Middleware. Aymen Alsaadi, Jonathan Ash, Mikhail Titov, Matteo Turilli, André Merzky, Shantenu Jha, Sagar Khare. 1011-1015
- Online Learning Techniques for Occupancy Detection on Resource Constrained Devices. Flavio Renzi, Haoyu Ren, Alessio Bernardo, Giacomo Ziffer, Darko Anicic, Emanuele Della Valle. 1019-1026
- Trade-Offs in Resource-Constrained Dimensionality Reduction Algorithms. Christophe Cérin, Melvyn Chemin. 1027-1034
- Investigating Efficient Edge Offloading Architectures for Serverless Systems. Paul Daniëlse, Hsiang-Ling Tai, Shashikant Ilager, Zhiming Zhao. 1035-1038
- DEEP: Edge-Based Dataflow Processing with Hybrid Docker Hub and Regional Registries. Narges Mehran, Zahra Najafabadi Samani, Reza Farahani, Josef Hammer, Dragi Kimovski. 1039-1042
- Goal-Driven building automation using serverless computing. Sashko Ristov, Anna Meshcheriakova, Philipp Gritsch, Philipp Zech, Ruth Breu. 1043-1049
- Towards Predicting Inference Latency of TinyML Models. Jyotishman Sarkar, Urmi Jana, Barnali Basak, Himadri Sekhar Paul, Swagata Biswas. 1050-1053
- Towards Interpretable Energy Estimation for Edge AI Applications. Riccardo Cantini, Alessio Orsino, Domenico Talia, Paolo Trunfio. 1054-1057
- Memory Efficient WebAssembly Containers. Matthijs Jansen, Maciej Kozub, Alexandru Iosup, Daniele Bonetta. 1058-1065
- 6G Infrastructures for Edge AI: An Analytical Perspective. Kurt Horvath, Shpresa Tuda, Blerta Idrizi, Stojan Kitanov, Fisnik Doko, Dragi Kimovski. 1066-1072
- Towards QoS-Aware Serverless Function Offloading in the Edge-Cloud Continuum through Reinforcement Learning. Gabriele Russo Russo, Pierpaolo Spaziani, Valeria Cardellini. 1073-1080
- Dynamic and Forecast-Based Containers Autoscaling for Kubernetes with Reinforcement Learning. Alfredo Lipari, Gabriele Proietti Mattia, Roberto Beraldi. 1081-1088
- Blockchain consensus mechanisms for democratic voting environments. Thomas Auer, Kurt Horvath, Dragi Kimovski. 1089-1096
- SDFLMQ: A Semi-Decentralized Federated Learning Framework over MQTT. Amir Ali Pour, Julien Gascon-Samson. 1100-1107
- Understanding the Performance and Power of LLM Inferencing on Edge Accelerators. Mayank Arya, Yogesh Simmhan. 1108-1111
- Charon: An End-to-End Infrastructure for Connecting AI@Edge to HPC. Yongho Kim, Seongha Park, Swann Perarnau, Akhilesh Raj. 1112-1119
- Edge AI in the computing continuum: Consistency and Availability at Early Design Stages. Vincenzo Barbuto, Claudio Savaglio, Giancarlo Fortino, Edward A. Lee. 1120-1127
- Multi-Agent Reinforcement Learning for Workload Distribution in FaaS-Edge Computing Systems. Emanuele Petriglia, Federica Filippini, Michele Ciavotta, Marco Savi. 1128-1131
- Software Container-based Energy Estimation Models for ARM Architecture. Mohamed Anisse Belhadj, Kods Trabelsi, Loïc Cudennec, Henri-Pierre Charles. 1132-1139
- SIMD Acceleration of Matrix-Vector Operations on RISC-V for Variable Precision Neural Networks. Gonzalo Salinas, Guilherme Sequeira, Alfonso Rodríguez 0002, João Bispo, Nuno Paulino 0001. 1140-1147
- Optimizing Speech Emotion Recognition with Dynamic Dilation Rates for Efficient Edge Deployment. Jin-Shyan Lee, Pin-Hsuan Lee. 1148-1155
- Compositional Execution Motifs for Quantum-HPC Systems. Nishant Saurabh, Pradeep Kumar Mantha, Shantenu Jha, André Luckow. 1157-1162
- SFQ-Driven Pulse-Phase Sequence Generator for Superconducting Qubit Control. Meriam Gay Bautista-Jurney, Patricia Gonzalez-Guerrero, Anastasiia Butko. 1163-1169
- A Recursive Approach to Representation in Hilbert Spaces of Increasing Dimension: Applications to Quantum-centric HPC tool development. Alejandro Becerra, Abani K. Patra. 1170-1174
- QCLAB: A Matlab Toolbox for Quantum Computing. Sophia Keip, Daan Camps, Roel Van Beeumen. 1175-1181
- Computational Speedup of Simulated Annealing with Nested Monte Carlo Loop. Kiyotaka Murashima. 1182-1187
- QABE: a Framework for Quantum Annealer Programming and Benchmarking. Gianluca Scanu, Marco Venere, Donatella Sciuto, Marco D. Santambrogio. 1188-1193
- High Throughput Low Latency Network Intrusion Detection on FPGAs: A Raw Packet Approach. Muhammad Ali Farooq, Abid Rafique, Suhaib A. Fahmy, Aman Arora 0001. 1201-1207
- Improving mapping of convolutional neural networks on FPGAs through tailored macro sizes. Brindusa Mihaela Damian-Kosterhon, Andreas Koch 0001, Felix Kosterhon, Lucian Petrica. 1208-1215
- An FPGA-Accelerated Framework for Optimizing Decision Tree Ensembles in Supervised Learning. Rodrigo Olmos, Andrés Otero. 1216
- Accelerating CRS Format Conversion for Sparse Matrix Computation with an FPGA. Tomoya Yokono, Yoshiki Yamaguchi. 1217
- A Hardware/Software Co-Design Approach for Versal-Based K-means Acceleration. Eleonora Cabai, Giuseppe Sorrentino, Marco Domenico Santambrogio, Davide Conficconi. 1218
- Towards a Methodology to Leverage Alveo Versal System Usability And Parallelization. Federico Mansutti, Davide Ettori, Giuseppe Sorrentino, Marco Domenico Santambrogio, Davide Conficconi. 1219
- Security of Dynamically Reconfigurable RISC-V Systems: I/O Attack Focus. Aya Jendoubi, Jean-Christophe Prévotet, Philippe Tanguy, Pascal Cotret. 1220
- A RISC-V Coprocessor for Seamless Integration of Stream-Based Accelerators. Rohan Krishna Vijayaraghavan, Ahmed Kamaleldin, Matthias Nickel, Diana Göhringer. 1221-1227
- A 950 MHz SIMT Soft Processor. Martin Langhammer, Gregg Baeckler, Kim Bozman. 1228-1235
- Reconfigurable Processor-Centric Accelerators for Safety-Critical Applications. Luis Waucquez, Alfonso Rodriguez. 1236-1242
- Edge SpAIce: Deep Learning Deployment Pipeline for Onboard Data Reduction on Satellite FPGAs. Noemi D'Abbondanza, Stylianos Tzelepis, Nicolò Ghielmetti, Ioannis Kakogeorgiou, Vanya Buchova, Konstantinos Karantzalos, Katerina Kikaki, Nicolas-Marcel Lemoine, Maurizio Pierini, Sioni Summers, Simon Vellas, François de Vieilleville, Boyan-Nikola Zafirov. 1243-1249
- Testbench analysis using non-invasive fault injection. Yngve Hafting, Alexander Wold. 1250-1256
- A Simulation-Based Framework to Reduce I/O Contention in HPC. Simone Pernice, Ahmad Tarraf, Jean-Baptiste Besnard, Barbara Cantalupo, Alberto Cascajo, David E. Singh, Felix Wolf 0001, Jesús Carretero 0001, Sameer Shende, Marco Aldinucci. 1258-1260
- Characterizing Spatial Data Traits for Modeling Generic Lossy Rate-Distortion Quality. Md Hasanur Rahman 0001, Sheng Di, Guanpeng Li, Franck Cappello. 1261-1262
- Efficient Parallel Scheduling for Sparse Triangular Solvers. Toni Böhnlein, Pál András Papp, Raphael S. Steiner, Albert-Jan Nicholas Yzelman. 1263-1265
- Energy Efficient Scheduling of AI/ML Workloads on Multi-Instance GPUs with Dynamic Repartitioning. Ellie Lipe, Neel Karia, Clifford Stein 0001, Connor Espenshade, Olivier Tardieu, Asser N. Tantawi. 1266-1268
- Enhancing Graph Transformer Training through Adaptive Graph Parallelism. Jun-Liang Lin, Kamesh Madduri, Mahmut Taylan Kandemir. 1269-1270
- Evaluation and Mitigation of Performance Variability of OpenMP Applications on Modern Multicore Systems. Minyu Cui, Miquel Pericàs. 1271-1273
- Exploring Near-Optimal Contraction Strategies for the Scalar Product in the Tensor-Train Format. Przemyslaw Dominikowski, Atte Torri, Brice Pointal, Oguz Kaya, Laércio Lima Pilla, Olivier Coulaud. 1274-1276
- INSPIRIT: Adaptive Priority-based Task Scheduling for Heterogeneous Hardware. Yiqing Wang, Hailong Yang 0002, Xiaoyan Liu, Xinyu Yang, Pengbo Wang, Xin You, Qingxiao Sun, Mingzhen Li 0001, Yi Liu 0013, Zhongzhi Luan, Depei Qian. 1277-1279
- IRISX: A Dynamic Trade-off System for Harnessing Heterogeneity for Performance Portability. Sanil Rao, Mohammad Alaul Haque Monil, Het Mankad, Narasinga Rao Miniskar, Keita Teranishi, Jeffrey S. Vetter, Franz Franchetti. 1280-1282
- Lossy Parallel Visualization of Large-Scale Volume Data with Error-Bounded Image Compositing. Yongfeng Qiu, Yuxiao Li 0002, Xin Liang 0001, Yafan Huang, Guanpeng Li, Sheng Di, Franck Cappello, Hanqi Guo 0001. 1283-1285
- MetaCast: Generalizing HPC Application Runtime Prediction. Si Chen 0004, Simon Garcia De Gonzalo, Avani Wildani. 1286-1287
- Parallel Scan on Ascend AI Accelerators. Bartlomiej Wróblewski 0001, Gioele Gottardo, Anastasios Zouzias. 1290-1292
- Performance and Portability in Multi-GPU Branch-and-Bound: Chapel Versus CUDA and HIP for Tree-Based Optimization. Ivan Tagliaferro, Guillaume Helbecque, Ezhilmathi Krishnasamy, Nouredine Melab, Grégoire Danoy. 1293-1295
- Poster: A Scalable and Fault-Tolerant Decentralized Middleware for CI/CD Workflow. Amena Begum Farha, Abdullah Al-Mamun 0001, Gagan Agrawal. 1296-1297
- Setchain Algorithms for Blockchain Scalability (Extended Abstract). Arivarasan Karmegam, Gabina Luz Bianchi, Margarita Capretto, Martín Ceresa, Antonio Fernández Anta, César Sánchez 0001. 1298-1300
- Toward Efficient Asynchronous Single-Source Shortest Path. Marco D'Antonio, Son Thai Mai, Hans Vandierendonck. 1301-1303
- Toward Performance Prediction in Large-Scale Systems through Temporal System and Application Log Analysis. Ehan Sohn, Changjong Kim, Alex Sim, Dong-Kyu Sung, Yongseok Son, Jisung Park 0001, Sunggon Kim. 1304-1306
- Towards Efficient Instruction Stream Scheduling for Stencil Computation on ARM Processors. Shanghao Liu, Hailong Yang 0002, Xin You, Zhongzhi Luan, Yi Liu 0013, Depei Qian. 1307-1310
- TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System. Zheng Wei, Jing Xing, Yida Gu, Guangming Tan, Dingwen Tao. 1311-1313
- Unlocking Energy-Efficient and High-Throughput Secure Data Communication in IoT with Memory-Centric Computing. Jingyao Zhang, Elaheh Sadredini. 1314-1315