- DIPPM: A Deep Learning Inference Performance Predictive Model Using Graph Neural Networks. Karthick Panner Selvam, Mats Brorsson. 3-16
- perun: Benchmarking Energy Consumption of High-Performance Computing Applications. Juan Pedro Gutiérrez H. Muriedas, Katharina Flügel, Charlotte Debus, Holger Obermaier, Achim Streit, Markus Götz. 17-31
- Extending OpenSHMEM with Aggregation Support for Improved Message Rate Performance. Aaron Welch, Oscar R. Hernandez, Stephen W. Poole. 32-46
- Fault-Aware Group-Collective Communication Creation and Repair in MPI. Roberto Rocco, Gianluca Palermo. 47-61
- MetaLive: Meta-Reinforcement Learning Based Collective Bitrate Adaptation for Multi-Party Live Streaming. Yi Yang, Xiang Li, Yeting Xu, Wenzhong Li, Jiangyi Hu, Taishan Xu, Xiancheng Ren, Sanglu Lu. 65-80
- Asymptotic Performance and Energy Consumption of SLACK. Anne Benoit, Louis-Claude Canon, Redouane Elghazi, Pierre-Cyrille Héam. 81-95
- A Poisson-Based Approximation Algorithm for Stochastic Bin Packing of Bernoulli Items. Tomasz Kanas, Krzysztof Rzadca. 96-110
- Hierarchical Management of Extreme-Scale Task-Based Applications. Francesc Lordan, Gabriel Puigdemunt, Pere Vergés, Javier Conejero, Jorge Ejarque, Rosa M. Badia. 111-124
- MESDD: A Distributed Geofence-Based Discovery Method for the Computing Continuum. Kurt Horvath, Dragi Kimovski, Christoph Uran, Helmut Wöllik, Radu Prodan. 125-138
- Parameterized Analysis of a Dynamic Programming Algorithm for a Parallel Machine Scheduling Problem. Istenc Tarhan, Jacques Carlier, Claire Hanen, Antoine Jouglet, Alix Munier Kordon. 139-153
- SparkEdgeEmu: An Emulation Framework for Edge-Enabled Apache Spark Deployments. Moysis Symeonides, Demetris Trihinas, George Pallis, Marios D. Dikaiakos. 154-168
- ODIN: Overcoming Dynamic Interference in iNference Pipelines. Pirah Noor Soomro, Nikela Papadopoulou, Miquel Pericàs. 169-183
- DAG-Based Efficient Parallel Scheduler for Blockchains: Hyperledger Sawtooth as a Case Study. Manaswini Piduguralla, Saheli Chakraborty, Parwat Singh Anjana, Sathya Peri. 184-198
- INSTANT: A Runtime Framework to Orchestrate In-Situ Workflows. Feng Li, Fengguang Song. 199-213
- How Do OS and Application Schedulers Interact? An Investigation with Multithreaded Applications. Jonas H. Müller Korndörfer, Ahmed Eleliemy, Osman Seckin Simsek, Thomas Ilsche, Robert Schöne, Florina M. Ciorba. 214-228
- Assessing Power Needs to Run a Workload with Quality of Service on Green Datacenters. Louis-Claude Canon, Damien Landré, Laurent Philippe, Jean-Marc Pierson, Paul Renaud-Goud. 229-242
- Improving Utilization of Dataflow Architectures Through Software and Hardware Co-Design. Zhihua Fan, Wenming Li, Shengzhong Tang, Xuejun An, Xiaochun Ye, Dongrui Fan. 245-259
- A Multi-level Parallel Integer/Floating-Point Arithmetic Architecture for Deep Learning Instructions. Hongbing Tan, Jing Zhang, Libo Huang, Xiaowei He, Dezun Dong, Yongwen Wang, Liquan Xiao. 260-274
- Lock-Free Bucketized Cuckoo Hashing. Wenhai Li, Zhiling Cheng, Yuan Chen, Ao Li, Lingfeng Deng. 275-288
- BitHist: A Precision-Scalable Sparse-Awareness DNN Accelerator Based on Bit Slices Products Histogram. Zhaoteng Meng, Long Xiao, Xiaoyao Gao, Zhan Li, Lin Shu, Jie Hao. 289-303
- Computational Storage for an Energy-Efficient Deep Neural Network Training System. Shiju Li, Kevin Tang, Jin Lim, Chul-Ho Lee, Jongryool Kim. 304-319
- Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA. Bo Zhang, Philip E. Davis, Nicolas Morales, Zhao Zhang, Keita Teranishi, Manish Parashar. 323-338
- FedGM: Heterogeneous Federated Learning via Generative Learning and Mutual Distillation. Chao Peng, Yiming Guo, Yao Chen, Qilin Rui, Zhengfeng Yang, Chenyang Xu. 339-351
- DeTAR: A Decision Tree-Based Adaptive Routing in Networks-on-Chip. Xiaoyun Zhang, Yaohua Wang, Dezun Dong, Cunlu Li, Shaocong Wang, Liquan Xiao. 352-366
- Auto-Divide GNN: Accelerating GNN Training with Subgraph Division. Hongyu Chen, Zhejiang Ran, Keshi Ge, Zhiquan Lai, Jingfei Jiang, Dongsheng Li. 367-382
- Model-Agnostic Federated Learning. Gianluca Mittone, Walter Riviera, Iacopo Colonnelli, Robert Birke, Marco Aldinucci. 383-396
- Scalable Random Forest with Data-Parallel Computing. Fernando Vázquez-Novoa, Javier Conejero, Cristian Tatu, Rosa M. Badia. 397-410
- SymED: Adaptive and Online Symbolic Representation of Data on the Edge. Daniel Hofstätter, Shashikant Ilager, Ivan Lujic, Ivona Brandic. 411-425
- MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits. Xiaofeng Hou, Jiacheng Liu, Xuehan Tang, Chao Li, Kwang-Ting Cheng, Li Li, Minyi Guo. 426-440
- Distributed Deep Multilevel Graph Partitioning. Peter Sanders, Daniel Seemaier. 443-457
- TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators. Zhen Xie, Siddhisanket Raskar, Murali Emani, Venkatram Vishwanath. 458-473
- Distributed k-Means with Outliers in General Metrics. Enrico Dandolo, Andrea Pietracaprina, Geppino Pucci. 474-488
- A Parallel Scan Algorithm in the Tensor Core Unit Model. Anastasios Zouzias, William F. McColl. 489-502
- Improved Algorithms for Monotone Moldable Job Scheduling Using Compression and Convolution. Kilian Grage, Klaus Jansen, Felix Ohnesorge. 503-517
- On Size Hiding Protocols in Beeping Model. Dominik Bojko, Marek Klonowski, Mateusz Marciniak, Piotr Syga. 518-532
- Efficient Protective Jamming in 2D SINR Networks. Dominik Bojko, Marek Klonowski, Dariusz R. Kowalski, Mateusz Marciniak. 533-546
- GPU Code Generation of Cardiac Electrophysiology Simulation with MLIR. Tiago Trevisan Jost, Arun Thangamani, Raphaël Colin, Vincent Loechner, Stéphane Genaud, Bérenger Bramas. 549-563
- SWSPH: A Massively Parallel SPH Implementation for Hundred-Billion-Particle Simulation on New Sunway Supercomputer. Ziyu Zhang, Junshi Chen, Zhanming Wang, Yifan Luo, Jineng Yao, Shenghong Huang, Hong An. 564-577
- Transactional-Turn Causal Consistency. Benoît Martin, Laurent Prosperi, Marc Shapiro. 578-591
- Im2win: An Efficient Convolution Paradigm on GPU. Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu. 592-607
- Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores. Gabin Schieffer, Ivy Bo Peng. 608-622
- FedCML: Federated Clustering Mutual Learning with non-IID Data. Zekai Chen, Fuyi Wang, Shengxing Yu, Ximeng Liu, Zhiwei Zheng. 623-636
- A Look at Performance and Scalability of the GPU Accelerated Sparse Linear System Solver Spliss. Jasmin Mohnke, Michael Wagner. 637-648
- Parareal with a Physics-Informed Neural Network as Coarse Propagator. Abdul Qadir Ibrahim, Sebastian Götschel, Daniel Ruprecht. 649-663
- Faster Segmented Sort on GPUs. Robin Kobus, Johannes Nelgen, Valentin Henkys, Bertil Schmidt. 664-678
- Hercules: Scalable and Network Portable In-Memory Ad-Hoc File System for Data-Centric and High-Performance Applications. Javier García Blas, Genaro Sanchez-Gallegos, Cosmin Petre, Alberto Riccardo Martinelli, Marco Aldinucci, Jesús Carretero. 679-693
- An Efficient Parallel Adaptive GMG Solver for Large-Scale Stokes Problems. Seyed Saberi, Günther Meschke, Andreas Vogel. 694-709
- Optimizing Distributed Tensor Contractions Using Node-Aware Processor Grids. Andreas Irmler, Raghavendra Kanakagiri, Sebastian T. Ohlmann, Edgar Solomonik, Andreas Grüneis. 710-724
- Parallel Cholesky Factorization for Banded Matrices Using OpenMP Tasks. Felix Liu, Albin Fredriksson, Stefano Markidis. 725-739