- GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets. Junhyeok Jang, Miryeong Kwon, Donghyun Gouk, Hanyeoreum Bae, Myoungsoo Jung. 2-12
- GraphMetaP: Efficient MetaPath Generation for Dynamic Heterogeneous Graph Models. Haiheng He, Dan Chen, Long Zheng 0003, Yu Huang 0013, Haifeng Liu, Chaoqiang Liu, Xiaofei Liao, Hai Jin 0001. 13-24
- Traversing Large Compressed Graphs on GPUs. Prasun Gera, Hyesoon Kim. 25-35
- Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs. Isuru Ranawaka, Md. Khaledur Rahman, Ariful Azad. 36-46
- Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots. Anisur Rahaman Molla, Kaushik Mondal 0001, William K. Moses Jr. 47-57
- Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph. Sudipta Saha Shubha, Shohaib Mahmud, Haiying Shen, Geoffrey C. Fox, Madhav V. Marathe. 58-68
- H-Cache: Traffic-Aware Hybrid Rule-Caching in Software-Defined Networks. Zeyu Luan, Qing Li, Yi Wang, Yong Jiang 0001. 69-78
- Accelerating Packet Processing in Container Overlay Networks via Packet-level Parallelism. Jiaxin Lei, Manish Munikar, Hui Lu 0001, Jia Rao. 79-89
- Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores. Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang 0027, Hai Jin 0001. 90-101
- Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs. Mohamed W. Hassan, Adel Dabah, Hatem Ltaief, Suhaib A. Fahmy. 102-111
- Efficient Hardware Primitives for Immediate Memory Reclamation in Optimistic Data Structures. Ajay Singh, Trevor Brown 0001, Michael Spear. 112-122
- A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh 0005, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda 0001. 123-133
- Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda 0001. 134-144
- Accelerating CNN inference on long vector architectures via co-design. Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericàs. 145-155
- Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU. Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, Depei Qian. 156-166
- MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism. Zheng Zhang, Donglin Yang, Yaqi Xia, Liang Ding 0006, Dacheng Tao, Xiaobo Zhou, Dazhao Cheng. 167-177
- Mimir: Extending I/O Interfaces to Express User Intent for Complex Workloads in HPC. Hariharan Devarajan, Kathryn M. Mohror. 178-188
- Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis. Di Zhang, Chris Egersdoerfer, Tabassum Mahmud, Mai Zheng, Dong Dai 0001. 189-199
- FaultyRank: A Graph-based Parallel File System Checker. Saisha Kamat, Abdullah Al Raqibul Islam, Mai Zheng, Dong Dai 0001. 200-210
- Evaluating Asynchronous Parallel I/O on HPC Systems. John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi. 211-221
- An Efficient 2D Method for Training Super-Large Deep Learning Models. Qifan Xu, Yang You 0001. 222-232
- Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation. Bingyi Zhang, Viktor K. Prasanna. 233-244
- Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. Siddharth Singh, Abhinav Bhatele. 245-255
- Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner. Daning Cheng, Shigang Li 0002, Yunquan Zhang. 256-267
- SRC: Mitigate I/O Throughput Degradation in Network Congestion Control of Disaggregated Storage Systems. Danlin Jia, Yiming Xie, Li Wang, Xiaoqian Zhang, Allen Yang, Xuebin Yao, Mahsa Bayati, Pradeep Subedi, Bo Sheng, Ningfang Mi. 268-278
- Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding. Qi Yu, Lin Wang, Yuchong Hu, Yumeng Xu, Dan Feng 0001, Jie Fu, Xia Zhu, Zhen Yao, Wenjia Wei. 279-289
- UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage. Michael J. Brim, Adam T. Moody, Seung-Hwan Lim, Ross G. Miller, Swen Boehm, Cameron Stanavige, Kathryn M. Mohror, Sarp Oral. 290-300
- ArkFS: A Distributed File System on Object Storage for Archiving Data in HPC Environment. Kyu-Jin Cho, Injae Kang, Jin-Soo Kim. 301-311
- On Doorway Egress by Autonomous Robots. Rory Hector, Ramachandran Vaidyanathan, Gokarna Sharma, Jerry L. Trahan. 312-321
- PAQR: Pivoting Avoiding QR factorization. Wissam M. Sid-Lakhdar, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Piotr Luszczek, Mark Gates, Stanimire Tomov, Hans Johansen, David B. Williams-Young, Timothy A. Davis 0001, Jack J. Dongarra, Hartwig Anzt. 322-332
- DeepThermo: Deep Learning Accelerated Parallel Monte Carlo Sampling for Thermodynamics Evaluation of High Entropy Alloys. Junqi Yin, Feiyi Wang, Mallikarjun Arjun Shankar. 333-343
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia 0001, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu. 344-355
- On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM). Emmanuel Agullo, Alfredo Buttari, Olivier Coulaud, Lionel Eyraud-Dubois, Mathieu Faverge, Alain Franc, Abdou Guermouche, Antoine Jego, Romain Peressoni, Florent Pruvost. 357-367
- A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices. João Nuno Ferreira Alves, Luís Manuel Silveira Russo, Alexandre P. Francisco, Siegfried Benkner. 368-378
- Memory-aware Optimization for Sequences of Sparse Matrix-Vector Multiplications. Yichen Zhang, Shengguo Li, Fan Yuan, Dezun Dong, Xiaojian Yang, Tiejun Li, Zheng Wang 0001. 379-389
- Data Distribution Schemes for Dense Linear Algebra Factorizations on Any Number of Nodes. Olivier Beaumont, Jean-Alexandre Collin, Lionel Eyraud-Dubois, Mathieu Vérité. 390-401
- Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams. Yongseok Soh, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi. 402-412
- Scheduling with Many Shared Resources. Max A. Deppert, Klaus Jansen, Marten Maack, Simon Pukrop, Malin Rau. 413-423
- Chic-sched: a HPC Placement-Group Scheduler on Hierarchical Topologies with Constraints. Laurent Schares, Asser N. Tantawi, Pavlos Maniotis, Ming-Hung Chen, Claudia Misale, Seetharami Seelam, Hao Yu. 424-434
- Generalizable Reinforcement Learning-Based Coarsening Model for Resource Allocation over Large and Diverse Stream Processing Graphs. Lanshun Nie, Yuqi Qiu, Fei Meng, Mo Yu, Jing Li. 435-445
- RLP: Power Management Based on a Latency-Aware Roofline Model. Bo Wang, Anara Kozhokanova, Christian Terboven, Matthias S. Müller. 446-456
- SLAP: An Adaptive, Learned Admission Policy for Content Delivery Network Caching. Ke Liu, Kan Wu, Hua Wang, Ke Zhou 0001, Ji Zhang, Cong Li. 457-467
- Proactive SLA-aware Application Placement in the Computing Continuum. Zahra Najafabadi Samani, Narges Mehran, Dragi Kimovski, Radu Prodan. 468-479
- PFedSA: Personalized Federated Multi-Task Learning via Similarity Awareness. Chuyao Ye, Hao Zheng, Zhigang Hu, Meiguang Zheng. 480-488
- FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout. Jingjing Xue, Min Liu 0001, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang. 489-500
- Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks. Ruibo Fan, Wei Wang 0030, Xiaowen Chu. 501-511
- Communication Optimization for Distributed Execution of Graph Neural Networks. Süreyya Emre Kurt, Jinghua Yan, Aravind Sukumaran-Rajam, Prashant Pandey, P. Sadayappan. 512-523
- A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication. Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe M. J. Barca. 524-534
- Power Constrained Autotuning using Graph Neural Networks. Akash Dutta, Jee Choi, Ali Jannesari. 535-545
- SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs. Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, Sayed Ahmad Salehi, Jeffrey Todd Hastings. 546-556
- HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture. Yi-Chien Lin, Viktor K. Prasanna. 557-567
- Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation. William Ladd, Christopher Jensen, Madhurima Vardhan, Jeff Ames, Jeff R. Hammond, Erik W. Draeger, Amanda Randles. 568-578
- Predictive Analysis of Code Optimisations on Large-Scale Coupled CFD-Combustion Simulations using the CPX Mini-App. A. Powell, Gihan R. Mudalige. 579-589
- Scalable adaptive algorithms for next-generation multiphase flow simulations. Kumar Saurabh, Masado Ishii, Makrand A. Khanwale, Hari Sundar, Baskar Ganapathysubramanian. 590-601
- Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms. Joshua H. Davis, Justin Shafner, Daniel Nichols, Nathan Grube, Pino Martin, Abhinav Bhatele. 602-612
- Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits. Ignacio Gavier, Joshua Russell, Devdhar Patel, Edward A. Rietman, Hava T. Siegelmann. 613-623
- Opportunities and Limitations of Hardware Timestamps in Concurrent Data Structures. Olivia Grimes, Jacob Nelson-Slivon, Ahmed Hassan, Roberto Palmieri. 624-634
- Harnessing the Crowd for Autotuning High-Performance Computing Applications. Younghyun Cho, James Weldon Demmel, Jacob King, Xiaoye S. Li, Yang Liu 0179, Hengrui Luo. 635-645
- *Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda 0001. 646-656
- SW-LCM: A Scalable and Weakly-supervised Land Cover Mapping Method on a New Sunway Supercomputer. Yi Zhao, Juepeng Zheng, Haohuan Fu, Wenzhao Wu, Jie Gao, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Runmin Dong, Zhenrong Du, Sha Liu, Xin Liu, Shaoqing Zhang, Le Yu 0001. 657-667
- Feature-based SpMV Performance Analysis on Contemporary Devices. Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, Nikela Papadopoulou, Nectarios Koziris, Georgios I. Goumas. 668-679
- An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam. 680-689
- Engineering Massively Parallel MST Algorithms. Peter Sanders 0001, Matthias Schimek. 691-701
- Engineering a Distributed-Memory Triangle Counting Algorithm. Peter Sanders 0001, Tim Niklas Uhl. 702-712
- PRF: A Fast Parallel Relaxed Flooding Algorithm for Voronoi Diagram Generation on GPU. Jue Wang, Fumihiko Ino, Jing Ke. 713-723
- Satellite Collision Detection using Spatial Data Structures. Christian Hellwig, Fabian Czappa, Martin Michel, Reinhold Bertrand, Felix Wolf 0001. 724-735
- AnyQ: An Evaluation Framework for Massively-Parallel Queue Algorithms. Michael Kenzel, Stefan Lemme, Richard Membarth, Matthias Kurtenacker, Hugo Devillers, Markus Steinberger, Philipp Slusallek. 736-745
- qTask: Task-parallel Quantum Circuit Simulation with Incrementality. Tsung-Wei Huang. 746-756
- GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations. Milan Shah, Xiaodong Yu, Sheng Di, Danylo Lykov, Yuri Alexeev, Michela Becchi, Franck Cappello. 757-767
- An Adaptive Hybrid Quantum Algorithm for the Metric Traveling Salesman Problem. Fei Li, Arul Rhik Mazumder. 768-778
- Stochastic Neuromorphic Circuits for Solving MAXCUT. Bradley H. Theilman, Yipu Wang, Ojas Parekh, William Severa, J. Darby Smith, James B. Aimone. 779-787
- TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters. Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan. 788-797
- Towards Faster Fully Homomorphic Encryption Implementation with Integer and Floating-point Computing Power of GPUs. Guang Fan, Fangyu Zheng, Lipeng Wan, Lili Gao, Yuan Zhao, Jiankuo Dong, Yixuan Song, Yuewu Wang, Jingqiang Lin. 798-808
- FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization. Xujing Li, Min Liu 0001, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang. 809-819
- A Guaranteed Approximation Algorithm for Scheduling Fork-Joins with Communication Delay. Pierre-François Dutot, Yeu-Shin Fu, Nikhil Prasad, Oliver Sinnen. 820-830
- SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors. Yifeng Tang, Cho-Li Wang. 831-841
- Exact Fault-Tolerant Consensus with Voting Validity. Zhangchen Xu, Yuetai Li, Chenglin Feng, Lei Zhang 0035. 842-852
- k-Center Clustering with Outliers in the MPC and Streaming Model. Mark de Berg, Leyla Biabani, Morteza Monemizadeh. 853-863
- FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing. Lu Zhang 0049, Chao Li 0009, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen 0002, Jingwen Leng, Minyi Guo, Pu Yang, Shang Yue. 864-874
- Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-Read Data. Zhuo Huang, Hao Fan, Chaoyi Cheng, Song Wu 0001, Hai Jin 0001. 875-885
- QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows. Hao Wu 0010, Junxiao Deng, Hao Fan, Shadi Ibrahim, Song Wu 0001, Hai Jin 0001. 886-896
- rFaaS: Enabling High Performance Serverless with RDMA and Leases. Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler. 897-907
- Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud. Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao 0001, Zhen Fang, Yongqiang Yang. 908-917
- GPU-enabled Function-as-a-Service for Machine Learning Inference. Ming Zhao, Kritshekhar Jha, Sungho Hong. 918-928
- Lyra: Fast and Scalable Resilience to Reordering Attacks in Blockchains. Pouriya Zarbafian, Vincent Gramoli. 929-939
- Smart Redbelly Blockchain: Reducing Congestion for Web3. Deepal Tennakoon, Yiding Hua, Vincent Gramoli. 940-950
- SBGT: Scaling Bayesian-based Group Testing for Disease Surveillance. Weicong Chen, Hao Qi, Xiaoyi Lu, Curtis Tatsuoka. 951-962
- RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware. Vani Nagarajan, Milind Kulkarni 0001. 963-973
- Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations. Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, Ramu Anandakrishnan, Feiyi Wang. 974-984
- LowFive: In Situ Data Transport for High-Performance Workflows. Tom Peterka, Dmitriy Morozov, Arnur Nigmetov, Orcun Yildiz, Bogdan Nicolae, Philip E. Davis. 985-995
- MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda 0001. 996-1006
- Lossy Scientific Data Compression With SPERR. Shaomeng Li, Peter Lindstrom, John P. Clyne. 1007-1017
- Fast And Automatic Floating Point Error Analysis With CHEF-FP. Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev. 1018-1028
- DAOS as HPC Storage: a View From Numerical Weather Prediction. Nicolau Manubens, Tiago Quintino, Simon D. Smart, Emanuele Danovaro, Adrian Jackson. 1029-1040
- ZFP-X: Efficient Embedded Coding for Accelerating Lossy Floating Point Compression. Bing Lu, Yida Li, Junqi Wang, Huizhang Luo, Kenli Li 0001. 1041-1050