- Accelerating Distributed ML Training via Selective Synchronization. Sahil Tyagi, Martin Swany. 1-12
- PredictDDL: Reusable Workload Performance Prediction for Distributed Deep Learning. Kevin Assogba, Eduardo Lima, M. Mustafa Rafique, Minseok Kwon. 13-24
- Exact Distributed Stochastic Block Partitioning. Frank Wanye, Vitaliy Gleyzer, Edward K. Kao, Wu-chun Feng. 25-36
- DEHype: Retrofitting Hypervisors for a Resource-Disaggregated Environment. Taehoon Kim, Kwangwon Koh, Changdae Kim, Eunji Pak, Yeonjeong Jeong, Sang-Hoon Kim. 37-48
- SciLance: Mitigate Load Imbalance for Parallel Scientific Applications in Cloud Environments. Xinying Wang, Lipeng Wan, Scott Klasky, Dongfang Zhao, Feng Yan. 49-59
- Generalized Collective Algorithms for the Exascale Era. Michael Wilkins, Hanming Wang, Peizhi Liu, Bangyen Pham, Yanfei Guo, Rajeev Thakur, Peter A. Dinda, Nikos Hardavellas. 60-71
- FedGuard: Selective Parameter Aggregation for Poisoning Attack Mitigation in Federated Learning. Melvin Chelli, Cédric Prigent, René Schubotz, Alexandru Costan, Gabriel Antoniu, Loïc Cudennec, Philipp Slusallek. 72-81
- Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models. Wei Wang, Zhiquan Lai, Shengwei Li, Weijie Liu, Keshi Ge, Yujie Liu, Ao Shen, Dongsheng Li. 82-94
- HIOS: Hierarchical Inter-Operator Scheduler for Real-Time Inference of DAG-Structured Deep Learning Models on Multiple GPUs. Turja Kundu, Tong Shu. 95-106
- FullRepair: Towards Optimal Repair Pipelining in Erasure-Coded Clustered Storage Systems. Yuzuo Zhang, Xinyuan Tu, Lin Wang, Yuchong Hu, Fang Wang, Ye Wang. 107-117
- Performance Characterization of NVMe Flash Devices with Zoned Namespaces (ZNS). Krijn Doekemeijer, Nick Tehrany, Balakrishnan Chandrasekaran, Matias Bjørling, Animesh Trivedi. 118-131
- KV-CSD: A Hardware-Accelerated Key-Value Store for Data-Intensive Applications. Inhyuk Park, Qing Zheng, Dominic Manno, Soonyeal Yang, Jason Lee, David Bonnie, Bradley W. Settlemyer, Youngjae Kim, Woosuk Chung, Gary Grider. 132-144
- Rethinking Virtual Machines Live Migration for Memory Disaggregation. Xingguo Jia, Xingzi Yu, Yun Wang, Senhao Yu, Zhengwei Qi. 145-157
- Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics. George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel-Hameed A. Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, John Shalf. 158-172
- ExplSched: Maximizing Deep Learning Cluster Efficiency for Exploratory Jobs. Hongliang Li, Hairui Zhao, Zhewen Xu, Xiang Li, Haixiao Xu. 173-184
- Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. Urvij Saroliya, Eishi Arima, Dai Liu, Martin Schulz. 185-196
- Communication-Avoiding Recursive Aggregation. Yihao Sun, Sidharth Kumar, Thomas Gilray, Kristopher K. Micinski. 197-208
- HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. Wenxuan Li, Helin Cheng, Zhengyang Lu, Yuechen Lu, Weifeng Liu. 209-220
- ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum. Daniel Rosendo, Marta Mattoso, Alexandru Costan, Renan Souza, Débora B. Pina, Patrick Valduriez, Gabriel Antoniu. 221-233
- Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning. Zhangyu Liu, Cheng Zhang, Huijun Wu, Jianbin Fang, Lin Peng, Guixin Ye, Zhanyong Tang. 234-246
- A Lightweight, Effective Compressibility Estimation Method for Error-bounded Lossy Compression. Arkaprabha Ganguli, Robert Underwood, Julie Bessac, David Krasowska, Jon C. Calhoun, Sheng Di, Franck Cappello. 247-258
- A Dynamic Network-Native MPI Partitioned Aggregation Over InfiniBand Verbs. Yiltan Hassan Temuçin, Scott Levy, Whit Schonbein, Ryan E. Grant, Ahmad Afsahi. 259-270
- DoW-KV: A DPU-offloaded and Write-optimized Key-Value Store on Disaggregated Persistent Memory. Yiwen Zhang, Guokuan Li, Jiguang Wan, Junyue Wang, Jun Li, Ting Yao, Huatao Wu, Daohui Wang. 271-283
- Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI. Jesper Larsson Träff, Sascha Hunold, Ioannis Vardas, Nikolaus Manes Funk. 284-294
- JACO: JAva Code Layout Optimizer Enabling Continuous Optimization without Pausing Application Services. Wenhai Lin, Jingchang Qin, Yiquan Chen, Zhen Jin, Jiexiong Xu, Yuzhong Zhang, Shishun Cai, Lirong Fu, Yi Chen, Wenzhi Chen. 295-306
- A Finite-Difference Time-Domain (FDTD) solver with linearly scalable performance in an FPGA cluster. Zhenyu Xu, Miaoxiang Yu, Jillian Cai, Qing Yang, Tao Wei. 307-317
- GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network. Hengquan Mei, Huaizhi Qu, Jingwei Sun, Yanjie Gao, Haoxiang Lin, Guangzhong Sun. 318-329
- Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion. Qinglei Cao, Sameh Abdulah, Hatem Ltaief, Marc G. Genton, David E. Keyes, George Bosilca. 330-342
- SDT: A Low-cost and Topology-reconfigurable Testbed for Network Research. Zixuan Chen, Zhigao Zhao, Zijian Li, Jiang Shao, Sen Liu, Yang Xu. 343-353
- PiP-MColl: Process-in-Process-based Multi-object MPI Collectives. Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Ken Raffenetti, Hui Zhou, Atsushi Hori, Zizhong Chen, Yanfei Guo, Rajeev Thakur. 354-364
- TopoCommit: A Topological Commit Protocol for Cross-Ledger Transactions in Scientific Computing. Olamide Timothy Tawose, Lei Yang, Dongfang Zhao. 365-375