108–119 | Hariharan Devarajan, Gerd Heber, Kathryn M. Mohror. H5Intent: Autotuning HDF5 With User Intent
120–132 | Diletta Olliaro, Adityo Anggraito, Marco Ajmone Marsan, Simonetta Balsamo, Andrea Marin. The Impact of Service Demand Variability on Data Center Performance
133–149 | Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C. S. Lui. Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis
150–167 | Zhi Ling, Xiaofeng Jiang, Xiaobin Tan, Huasen He, Shiyin Zhu, Jian Yang. Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure
168–184 | Diandian Gu, Yihao Zhao, Peng Sun, Xin Jin, Xuanzhe Liu. GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads
185–196 | Pengwei Wang, Junye Qiao, Yuying Zhao, Zhijun Ding. Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value
197–211 | Xiaodong Dong, Lihai Nie, Zheli Liu, Yang Xiang. Slark: A Performance Robust Decentralized Inter-Datacenter Deadline-Aware Coflows Scheduling Framework With Local Information
212–225 | Jialiang Han, Yudong Han, Xiang Jing, Gang Huang, Yun Ma. DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning
226–238 | Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens. Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
239–252 | Zhangrong Qin, Xusheng Lu, Long Lv, Zhongxiang Tang, Binghai Wen. An Efficient GPU Algorithm for Lattice Boltzmann Method on Sparse Complex Geometries
253–265 | J. Gregory Pauloski, Valérie Hayot-Sasson, Logan T. Ward, Alexander Brace, André Bauer, Kyle Chard, Ian T. Foster. Object Proxy Patterns for Accelerating Distributed Applications
266–281 | Changyao Lin, Zhenming Chen, Ziyang Zhang, Jie Liu. TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU
282–292 | Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang. Spreeze: High-Throughput Parallel Reinforcement Learning Framework
293–307 | Guangyao Zhou, Wenhong Tian, Rajkumar Buyya, Kui Wu. UMPIPE: Unequal Microbatches-Based Pipeline Parallelism for Deep Neural Network Training
308–325 | Yuyang Jin, Haojie Wang, Xiongchao Tang, Zhenhua Guo, Yaqian Zhao, Torsten Hoefler, Tao Liu, Xu Liu, Jidong Zhai. Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications
326–340 | Giacomo Valente, Gianluca Brilli, Tania Di Mascio, Alessandro Capotondi, Paolo Burgio, Paolo Valente, Andrea Marongiu. Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-Based Heterogeneous SoCs
341–355 | Cristóbal A. Navarro, Felipe A. Quezada, Enzo Meneses, Héctor Ferrada, Nancy Hitschfeld. CAT: Cellular Automata on Tensor Cores