Journal: IEEE Trans. Parallel Distrib. Syst.

Volume 36, Issue 2

pp. 108-119: Hariharan Devarajan, Gerd Heber, Kathryn M. Mohror. H5Intent: Autotuning HDF5 With User Intent
pp. 120-132: Diletta Olliaro, Adityo Anggraito, Marco Ajmone Marsan, Simonetta Balsamo, Andrea Marin. The Impact of Service Demand Variability on Data Center Performance
pp. 133-149: Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C. S. Lui. Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis
pp. 150-167: Zhi Ling, Xiaofeng Jiang, Xiaobin Tan, Huasen He, Shiyin Zhu, Jian Yang. Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure
pp. 168-184: Diandian Gu, Yihao Zhao, Peng Sun, Xin Jin, Xuanzhe Liu. GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads
pp. 185-196: Pengwei Wang, Junye Qiao, Yuying Zhao, Zhijun Ding. Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value
pp. 197-211: Xiaodong Dong, Lihai Nie, Zheli Liu, Yang Xiang. Slark: A Performance Robust Decentralized Inter-Datacenter Deadline-Aware Coflows Scheduling Framework With Local Information
pp. 212-225: Jialiang Han, Yudong Han, Xiang Jing, Gang Huang, Yun Ma. DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning
pp. 226-238: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens. Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
pp. 239-252: Zhangrong Qin, Xusheng Lu, Long Lv, Zhongxiang Tang, Binghai Wen. An Efficient GPU Algorithm for Lattice Boltzmann Method on Sparse Complex Geometries
pp. 253-265: J. Gregory Pauloski, Valérie Hayot-Sasson, Logan T. Ward, Alexander Brace, André Bauer, Kyle Chard, Ian T. Foster. Object Proxy Patterns for Accelerating Distributed Applications
pp. 266-281: Changyao Lin, Zhenming Chen, Ziyang Zhang, Jie Liu. TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU
pp. 282-292: Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang. Spreeze: High-Throughput Parallel Reinforcement Learning Framework
pp. 293-307: Guangyao Zhou, Wenhong Tian, Rajkumar Buyya, Kui Wu. UMPIPE: Unequal Microbatches-Based Pipeline Parallelism for Deep Neural Network Training
pp. 308-325: Yuyang Jin, Haojie Wang, Xiongchao Tang, Zhenhua Guo, Yaqian Zhao, Torsten Hoefler, Tao Liu, Xu Liu, Jidong Zhai. Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications
pp. 326-340: Giacomo Valente, Gianluca Brilli, Tania Di Mascio, Alessandro Capotondi, Paolo Burgio, Paolo Valente, Andrea Marongiu. Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-Based Heterogeneous SoCs
pp. 341-355: Cristóbal A. Navarro, Felipe A. Quezada, Enzo Meneses, Héctor Ferrada, Nancy Hitschfeld. CAT: Cellular Automata on Tensor Cores