Journal: IEEE Trans. Parallel Distrib. Syst.

Volume 36, Issue 9

1828 -- 1840: Shengwei Li, Zhiquan Lai, Dongsheng Li 0001, Yanqi Hao, Weijie Liu, Keshi Ge, Xiaoge Deng, Kai Lu. Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
1841 -- 1856: Zhaoyang Xie, Haibin Zhang, Sisi Duan, Chao Liu 0039, Shengli Liu 0001, Xuanji Meng, Yong Yu 0002, Fangguo Zhang, Boxin Zhao, Liehuang Zhu, Tianqing Zhu. Everything Distributed and Asynchronous: A Practical System for Key Management Service
1857 -- 1871: Adrian C. Rublein, Fidan Mehmeti, Mark Mahon, Thomas F. La Porta. Improved Methods of Task Assignment and Resource Allocation With Preemption in Edge Computing Systems
1872 -- 1889: Fanxin Li, Shixiong Zhao, Yuhao Qing, Jianyu Jiang, Xusheng Chen, Heming Cui. PipeMesh: Achieving Memory-Efficient Computation-Communication Overlap for Training Large Language Models
1890 -- 1903: Long Yuan 0001, Zeyu Zhou, Zi Chen 0003, Xuemin Lin 0001, Xiang Zhao 0002, Fan Zhang 0036. GPUSCAN++: Efficient Structural Graph Clustering on GPUs
1904 -- 1919: Yang Bai, Mingjun Li, Wendong Xu, Bei Yu 0001. A Learned Performance Model With Transfer Learning Across GPUs on Tensorized Instructions
1920 -- 1936: Hao Zhou, Yuanhui Chen, Wu Zeng, Lixiao Cui, Gang Wang 0001, Xiaoguang Liu 0001. GPComp: Using GPU and SSD-GPU Peer to Peer DMA to Accelerate LSM-Tree Compaction for Key-Value Store
1937 -- 1954: Laura Carnevali, Marco Paolieri, Riccardo Reali, Leonardo Scommegna, Enrico Vicario. Compositional Coordinated Resource Provisioning in Workflows With Stochastic Durations
1955 -- 1971: Xiaoyong Yan, Fu Xiao 0001, Jian Zhou 0009, Xiulong Liu, Chuntao Ding, Jiannong Cao 0001, Aiguo Song, Alex X. Liu. NDP: Network Division Positioning for Irregular Multi-Hop Networks
1972 -- 1984: Ruidong Zhu, Ziyue Jiang 0002, Zhi Zhang, Xin Liu 0086, Xuanzhe Liu, Xin Jin 0008. Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving
1985 -- 1997: Shengle Lin, Guoqing Xiao 0001, Haotian Wang 0006, Wangdong Yang, Kenli Li 0001, Keqin Li 0001. High Performance OpenCL-Based GEMM Kernel Auto-Tuned by Bayesian Optimization
1998 -- 2013: Yinuo Wang, Zeyu Song, Wubing Wan, Xinpeng Zhao, Lin Gan, Ping Gao 0005, Wenqiang Wang, Zhenguo Zhang, Haohuan Fu, Wei Xue 0003, Guangwen Yang. Accelerating Half-Precision Seismic Simulation on Neural Processing Unit
2014 -- 2029: Kaiyuan Liu, Xiaobo Zhou 0002, Li Li. m²LLM: A Multi-Dimensional Optimization Framework for LLM Inference on Mobile Devices
2030 -- 2044: Oleksandr O. Sudakov, Volodymyr L. Maistrenko. Parallelization of Network Dynamics Computations in Heterogeneous Distributed Environment
2045 -- 2057: Giulio Malenza, Valentina Cesare, Marco Edoardo Santimaria, Robert Birke, Alberto Vecchiato, Ugo Becciani, Marco Aldinucci. Performance Portability Assessment in Gaia

Volume 36, Issue 8

1666 -- 1679: Mohamed Yassine Boukhari, Akash Balasaheb Dhasade, Anne-Marie Kermarrec, Rafael Pires 0001, Othmane Safsafi, Rishi Sharma 0001. Boosting Resource-Constrained Federated Learning Systems With Guessed Updates
1713 -- 1727: Kohei Yoshida, Ryuichi Sakamoto, Kento Sato, Abhinav Bhatele, Hayato Yamaki, Hiroki Honda, Shinobu Miwa. VAHRM: Variation-Aware Resource Management in Heterogeneous Supercomputing Systems
1728 -- 1743: Yuhui Zhang, Hong Liao, Lutan Zhao, Yuncong Shao, Zhihong Tian, Xiaofeng Wang, Dan Meng, Rui Hou 0001. An Efficient Speculative Federated Tree Learning System With a Lightweight NN-Based Predictor
1744 -- 1761: Rui Zhao 0021, Kui Wang, Yun Li, Yuze Fan, Fei Gao 0020, Zhenhai Gao. Safe Multi-Agent Deep Reinforcement Learning for the Management of Autonomous Connected Vehicles at Future Intersections
1762 -- 1778: Lin Qiu, Xing-Wei Wang 0001, Bo Yi, Kaimin Zhang, Fei Gao, Min Huang 0001, Yanpeng Qu. Towards Efficiency and Decentralization: A Blockchain Assisted Distributed Fuzzy-Rough Feature Selection
1779 -- 1796: Yuanming Zhang, Pinghui Wang, Kuankuan Cheng, Junzhou Zhao, Jing Tao, Jingxin Hai, Junlan Feng, Chao Deng, Xidian Wang. Building Accurate and Interpretable Online Classifiers on Edge Devices
1797 -- 1809: Yaning Yang, Xiaoqi Wang, Chengqing Li, Shaoliang Peng. Parallel Acceleration of Genome Variation Detection on Multi-Zone Heterogeneous System
1810 -- 1827: Weiyi Sun, Jianfeng Zhu 0001, Mingyu Gao 0001, Zhaoshi Li, Shaojun Wei, Leibo Liu. SSS-DIMM: Removing Redundant Data Movement in Trusted DIMM-Based Near-Memory-Processing Kernel Offloading via Secure Space Sharing

Volume 36, Issue 7

1354 -- 1371: Hui Dou, Mingjie He, Lei Zhang 0183, Yiwen Zhang 0001, Zibin Zheng. CausalConf: Datasize-Aware Configuration Auto-Tuning for Recurring Big Data Processing Jobs via Adaptive Causal Structure Learning
1372 -- 1386: Zhaorui Zhang, Sheng Di, Kai Zhao 0008, Sian Jin, Dingwen Tao, Zhuoran Ji, Benben Liu, Khalid Ayedh Alharthi, Jiannong Cao 0001, Franck Cappello. FedCSpc: A Cross-Silo Federated Learning System With Error-Bounded Lossy Parameter Compression
1387 -- 1400: Zhibo Xuan, Xin Sun, Xin You, Hailong Yang, Zhongzhi Luan, Yi Liu 0013, Depei Qian. Identifying Performance Inefficiencies of Parallel Program With Spatial and Temporal Trace Analysis
1401 -- 1415: Yuan Meng 0001, Mahesh A. Iyer, Viktor K. Prasanna. An Acceleration Framework for Deep Reinforcement Learning Using Heterogeneous Systems
1416 -- 1430: Qiang Wang 0005, Zhicheng Li, Fucai Zhou, Jian Xu 0004, Changsheng Zhang 0001. Publicly Verifiable Distributed Computation for MEC Setting
1431 -- 1443: Yangjun Wu, Wanlu Cao, Jiacheng Zhao, Honghui Shang. Fast and Scalable Neural Network Quantum States Method for Molecular Potential Energy Surfaces
1444 -- 1459: Jiajian Zhang, Fangyu Wu 0001, Hai Jiang 0003, Qiufeng Wang 0001, Genlang Chen, Guangliang Cheng, Eng Gee Lim, Keqin Li 0001. AlignMalloc: Warp-Aware Memory Rearrangement Aligned With UVM Prefetching for Large-Scale GPU Dynamic Allocations
1460 -- 1477: Na Wang 0003, Kaifa Zheng, Wen Zhou, Jianwei Liu 0001, Lunzhi Deng, Junsong Fu 0001. A Lightweight and Fine-Grained Ciphertext Search Scheme for Big Data Assisted by Proxy Servers
1478 -- 1494: Wanqi Yang, Pengfei Chen 0002, Kai Liu, Huxing Zhang. ZeroTracer: In-Band eBPF-Based Trace Generator With Zero Instrumentation for Microservice Systems
1495 -- 1508: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu 0006, Hong An, Jinlong Yang 0003. PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer
1509 -- 1523: Kechang Yang, Biao Hu 0001, Mingguo Zhao. Coordinating Computational Capacity for Adaptive Federated Learning in Heterogeneous Edge Computing Systems
1524 -- 1541: Fangyu Zheng, Guang Fan, Wenxu Tang, Yixuan Song, Tian Zhou, Yuan Zhao, Jiankuo Dong, Jingqiang Lin 0001, Shoumeng Yan, Jiwu Jing. GIF-FHE: A Comprehensive Implementation and Evaluation of GPU-Accelerated FHE With Integer and Floating-Point Computing Power
1542 -- 1559: Pengmiao Zhang, Rajgopal Kannan, Viktor K. Prasanna. GraFetch: Accelerating Graph Applications Through Domain Specific Hierarchical Hybrid Prefetching
1560 -- 1573: He Zhu, Mingyu Li, Haihang You. RHINO: An Efficient Serverless Container System for Small-Scale HPC Applications
1574 -- 1590: Wenhao Lu, Zhiyuan Wang, Hefan Zhang, Shan Zhang 0001, Hongbin Luo. OpenSN: An Open Source Library for Emulating LEO Satellite Networks
1591 -- 1607: Zijie Liu, Yi Cheng, Can Chen, Jun Hu, Rongguo Fu, Dengyin Zhang. ISACPP: Interference-Aware Scheduling Approach for Deep Learning Training Workloads Based on Co-Location Performance Prediction
1608 -- 1619: Mimi Qian, Lin Cui 0001, Xiaoquan Zhang, Fung Po Tso 0001, Yuhui Deng, Zhetao Li, Weijia Jia 0001. DisPLOY: Target-Constrained Distributed Deployment for Network Measurement Tasks on Data Plane
1620 -- 1633: Stefan Popa, Vlad Petric, Mihai Ivanovici. A Highly-Parallel and Scalable Hardware Accelerator for the NTest Othello Game Engine
1634 -- 1650: Xuyang Liu, Zijian Zhang 0001, Zhen Li, Hao Yin, Meng Li 0006, Jiamou Liu, Mauro Conti, Liehuang Zhu. ABSE: Adaptive Baseline Score-Based Election for Leader-Based BFT Systems
1651 -- 1665: Xingguo Pang, Liu Liu, Yanze Zhang, Zhuofu Chen, Zhijun Ding, Dazhao Cheng, Xiaobo Zhou 0002. Featherlight Stateful WebAssembly for Serverless Inference Workflows
1680 -- 1694: Guangyao Zhou, Yiqin Fu, Haocheng Lan, Yuanlun Xie, Wenhong Tian, Rajkumar Buyya, Jianhong Qian, Teng Su. Cross-Search With Improved Multi-Dimensional Dichotomy-Based Joint Optimization for Distributed Parallel Training of DNN
1695 -- 1712: Hao Wu, Shiyi Wang, Youhui Bai, Cheng Li 0001, Quan Zhou, Jun Yi, Feng Yan 0001, Ruichuan Chen, Yinlong Xu. A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training

Volume 36, Issue 6

1058 -- 1070: Yuan Yao 0004, Yujiao Hu, Yi Dang, Wei Tao, Kai Hu, Qiming Huang, Zhe Peng, Gang Yang 0008, Xingshe Zhou 0001. Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units
1071 -- 1086: Wei Gao, Zhuoyuan Ouyang, Peng Sun 0006, Tianwei Zhang 0004, Yonggang Wen 0001. IceFrog: A Layer-Elastic Scheduling System for Deep Learning Training in GPU Clusters
1087 -- 1099: Wenting Wei, Huaxi Gu, Zhe Xiao, Yi Chen. Energy Efficient and Multi-Resource Optimization for Virtual Machine Placement by Improving MOEA/D
1100 -- 1114: Wenming Li, Zhihua Fan, Tianyu Liu, Zhen Wang, Haibin Wu, Meng Wu 0006, Kunming Zhang, Yanhuan Liu, Ninghui Sun, Xiaochun Ye, Dongrui Fan. DFU-E: A Dataflow Architecture for Edge DSP and AI Applications
1115 -- 1129: Yifeng Tang, Huaman Zhou, Zhuoran Ji, Cho-Li Wang. Cube-fx: Mapping Taylor Expansion Onto Matrix Multiplier-Accumulators of Huawei Ascend AI Processors
1130 -- 1145: William Andrew Simon, Irem Boybat, Riselda Kodra, Elena Ferro, Gagandeep Singh 0002, Mohammed Alser, Shubham Jain, Hsinyu Tsai, Geoffrey W. Burr, Onur Mutlu, Abu Sebastian. CiMBA: Accelerating Genome Sequencing Through On-Device Basecalling via Compute-in-Memory
1146 -- 1160: Wei Zhang 0173, Yunlong Yu, Xiao Jiang, Nan Guan, Naijun Zhan, Lei Ju 0001. WCET Estimation for CNN Inference on FPGA SoC With Multi-DPU Engines
1161 -- 1174: Haobin Tan, Yao Xiao, Amelie Chi Zhou, Kezhong Lu, Xuan Yang. Distributed and Adaptive Partitioning for Large Graphs in Geo-Distributed Data Centers
1175 -- 1192: Kumseok Jung, Julien Gascon-Samson, Sathish Gopalakrishnan, Karthik Pattabiraman. OneOS: Distributed Operating System for the Edge-to-Cloud Continuum
1193 -- 1205: Luca Colagrande, Luca Benini. Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
1206 -- 1219: Somesh Singh 0001, Bora Uçar. Efficient Parallel Sparse Tensor Contraction
1220 -- 1236: Guohua Xin, Guangquan Xu, Yao Zhang, Cheng Wen 0002, Cen Zhang, Xiaofei Xie, Neal N. Xiong, Shaoying Liu, Pan Gao. IRHunter: Universal Detection of Instruction Reordering Vulnerabilities for Enhanced Concurrency in Distributed and Parallel Systems
1237 -- 1252: Thomas W. Pusztai, Stefan Nastic. ChunkFunc: Dynamic SLO-Aware Configuration of Serverless Functions
1253 -- 1267: Sahil Tyagi, Prateek Sharma. OmniLearn: A Framework for Distributed Deep Learning Over Heterogeneous Clusters
1268 -- 1281: Zhengyu Liao, Shiyou Qian, Zhonglong Zheng, Jian Cao 0001, Guangtao Xue, Minglu Li 0001. AWB+-Tree: A Novel Width-Based Index Structure Supporting Hybrid Matching for Large-Scale Content-Based Pub/Sub Systems
1282 -- 1293: Huazhong Lü, Kai Deng, Xiaomei Yang. Symmetric Properties and Two Variants of Shuffle-Cubes
1294 -- 1310: Xiangyu Kong, Yi Huang, Longlong Chen, Jianfeng Zhu 0001, Liangwei Li, Xingchen Man, Mingyu Gao 0001, Shaojun Wei, Leibo Liu. Raccoon: Lightweight Support for Comprehensive Control Flows in Reconfigurable Spatial Architectures
1311 -- 1325: Laleh Ghalami, Daniel Grosu. Parallel Greedy Algorithms for Steiner Forest
1326 -- 1337: Yuxia Cheng, Linfeng Xu, Tongkai Yang, Wei Wu, Zhiqiang Lin, Antong Yu, Wenzhi Chen. Beehive: Decentralised High-Frequency Small Tasks Scheduling in Large Clusters
1338 -- 1353: Xue Jiang, Hengfeng Wei, Yu Huang 0002, Yuxing Chen, Anqun Pan. A Generic Specification Framework for Weakly Consistent Replicated Data Types

Volume 36, Issue 5

803: Omer F. Rana, Josef Spillner, Stephen Leak, Gerald F. Lofstead II, Rafael Tolosana-Calasanz. Guest Editorial: Special Section on SC22 Student Cluster Competition
804 -- 820: Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler. Productivity, Portability, Performance, and Reproducibility: Data-Centric Python
821 -- 825: Fu-Chiang Chang, En-Ming Huang, Pin-Yi Kuo, Chan-Yu Mou, Hsu-Tzu Ting, Pang-Ning Wu, Jerry Chou 0001. Reproducing Performance of Data-Centric Python by SCC Team From National Tsing Hua University
826 -- 829: Zihan Yang, Yi Chen, Kaiqi Chen, Xingjian Qian, Shaojun Xu, Yun Pan, Chong Zeng 0001, Jianhai Chen, Yin Zhang, Zeke Wang. Critique of "Productivity, Portability, Performance: Data-Centric Python" by SCC Team From Zhejiang University
830 -- 834: Han Huang, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen 0001, Dan Huang 0001. Critique of "Productivity, Portability, Performance Data-Centric Python" by SCC Team From Sun Yat-sen University
835 -- 840: Christopher Lompa, Piotr Luczynski. Analysis and Reproducibility of "Productivity, Portability, Performance: Data-Centric Python"
841 -- 846: Anish Govind, Yuchen Jing, Stefanie Dao, Michael Granado, Rachel Handran, Davit Margarian, Matthew Mikhailov, Danny Vo, Matei-Alexandru Gardus, Khai Vu, Derek Bouius, Bryan Chin, Mahidhar Tatineni, Mary P. Thomas. Reproducibility of the DaCe Framework on NPBench Benchmarks
847 -- 860: Yuan Gao, Liquan Chen, Jianchang Lai, Tianyi Wang 0006, Xiaoming Wu, Shui Yu 0001. IoT-Dedup: Device Relationship-Based IoT Data Deduplication Scheme
861 -- 876: Zhaochen Zhang, Xu Zhang 0006, Zhaoxiang Bao, Liang Wei, Chaohong Tan, Wanchun Dou, Guihai Chen, Chen Tian 0001. Courier: A Unified Communication Agent to Support Concurrent Flow Scheduling in Cluster Computing
877 -- 888: Conor John Williams, James Elliott. Libfork: Portable Continuation-Stealing With Stackless Coroutines
889 -- 902: Keyun Cheng, Huancheng Puyang, Xiaolu Li 0002, Patrick P. C. Lee, Yuchong Hu, Jie Li 0019, Ting-Yi Wu. Toward Load-Balanced Redundancy Transitioning for Erasure-Coded Storage
903 -- 917: Junhan Liu, Zinuo Cai, Yumou Liu, Hao Li, Zongpu Zhang, Ruhui Ma, Rajkumar Buyya. SMore: Enhancing GPU Utilization in Deep Learning Clusters by Serverless-Based Co-Location Scheduling
918 -- 931: Hyeonjin Kim, Taesoo Lim, William J. Song. Graphite: Hardware-Aware GNN Reshaping for Acceleration With GPU Tensor Cores
932 -- 944: S. M. Shovan, Arindam Khanda, Sajal K. Das 0001. Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks
945 -- 960: Xiangyu Zou, Wen Xia, Philip Shilane, Haijun Zhang 0002, Xuan Wang 0002. The Design of a High-Performance Fine-Grained Deduplication Framework for Backup Storage
961 -- 976: Qiange Wang, Xin Ai 0006, Yongze Yan, Shufeng Gong, Yanfeng Zhang, Jing Chen, Ge Yu 0001. Towards Communication-Efficient Out-of-Core Graph Processing on the GPU
977 -- 993: Huijing Yang, Juan Fang, Yumin Hou, Xing Su 0001, Neal N. Xiong. Reinforcement Learning-Driven Adaptive Prefetch Aggressiveness Control for Enhanced Performance in Parallel System Architectures
994 -- 1010: Zerui Shao, Beibei Li 0002, Peiran Wang, Yi Zhang 0018, Kim-Kwang Raymond Choo. FedLoRE: Communication-Efficient and Personalized Edge Intelligence Framework via Federated Low-Rank Estimation
1011 -- 1024: Jingweijia Tan, Xurui Li, an Zhong, Kaige Yan, Xiaohui Wei 0002, Guanpeng Li. GEREM: Fast and Precise Error Resilience Assessment for GPU Microarchitectures
1025 -- 1041: Jan Laukemann, Ahmed E. Helal, S. Isaac Geronimo Anderson, Fabio Checconi, Yongseok Soh, Jesmin Jahan Tithi, Teresa M. Ranadive, Brian J. Gravelle, Fabrizio Petrini, Jee W. Choi. Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
1042 -- 1057: Weihan Kong, Shengan Zheng, Yifan Hua, Ruoyan Ma, Yuheng Wen, Guifeng Wang, Cong Zhou, Linpeng Huang. PimBeam: Efficient Regular Path Queries Over Graph Database Using Processing-in-Memory

Volume 36, Issue 4

616 -- 632: Junhee Ryu, Dongeun Lee 0001, Kang G. Shin, Kyungtae Kang. Paralfetch: Fast Application Launch on Personal Computing/Communication Devices
645 -- 658: Yi Chen, Qiang-Sheng Hua, Zixiao Hong, Lin Zhu, Hai Jin 0001. FHE4DMM: A Low-Latency Distributed Matrix Multiplication With Fully Homomorphic Encryption
675 -- 676: Zhengjun Cao. A Note on "AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems"
677 -- 688: Yan Zeng, Chengchuang Huang, Yipeng Mei, Lifu Zhang, Teng Su, Wei Ye, Wenqi Shi, Shengnan Wang. EfficientMoE: Optimizing Mixture-of-Experts Model Training With Adaptive Load Balance
689 -- 700: Luiz Gustavo Coutinho Xavier, Cristina Meinhardt, Odorico Machado Mendizabal. Beelog: Online Log Compaction for Dependable Systems

Volume 36, Issue 3

361 -- 376: Shuaibing Lu, Ran Yan, Jie Wu 0001, Jackson Yang, Xinyu Deng, Shen Wu, Zhi Cai, Juan Fang. Online Elastic Resource Provisioning With QoS Guarantee in Container-Based Cloud Computing
377 -- 390: Junyuan Liang, Peiyuan Yao, Wuhui Chen, Zicong Hong, Jianting Zhang, Ting Cai, Min Sun, Zibin Zheng. Sparrow: Expediting Smart Contract Execution for Blockchain Sharding via Inter-Shard Caching
391 -- 406: Jialun Li, Danyang Xiao, Diying Yang, Xuan Mo, Weigang Wu. Integrated and Fungible Scheduling of Deep Learning Workloads Using Multi-Agent Reinforcement Learning
407 -- 421: Saiman Dahal, Pratyush Dhingra, Krishu K. Thapa, Partha Pratim Pande, Ananth Kalyanaraman. HpT: Hybrid Acceleration of Spatio-Temporal Attention Model Training on Heterogeneous Manycore Architectures
422 -- 436: Yuhan Leng, Gaoyuan Zou, Hansheng Wang, Panruo Wu, Shaoshuai Zhang. High Performance Householder QR Factorization on Emerging GPU Architectures Using Tensor Cores
437 -- 454: Lizhen Zhou, Zichuan Xu, Qiufen Xia, Zhou Xu, Wenhao Ren, Wenbo Qi, Jinjing Ma, Song Yan, Yuan Yang. Chasing Common Knowledge: Joint Large Model Selection and Pulling in MEC With Parameter Sharing
455 -- 470: Binqi Sun, Tomasz Kloda, Jiyang Chen, Cen Lu, Marco Caccamo. Response Time Analysis and Optimal Priority Assignment for Global Non-Preemptive Fixed-Priority Rigid Gang Scheduling
471 -- 486: Ziqu Yu, Jinyu Gu 0001, Zijian Wu, Nian Liu, Jian Guo. HTLL: Latency-Aware Scalable Blocking Mutex
487 -- 501: Haining Yang, Dengguo Feng, Jing Qin 0002. Towards Efficient Verifiable Cloud Storage and Distribution for Large-Scale Data Streaming
502 -- 519: Hongkuan Zhou, Bingyi Zhang, Rajgopal Kannan, Carl E. Busart, Viktor K. Prasanna. ViTeGNN: Towards Versatile Inference of Temporal Graph Neural Networks on FPGA
520 -- 536: Wenhan Xu, Hui Ma 0002, Rui Zhang 0002, Jianhao Li. GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes

Volume 36, Issue 2

108 -- 119: Hariharan Devarajan, Gerd Heber, Kathryn M. Mohror. H5Intent: Autotuning HDF5 With User Intent
120 -- 132: Diletta Olliaro, Adityo Anggraito, Marco Ajmone Marsan, Simonetta Balsamo, Andrea Marin. The Impact of Service Demand Variability on Data Center Performance
133 -- 149: Shuai Lin, Rui Wang 0076, Yongkun Li 0001, Yinlong Xu, John C. S. Lui. Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis
150 -- 167: Zhi Ling, Xiaofeng Jiang, Xiaobin Tan, Huasen He, Shiyin Zhu, Jian Yang 0014. Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure
168 -- 184: Diandian Gu, Yihao Zhao, Peng Sun 0006, Xin Jin 0008, Xuanzhe Liu. GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads
185 -- 196: Pengwei Wang 0001, Junye Qiao, Yuying Zhao, Zhijun Ding. Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value
197 -- 211: Xiaodong Dong, Lihai Nie, Zheli Liu, Yang Xiang 0001. Slark: A Performance Robust Decentralized Inter-Datacenter Deadline-Aware Coflows Scheduling Framework With Local Information
212 -- 225: Jialiang Han, Yudong Han, Xiang Jing, Gang Huang 0001, Yun Ma 0002. DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning
226 -- 238: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens. Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
239 -- 252: Zhangrong Qin, Xusheng Lu, Long Lv, Zhongxiang Tang, Binghai Wen. An Efficient GPU Algorithm for Lattice Boltzmann Method on Sparse Complex Geometries
253 -- 265: J. Gregory Pauloski, Valérie Hayot-Sasson, Logan T. Ward, Alexander Brace, André Bauer 0001, Kyle Chard, Ian T. Foster. Object Proxy Patterns for Accelerating Distributed Applications
266 -- 281: Changyao Lin, Zhenming Chen, Ziyang Zhang, Jie Liu 0001. TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU
282 -- 292: Jing Hou, Guang Chen 0001, Ruiqi Zhang, Zhijun Li 0001, Shangding Gu, Changjun Jiang. Spreeze: High-Throughput Parallel Reinforcement Learning Framework
293 -- 307: Guangyao Zhou, Wenhong Tian, Rajkumar Buyya, Kui Wu 0001. UMPIPE: Unequal Microbatches-Based Pipeline Parallelism for Deep Neural Network Training
308 -- 325: Yuyang Jin, Haojie Wang, Xiongchao Tang, Zhenhua Guo 0003, Yaqian Zhao, Torsten Hoefler, Tao Liu 0029, Xu Liu 0001, Jidong Zhai. Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications
326 -- 340: Giacomo Valente, Gianluca Brilli, Tania Di Mascio, Alessandro Capotondi, Paolo Burgio, Paolo Valente, Andrea Marongiu. Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-Based Heterogeneous SoCs
341 -- 355: Cristóbal A. Navarro, Felipe A. Quezada, Enzo Meneses, Héctor Ferrada, Nancy Hitschfeld. CAT: Cellular Automata on Tensor Cores

Volume 36, Issue 10

2058 -- 2072: Weimin Li 0002, Qin Li, Weihong Tian, Jie Gao 0002, Fan Wu 0014, Jianxun Liu 0001, Ju Ren 0001. MUCVR: Edge Computing-Enabled High-Quality Multi-User Collaboration for Interactive MVR
2073 -- 2088: Yujian Wu, Shanjiang Tang, Ce Yu, Bin Yang 0043, Chao Sun, Jian Xiao 0001, Hutong Wu, Jinghua Feng. Task Scheduling in Geo-Distributed Computing: A Survey
2089 -- 2103: Bin Deng, Weidong Li 0002. Dynamic Multiresource Fair Allocation With Time Discount Utility
2104 -- 2118: Yepeng Zhang, Haitao Zhang, Huadong Ma. RL-Based Hybrid CPU Scaling for Soft Deadline Constrained Tasks in Container Clouds
2119 -- 2136: Yishan Chen 0001, Xiangwei Zeng, Huashuai Cai, Qing Xu, Zhiquan Liu. Decentralized QoS-Aware Model Inference Using Federated Split Learning for Cloud-Edge Medical Detection

Volume 36, Issue 1

1 -- 14: Junqiang Jiang, Zhifang Sun, Ruiqi Lu, Li Pan, Zebo Peng. Real Relative Encoding Genetic Algorithm for Workflow Scheduling in Heterogeneous Distributed Computing Systems
15 -- 28: Sanaz Rabinia, Niloofar Didar, Marco Brocanelli, Daniel Grosu. Algorithms for Data Sharing-Aware Task Allocation in Edge Computing Systems
29 -- 42: Qiang He 0001, Guobiao Zhang, Jiawei Wang 0003, Ruikun Luo, Xiaohai Dai, Yuchong Hu, Feifei Chen 0001, Hai Jin 0001, Yun Yang 0001. EdgeHydra: Fault-Tolerant Edge Data Distribution Based on Erasure Coding
43 -- 54: Jonatha Anselmi, Josu Doncel. Balanced Splitting: A Framework for Achieving Zero-Wait in the Multiserver-Job Model
55 -- 66: Ruikun Luo, Qiang He 0001, Feifei Chen 0001, Song Wu 0001, Hai Jin 0001, Yun Yang 0001. Ripple: Enabling Decentralized Data Deduplication at the Edge
67 -- 83: Haoyu Liao, Tong-Yu Liu, Jianmei Guo, Bo Huang 0002, Dingyu Yang, Jonathan Ding. Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers
84 -- 95: Ruikun Luo, Qiang He 0001, Mengxi Xu, Feifei Chen 0001, Song Wu 0001, Jing Yang, Yuan Gao, Hai Jin 0001. Edge Data Deduplication Under Uncertainties: A Robust Optimization Approach
96 -- 107: Guillaume Raffin, Denis Trystram. Dissecting the Software-Based Measurement of CPU Energy Consumption: A Comparative Analysis