- FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning. Rongxin Xu, Shiva Raj Pokhrel, Qiujun Lan, Gang Li 0009.
- IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs. Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu, Ji Qi.
- An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint. Lina Su, Ruiting Zhou, Ne Wang, Guang Fang, Zongpeng Li.
- EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks. Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang 0003, Xiangyu Ye, Yabo Duan.
- Postmortem Computation of Pagerank on Temporal Graphs. Md. Maruf Hossain, Erik Saule.
- Automatic Differentiation of Parallel Loops with Formal Methods. Jan Hückelheim, Laurent Hascoët.
- A Dynamic and Recoverable BMT Scheme for Secure Non-Volatile Memory. Mengya Lei, Fang Wang, Dan Feng 0001, Xiaoyu Shuai, Yuchao Cao.
- EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wenjing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang.
- ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory. Xiaomin Zou, Fang Wang 0001, Dan Feng 0001, Tianjin Guan, Nan Su.
- A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. Andrey Prokopenko, Piyush Sao, Damien Lebrun-Grandié.
- Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering. Qingcai Jiang, Jielan Li, Junshi Chen, Xinming Qin, Lingyun Wan, Jinlong Yang, Jie Liu, Wei Hu, Hong An.
- On the Parallelization of MCMC for Community Detection. Frank Wanye, Vitaliy Gleyzer, Edward K. Kao, Wu-chun Feng.
- Tesseract: Parallelize the Tensor Parallelism Efficiently. Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You 0001.
- FedClassAvg: Local Representation Learning for Personalized Federated Learning on Heterogeneous Neural Networks. Jaehee Jang, Heonseok Ha, Dahuin Jung, Sungroh Yoon.
- TCB: Accelerating Transformer Inference Services with Request Concatenation. Boqian Fu, Fahao Chen, Peng Li 0017, Deze Zeng.
- FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning. Nang Hung Nguyen, Phi-Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc-Long Nguyen, Thanh-Hung Nguyen, Huy-Hieu Pham, Truong Thao Nguyen.
- Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB. Jimmy Aguilar Mena, Omar Shaaban, Victor Lopez, Marta Garcia, Paul M. Carpenter, Eduard Ayguadé, Jesús Labarta.
- Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud. Liu Liu, Jian Yu, Zhijun Ding.
- Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach. Guangming Cui, Qiang He 0001, Xiaoyu Xia, Feifei Chen 0001, Yun Yang 0001.
- Parallel Algorithms for Masked Sparse Matrix-Matrix Products. Srdan Milakovic, Oguz Selvitopi, Israt Nisa, Zoran Budimlic, Aydin Buluç.
- FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity. Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng 0006, Dejing Dou.
- Micro-Benchmarking MPI Partitioned Point-to-Point Communication. Yiltan Hassan Temuçin, Ryan E. Grant, Ahmad Afsahi.
- Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge. Zhaowu Huang, Fang Dong 0001, Dian Shen, Huitian Wang, Xiaolin Guo, Shucun Fu.
- Lobster: Load Balance-Aware I/O for Distributed DNN Training. Jie Liu, Bogdan Nicolae, Dong Li 0001.
- Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor. Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang, Jie Zhao.
- HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks. Zining Zhang 0001, Bingsheng He, Zhenjie Zhang.
- Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers. Yuhao Liu, Xin Du, Zhihui Lu 0002, Qiang Duan, Jianfeng Feng, Minglong Wang, Jie Wu 0003.
- Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs. Christoph Klein 0002, Robert Strzodka.
- Aperiodic Local SGD: Beyond Local SGD. Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu 0001.
- A Data-aware Learned Index Scheme for Efficient Writes. Li Liu, Chunhua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang.
- Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems. Anam Tahir, Kai Cui 0001, Heinz Koeppl.
- Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains. Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu, Sairam Veeraswamy.
- MG-GCN: A Scalable multi-GPU GCN Training Framework. Muhammed Fatih Balin, Kaan Sancak, Ümit V. Çatalyürek.
- Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs. Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki, Hiroki Honda.
- Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning. Lucia Pons, Julio Sahuquillo, Salvador Petit, Julio Pons.
- TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs. Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin 0001, Guangming Tan, Weifeng Liu 0002.
- UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival. Jin Ye, Lin Li, Wenlu Zhang, Guihao Chen, Yuanchao Shan, Yijun Li, Weihe Li, Jiawei Huang 0001.
- ParaGraph: An application-simulator interface and toolkit for hardware-software co-design. Mikhail Isaev, Nic McDonald, Jeffrey Young, Richard Vuduc.
- Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds. Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng, Feng Gao.
- Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly. Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine A. Yelick, Aydin Buluç.
- Online Scheduling of Moldable Task Graphs under Common Speedup Models. Anne Benoit, Lucas Perotin, Yves Robert, Hongyang Sun.
- Penelope: Peer-to-peer Power Management. Tapan Srivastava, Huazhe Zhang, Henry Hoffmann.
- Vectorizing SpMV by Exploiting Dynamic Regular Patterns. Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan, Depei Qian.
- Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters. Hai Zhou, Dan Feng.
- Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach. Haidong Lan, Wenxi Zhu, Du Wu, Qian Qiu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Liu Wei, Jintao Meng, Minwen Deng.
- Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems. Xiaoyu Xia, Feifei Chen 0001, Qiang He, Guangming Cui, John C. Grundy, Mohamed Almorsy Abdelrazek, Fang Dong.
- Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning. Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C. S. Lui, Fei Chen, Pengcheng Wang, Lei Han.
- HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. Yijun Li, Jiawei Huang 0001, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang, Jianxin Wang 0001.
- Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. Kangjin Wang, Ying Li, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang.
- DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers. Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou, Xiangke Liao.
- SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer. Refael Cohen, Ido Hakimi, Assaf Schuster.
- FLOPs as a Discriminant for Dense Linear Algebra Algorithms. Francisco López, Lars Karlsson, Paolo Bientinesi.
- Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems. Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen Hoang Tran, Bing Bing Zhou, Albert Y. Zomaya.
- SHE: A Generic Framework for Data Stream Mining over Sliding Windows. Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul, Yaofeng Tu.
- Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs. Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa.
- Performance Modeling for Short-Term Cache Allocation. Christopher Stewart, Nathaniel Morris, Lydia Y. Chen, Robert Birke.
- Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server. Lin Wang, Yuchong Hu, Qian Du, Dan Feng 0001, Ray Wu, Ingo He, Kevin Zhang.
- Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID. Shucheng Wang, Qiang Cao 0001, Ziyi Lu, Jie Yao.
- DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training. Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo, Zuoning Chen.
- Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters. Huanle Xu, Yang Liu, Wing Cheong Lau.
- LDPP: A Learned Directory Placement Policy in Distributed File Systems. Yuanzhang Wang, Fengkui Yang, Ji Zhang, Chunhua Li, Ke Zhou, Chong Liu, Zhuo Cheng, Wei Fang, Jinhu Liu.
- Acuerdo: Fast Atomic Broadcast over RDMA. Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory V. Chockler, Alexey Gotsman.
- Exploiting CXL-based Memory for Distributed Deep Learning. Moiz Arif, Kevin Assogba, M. Mustafa Rafique, Sudharshan Vazhkudai.
- Characterizing and Optimizing Transformer Inference on ARM Many-core Processor. Jiazhi Jiang, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, Yutong Lu.
- Counting Induced 6-Cycles in Bipartite Graphs. Jason Niu, Jaroslaw Zola, Ahmet Erdem Sariyüce.
- BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers. Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han, Jiwon Seo.
- Mentha: Enabling Sparse-Packing Computation on Systolic Arrays. Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo, Sheng Liu 0001.
- DRAM Cache Management with Request Granularity for NAND-based SSDs. Haodong Lin, Zhibing Sha, Jun Li 0062, Zhigang Cai, Balazs Gerofi, Yuanquan Shi, Jianwei Liao.
- ElastiSim: A Batch-System Simulator for Malleable Workloads. Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard, Felix Wolf 0001.
- Parallel Network Slicing for Multi-SP Services. Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang 0001, Qi Qi 0001, Jianxin Liao.
- BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems. Kai Zhang, Zhiqi Wang, Zili Shao.
- NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database. Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu, Hailong Yang.
- From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, Tsung-Wei Huang.
- Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang, Xu Chen.
- DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks. Hui Dou, Yilun Wang, Yiwen Zhang, Pengfei Chen.
- BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks. Yuan Liu, Wenxin Li 0001, Wenyu Qu, Heng Qi.
- Spread: Decentralized Model Aggregation for Scalable Federated Learning. Chuang Hu, Huang Huang Liang, Xiao Ming Han, Bo An Liu, Da Zhao Cheng, Dan Wang 0002.
- Accelerating Random Forest Classification on GPU and FPGA. Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo, Michela Becchi.
- Online Resource Optimization for Elastic Stream Processing with Regret Guarantee. Yang Liu, Huanle Xu, Wing Cheong Lau.
- Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings. Anwesh Panda, Sathish Vadhiyar.
- Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance. Shuang Ma, Si Wu 0003, Cheng Li 0001, Yinlong Xu.
- SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems. Qinzhe Wu, Ashen Ekanayake, Ruihao Li 0002, Jonathan Beard, Lizy Kurian John.
- NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks. Haoyu Wang 0003, Kevin Zheng, Charles Reiss, Haiying Shen.
- ParallelDualSPHysics: supporting efficient parallel fluid simulations through MPI-enabled SPH method. Sifan Long, Xiaowei Guo, Xiaokang Fan, Chao Li, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang, Canqun Yang.
- GraphSD: A State and Dependency aware Out-of-Core Graph Processing System. Xianghao Xu, Hong Jiang 0001, Fang Wang 0001, Yongli Cheng, Peng Fang.
- ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning. Kai Lu, Guokuan Li, Jiguang Wan, Ruixiang Ma, Wei Zhao.
- Scheduling Fork-Join Task Graphs with Communication Delays and Equal Processing Times. Huijun Wang, Oliver Sinnen.
- Atos: A Task-Parallel GPU Scheduler for Graph Analytics. Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, Aydin Buluç, Katherine A. Yelick, John D. Owens.