- Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems. Wojciech Kwedlo. 1-10
- Fast Leiden Algorithm for Community Detection in Shared Memory Setting. Subhajit Sahu, Kishore Kothapalli, Dip Sankar Banerjee. 11-20
- Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary Functions. Xianglin Wang, Xin Yi, Hengbiao Yu, Chun Huang, Lin Peng. 21-31
- Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable Matrices. Abhishek V. N. Taraka Josyula, Pritesh Verma, Amar Gaonkar, Amlan Barua, Nikhil Hegde. 32-41
- Online Non-preemptive Multi-Resource Scheduling for Weighted Completion Time on Multiple Machines. Donney Fan, Ben Liang. 42-51
- FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications. Yi Zong, Peinan Yu, Haopeng Huang, Wei Xue. 52-62
- DPC: DPU-accelerated High-Performance File System Client. Kan Zhong, Zhiwang Yu, Qiao Li 0001, Xianqiang Luo, Linbo Long, Yujuan Tan, Ao Ren, Duo Liu. 63-72
- Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving. Sonia Rani Gupta, Nikela Papadopoulou, Jing Chen, Miquel Pericàs. 73-83
- The Case for Co-Designing Model Architectures with Hardware. Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda 0001. 84-96
- Improving efficiency of Monte Carlo method via code intrinsic framework. Qifeng Pan, Ralf Schneider. 97-106
- Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. Yongseok Soh, Ramakrishnan Kannan, Piyush Sao, Jee W. Choi. 107-116
- Sparsity-Aware Communication for Distributed Graph Neural Network Training. Ujjaini Mukhopadhyay, Alok Tripathy, Oguz Selvitopi, Katherine A. Yelick, Aydin Buluç. 117-126
- FNCC: Fast Notification Congestion Control in Data Center Networks. Jing Xu, Zhan Wang, Fan Yang, Ning Kang 0007, Zhenlong Ma, Guojun Yuan, Guangming Tan, Ninghui Sun. 127-137
- Distributed Minimax Fair Optimization over Hierarchical Networks. Wen Xu, Juncheng Wang 0001, Ben Liang 0001, Gary Boudreau, Hamza Sokun. 138-147
- Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning. Jing Peng, Zihan Li, Shaohuai Shi, Bo Li 0001. 148-157
- SuperCSR: A Space-Time-Efficient CSR Representation for Large-scale Graph Applications on Supercomputers. Xinbiao Gan, Tiejun Li, Qiang Zhang, Bo Yang 0023, Xinhai Chen, Jie Liu 0002. 158-167
- Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction. Tim Beringer, Jakob Stock, Arya Mazaheri, Felix Wolf 0001. 168-178
- SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All Scales. Jiajian Zhang, Fangyu Wu 0001, Hai Jiang 0003, Guangliang Cheng, Genlang Chen, Qiufeng Wang. 179-188
- Selective Memory Compression for GPU Memory Oversubscription Management. Abdun Nihaal, Madhu Mutyam. 189-198
- Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper. Gabin Schieffer, Jacob Wahlgren, Jie Ren 0015, Jennifer Faj, Ivy Peng. 199-209
- In-Situ Binary Segmentation of 3D time-dependent Flows into Laminar and Turbulent Regions. Jiahui Liu, Tobias Edwards, Kristina Durovic, Philipp Schlatter, Tino Weinkauf. 210-219
- Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA. Dewi Yokelson, Mikhail Titov, Srinivasan Ramesh, Ozgur O. Kilic, Matteo Turilli, Shantenu Jha, Allen D. Malony. 220-230
- HStream: A hierarchical data streaming engine for high-throughput scientific applications. Jaime Cernuda, Jie Ye, Anthony Kougkas, Xian-He Sun. 231-240
- High-Performance 3D convolution on the Latest Generation Sunway Processor. Jialin Li, Zhichen Feng, Yaqian Gao, Shaobo Tian, Haoyuan Zhang, Huang Ye, Jian Zhang 0070. 241-251
- Kanva: A Lock-free Learned Search Data Structure. Gaurav Bhardwaj, Bapi Chatterjee, Abhinav Sharma, Sathya Peri, Siddharth Nayak. 252-261
- BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System. Haopeng Huang, Yuyang Jin, Wei Xue. 262-272
- Extending Segment Tree for Polygon Clipping and Parallelizing using OpenMP and OpenACC Directives. Buddhi Ashan Mallika Kankanamalage, Satish Puri, Sushil K. Prasad. 273-283
- Exploring Scalability in C++ Parallel STL Implementations. Ruben Laso, Diego Krupitza, Sascha Hunold. 284-293
- OP-PIC - an Unstructured-Mesh Particle-in-Cell DSL for Developing Nuclear Fusion Simulations. Zaman Lantra, Steven A. Wright 0001, Gihan R. Mudalige. 294-304
- Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms. Svetlana Kulagina, Henning Meyerhenke, Anne Benoit. 305-316
- The Blind and the Elephant: A Preference-aware Edge Video Analytics Scheduler for Maximizing System Benefit. Liang Zhang, Hongzi Zhu, Yunzhe Li, Jiangang Shen, Minyi Guo. 317-326
- Diminishing cold starts in serverless computing with approximation algorithms. Tomasz Kanas, Krzysztof Rzadca. 327-336
- Achieving Efficient Scheduling based on Accurate Measurement of Small Flows in Data Center. Jiawei Huang 0001, Qile Wang, Zhaoyi Li, Yijun Li, Zihao Chen, Sitan Li, Jing Shao, Jingling Liu, Min Zhan, Jianxin Wang 0001. 337-346
- Thawbringer: An Orchestrator to Mitigate Cascading Cold Starts of Serverless Function Chains. Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Junzhao Du. 347-356
- Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks. Ying Zheng, Lei Jiao, Han Yang, Lulu Chen, Ying Liu, Yuxiao Wang, Yuedong Xu, Xin Wang, Zongpeng Li. 357-366
- Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths. Xin Tan, Jiamin Li, Yitao Yang, Jingzong Li, Hong Xu 0001. 367-376
- Large-scale Phase-Field Simulations for Solid-Solid Phase Transformations involving Elastic Energy. Yaqian Gao, Jian Zhang, Huang Ye, Xuebin Chi. 377-387
- FlatDD: A High-Performance Quantum Circuit Simulator using Decision Diagram and Flat Array. Shui Jiang, Rongliang Fu, Lukas Burgholzer, Robert Wille, Tsung-Yi Ho, Tsung-Wei Huang. 388-399
- Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation. Yi Zhang, Ziyu Zhang, Yang Zhao, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen. 400-410
- A Motion Trace Decomposition-based overset grid method for parallel CFD simulations with moving boundaries. Ran Zhao, Chao Li, Xiaowei Guo, Sen Zhang, Xi Yang, Tao Tang, Canqun Yang. 411-420
- NetSmith: An Optimization Framework for Machine-Discovered Network Topologies. Conor James Green, Mithuna Thottethodi. 421-432
- A Distributed Framework for Subgraph Isomorphism Leveraging CPU and GPU Heterogeneous Computing. Chen Chen 0016, Li Shen 0007, Yingwen Chen 0001. 433-442
- AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster. Jinbin Hu, Ying Liu, Hao Wang, Jin Wang. 443-452
- HyperDB: a Novel Key Value Store for Reducing Background Traffic in Heterogeneous SSD Storage. Ruisong Zhou, Yuzhan Zhang, Chunhua Li, Ke Zhou, Peng Wang, Gong Zhang, Ji Zhang, Guangyu Zhang. 453-463
- ChronusFed: Reinforcement-Based Adaptive Partial Training for Heterogeneous Federated Learning. Fuyuan Xia, Chenhao Ying, David S. L. Wei, Wei Chen, Weiting Zhang, Haiming Jin, Yuan Luo 0003. 464-473
- FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering. Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan 0001, Li Chen 0019, Nian-Feng Tzeng. 474-483
- HASFL: Harnessing Heterogeneous Models Across Diverse Devices for Enhanced Federated Learning. Jiangshan Hao, Fang Dong 0001, Bingheng Cen, Shucun Fu, Ruiting Zhou, Ding Ding. 484-493
- FedCA: Efficient Federated Learning with Client Autonomy. Na Lv, Zhi Shen, Chen Chen 0067, Zhifeng Jiang, Jiayi Zhang, Quan Chen 0002, Minyi Guo. 494-503
- MIGER: Integrating Multi-Instance GPU and Multi-Process Service for Deep Learning Clusters. Bowen Zhang, Shuxin Li, Zhuozhao Li. 504-513
- Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan. 514-523
- SPHINX: Search Space-Pruning Heterogeneous Task Scheduling for Deep Neural Networks. Bowen Yuchi, Heng Shi, Guoqing Bao. 524-533
- Enhancing Heterogeneous Computing Through OpenMP and GPU Graph. Chenle Yu, Sara Royuela, Eduardo Quiñones. 534-543
- CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU. Shinnung Jeong, Sungjun Cho, Yongwoo Lee 0001, Hyunjun Park, Seonyeong Heo, Gwangsun Kim, Youngsok Kim, Hanjun Kim 0001. 544-554
- SpeedCore: Space-efficient and Dependency-aware GPU Parallel Framework for Core Decomposition. Chen Zhao, Ting Yu 0004, Zhigao Zheng, Yuanyuan Zhu, Song Jin, Bo Du 0001, Dacheng Tao. 555-564
- GSAP: A GPU-Accelerated Stochastic Graph Partitioner. Chih-Chun Chang, Boyang Zhang, Tsung-Wei Huang. 565-575
- BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs. Mahesh Lakshminarasimhan, Mary W. Hall, Samuel Williams 0001, Oscar Antepara. 576-586
- GPU Algorithms for Fastest Path Problem in Temporal Graphs. Mithinti Srikanth, Prashant Singh, G. Ramakrishna. 587-596
- Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory. Wenda Tang, Ying Han, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang. 597-606
- DiStore: A Fully Memory Disaggregation Friendly Key-Value Store with Improved Tail Latency and Space Efficiency. Ziwei Xiong, Dejun Jiang 0001, Jin Xiong. 607-617
- zQoS: Unleashing full performance capabilities of NVMe SSDs while enforcing SLOs in distributed storage systems. Liuying Ma, Zhenqing Liu, Jin Xiong, Yue Wu, Renhai Chen, Xi Peng 0006, Ying Zhang, Gong Zhang 0001, Dejun Jiang 0001. 618-628
- Scratchpad Memory Management for Deep Learning Accelerators. Stavroula Zouzoula, Mohammad Ali Maleki, Muhammad Waqar Azhar, Pedro Trancoso. 629-639
- CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. Jihu Guo, Rui Xia, Jie Liu, Xiaoxiong Zhu, Xiang Zhang. 640-649
- GNNDrive: Reducing Memory Contention and I/O Congestion for Disk-based GNN Training. Qisheng Jiang, Lei Jia, Chundong Wang 0001. 650-659
- GMM: An Efficient GPU Memory Management-based Model Serving System for Multiple DNN Inference Models. XinYu Piao, Jong-Kook Kim. 660-668
- A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications. Kaveh Mahdavi. 669-678
- Optimizing Stencil Computation on Multi-core DSPs. Fugeng Zhu, Xinxin Qi, Peng Zhang 0061, Jianbin Fang, Tao Tang 0001, Yonggang Che, Kainan Yu, Jing Xie, Chun Huang, Jie Ren 0007. 679-690
- Evaluating and optimising compiler code generation for NVIDIA Grace. Ricardo Jesus, Michèle Weiland. 691-700
- DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks. Yingwen Chen 0001, Wenxin Li, Huan Zhou 0006, Xiangrui Yang, Yanfei Yin. 701-711
- PheCon: Fine-Grained VM Consolidation with Nimble Resource Defragmentation in Public Cloud Platforms. Jiazhen Zhu, Wenda Tang, Xianglong Meng, Nan Gong, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang. 712-721
- PREACT: Predictive Resource Allocation for Bursty Workloads in a Co-located Data Center. Dingyu Yang, Ziyang Xiao, Dongxiang Zhang, Shuhao Zhang 0001, Jian Cao 0001, Gang Chen 0001. 722-731
- FlexSP: (1 + β)-Choice based Flexible Stream Partitioning for Stateful Operators. Siyuan Chen, Decheng Zuo, Zhan Zhang. 732-741
- Parallel Task Scheduling in Autonomous Robotic Systems: An Event-Driven Multimodal Prediction Approach. Wen Gao, Zhiwen Yu, Hui Xiong, Bin Guo, Liang Wang, Yuan Yao. 742-751
- IMI: In-memory Multi-job Inference Acceleration for Large Language Models. Bin Gao, Zhehui Wang, Zhuomin He, Tao Luo 0014, Weng-Fai Wong, Zhi Zhou 0006. 752-761
- Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning. Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen 0004. 762-771
- RIA: Return on Investment Auto-scaler for Serverless Edge Functions. Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Qiaoqiao Liu, Junzhao Du. 772-781
- Nebula: An Edge-Cloud Collaborative Learning Framework for Dynamic Edge Environments. Yan Zhuang, Zhenzhe Zheng, Yunfeng Shao 0001, Bingshuai Li, Fan Wu 0006, Guihai Chen. 782-791
- Murmuration: On-the-fly DNN Adaptation for SLO-Aware Distributed Inference in Dynamic Edge Environments. Jieyu Lin, Minghao Li, Sai Qian Zhang, Alberto Leon-Garcia. 792-801
- Cache Line Pinning for Mitigating Row Hammer Attack. Praseetha M, Madhu Mutyam, Venkata Kalyan Tavva. 802-811
- Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models. Jie Ye, Jaime Cernuda, Neeraj Rajesh, Keith Bateman, Orcun Yildiz, Tom Peterka, Arnur Nigmetov, Dmitriy Morozov, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae. 812-821
- PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis. Siyu Wu, Hailong Yang, Xin You, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian. 822-832
- RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications. Simon Schwitanski, Yussur Mustafa Oraji, Cornelius Pätzold, Joachim Jenke, Felix Tomski, Matthias S. Müller. 833-844
- Significantly Improving Fixed-Ratio Compression Framework for Resource-limited Applications. Tri Nguyen, Md Hasanur Rahman 0001, Sheng Di, Michela Becchi. 845-855
- Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUs. André Weißenberger, Bertil Schmidt. 856-865
- Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning. Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu 0001. 866-875
- RoDMap: A Reserve-on-Demand Mapper for Spatially-Configured Coarse-Grained Reconfigurable Arrays. Kyle Zhao Bin Chen, Tarek S. Abdelrahman, Reza Azimi, Tomasz S. Czajkowski, Maziar Goudarzi. 876-886
- Hardware Acceleration of Minimap2 Genomic Sequence Alignment Algorithm. Jie Cheng, Lifu Hu, Wei Xu, Hanhua Chen, Tian Xia. 887-897
- LpaqHP: A High-Performance FPGA Accelerator for LPAQ Compression. Weilin Zhu, Wei Tong 0001, Hujun Ge, Zuoxian Zhang, Mengran Zhang, Wen Zhou. 898-907
- PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. Piyush Sao, Andrey Prokopenko, Damien Lebrun-Grandié. 908-918
- High-Performance Sorting-Based K-mer Counting in Distributed Memory with Flexible Hybrid Parallelism. Yifan Li, Giulia Guidi. 919-928
- Revisiting Learned Index with Byte-addressable Persistent Storage. Rui Zhang, Yukai Huang, Sicheng Liang, Shangyi Sun, Shaonan Ma, Chengying Huan, Lulu Chen, Zhihui Lu 0002, Yang Xu, Ming Yan, Jie Wu 0003. 929-938
- TESLA: Thermally Safe, Load-Aware, and Energy-Efficient Cooling Control System for Data Centers. Hanfei Geng, Yi Sun, Yuanzhe Li, Jichao Leng, Xiangyu Zhu, Xianyuan Zhan, Yuanchun Li, Feng Zhao, Yunxin Liu. 939-949
- Rethinking Low-Carbon Edge Computing System Design with Renewable Energy Sharing. Hanlong Liao, Guoming Tang, Deke Guo, Yi Wang 0004, Ruide Cao. 950-960
- Scheduling Machine Learning Compressible Inference Tasks with Limited Energy Budget. Tiago Da Silva Barros, Davide Ferré, Frédéric Giroire, Ramon Aparicio-Pardo, Stephane Perennes. 961-970
- Gradient Free Personalized Federated Learning. Haoyu Chen, Yuxin Zhang, Jin Zhao 0001, Xin Wang, Yuedong Xu. 971-980
- Federated Edge Learning with Blurred or Pseudo Data Sharing. Yinlong Li, Hao Zhang 0016, Siyao Cheng, Jie Liu 0001. 981-990
- Rethinking Personalized Federated Learning from Knowledge Perspective. Dezhong Yao 0002, Ziquan Zhu, Tongtong Liu, Zhiqiang Xu 0003, Hai Jin 0001. 991-1000
- AdCoalescer: An Adaptive Coalescer to Reduce the Inter-Module Traffic in MCM-GPUs. Xu Zhang, Guangda Zhang, Lu Wang 0019, Shiqing Zhang, Xia Zhao 0004. 1001-1011
- VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing. Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh. 1012-1021
- FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs. Qianchao Zhu. 1022-1031
- CIM-KF: Efficient Computing-in-memory Circuits for Full-Process Execution of Kalman Filter Algorithm. Pingdan Xiao, Qinghui Hong, Sichun Du, Jiliang Zhang 0002. 1032-1041
- ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNNs. Mohammad Sabri Abrebekoh, Marc Riera Villanueva, Antonio González 0001. 1042-1051
- AUTOHET: An Automated Heterogeneous ReRAM-Based Accelerator for DNN Inference. Tong Wu, Shuibing He, Jianxin Zhu, Weijian Chen 0002, Siling Yang, Ping Chen, Yanlong Yin, Xuechen Zhang 0001, Xian-He Sun, Gang Chen 0001. 1052-1061
- Parallelization of the Banded Needleman & Wunsch Algorithm on UPMEM PiM Architecture for Long DNA Sequence Alignment. Meven Mognol, Dominique Lavenier, Julien Legriel. 1062-1071
- Im2col-Winograd: An Efficient and Flexible Fused-Winograd Convolution for NHWC Format on GPUs. Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Bingjie Yan, Qi Wang. 1072-1081
- Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization. Taisuke Boku, Masatake Sugita, Ryohei Kobayashi, Shinnosuke Furuya, Takuya Fujie, Masahito Ohue, Yutaka Akiyama. 1082-1091
- High-Performance, Accurate Large-Scale Quantum Chemistry Calculations on GPU Supercomputers using Coulomb-Perturbed Fragmentation. Fazeleh S. Kazemian, Jorge L. Galvez Vallejo, Giuseppe M. J. Barca. 1092-1102
- PASCI: A Scalable Framework for Heterogeneous Parallel Calculation of Dynamical Electron Correlation. Runfeng Jin, Wenhao Liang, Haoyuan Zhang, Yinxuan Song, Zhen Luo, Haibo Ma, Yingjin Ma, Zhong Jin. 1103-1113
- TeMCO: Tensor Memory Compiler Optimization across Tensor Decompositions in Deep Learning Inference. Seungbin Song, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Shinnung Jeong, Jaeho Lee 0005, Hanjun Kim 0001. 1114-1123
- Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core. Kaige Zhang, Xiaoyan Liu, Hailong Yang, Tianyu Feng, Xinyu Yang, Yi Liu, Zhongzhi Luan, Depei Qian. 1124-1134
- Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores. Yuang Chen, Jeffrey Xu Yu. 1135-1144
- Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization. Deshun Bi, Shengguo Li, Dezun Dong, Peng Zhang 0061, Jianbin Fang. 1145-1155
- DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations. Zhong Zheng, Junshi Chen, Yang Zhao, Longsheng Song, Xinming Qin, Hong An. 1156-1165
- SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. Chuhe Hong, Qinglin Wang, Runzhang Mao, Yuechao Liang, Rui Xia, Jie Liu. 1166-1175
- Detailed Analysis and Optimization of Irregular-Shaped Matrix Multiplication on Multi-Core DSPs. Haotian Mo, Qinglin Wang, Linyu Liao, Biao Li, Lihua Chi, Jie Liu. 1176-1186
- BandSlim: A Novel Bandwidth and Space-Efficient KV-SSD with an Escape-from-Block Approach. Junhyeok Park, Chang Gyu Lee, Soon Hwang, Soonyeal Yang, Jungki Noh, Woosuk Chung, JungHee Lee, Youngjae Kim 0001. 1187-1196
- Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses. Guantian Lin, Si Wu 0003, Cheng Li 0001, Yinlong Xu. 1197-1206
- HMT: A Hybrid Mitigating and Transferring Approach on I/O Throughput Degradation for Erasure Coded Storage Systems. Piao Hu, Huangzhen Xue, Chentao Wu, Jie Li 0002, Minyi Guo. 1207-1216
- Hi-ZNS: High Space Efficiency and Zero-Copy LSM-Tree Based Stores on ZNS SSDs. Renping Liu, Junhua Chen, Peng Chen, Linbo Long, Anping Xiong, Duo Liu. 1217-1226
- Achieving High Efficiency for Datacenter Multicast using Skewed Bloom Filter. Jiawei Huang 0001, Zihao Chen, Yiting Wang, Hui Li, Zhaoyi Li, Qile Wang, Sitan Li, Zhidong He, Wanchun Jiang. 1227-1236
- SIndex: An SSD-based Large-scale Indexing with Deterministic Latency for Cloud Block Storage. Shucheng Wang, Kaiye Zhou, Zhandong Guo, Qiang Cao, Jun Xu, Jie Yao. 1237-1246
- Coupling Congestion Control and Flow Pausing in Data Center Network. Jiawei Huang 0001, Shengwen Zhou, Zhaoyi Li, Yijun Li, Zihao Chen, Xiaojun Zhu, Jing Shao, Sitan Li, Wanchun Jiang, Jianxin Wang 0001, Ping Zhong 0002. 1247-1256