- Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory. Bagus Hanindhito, Ruihao Li, Dimitrios Gourounas, Arash Fathi, Karan Govil, Dimitar Trenev, Andreas Gerstlauer, Lizy Kurian John.
- Processor-Aware Cache-Oblivious Algorithms. Yuan Tang, Weiguo Gao.
- Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints. Lucas Perotin, Hongyang Sun, Padma Raghavan.
- Parallel Multi-split Extendible Hashing for Persistent Memory. Jing Hu, Jianxi Chen, Yifeng Zhu, Qing Yang, Zhouxuan Peng, Ya Yu.
- Optimizing Flow Completion Time via Adaptive Buffer Management in Data Center Networks. Sen Liu, Xiang Lin, Zehua Guo, Yi Wang, Mohamed Adel Serhani, Yang Xu.
- ADA: An Application-Conscious Data Acquirer for Visual Molecular Dynamics. Hanpei Wu, Tongliang Deng, Yanliang Zou, Shu Yin, Si Chen, Tao Xie.
- Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs. Ruiqi Tang, Ziyi Zhao, Kailun Wang, Xiaoli Gong, Jin Zhang, Wenwen Wang, Pen-Chung Yew.
- Fast Reconstruction for Large Disk Enclosures Based on RAID2.0. Qiliang Li, Min Lyu, Liangliang Xu, Yinlong Xu, Wei Wang.
- Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection. Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo.
- Accelerating DBSCAN Algorithm with AI Chips for Large Datasets. Zhuoran Ji, Cho-Li Wang.
- MetaCache-GPU: Ultra-Fast Metagenomic Classification. Robin Kobus, André Müller, Daniel Jünger, Christian Hundt, Bertil Schmidt.
- gem5 + rtl: A Framework to Enable RTL Models Inside a Full-System Simulator. Guillem López-Paradís, Adrià Armejach, Miquel Moretó.
- A Universal Construction to implement Concurrent Data Structure for NUMA-muticore. Zhengming Yi, Yiping Yao, Kai Chen.
- Distributed Game-Theoretical Route Navigation for Vehicular Crowdsensing. En Wang, Dongming Luan, Yongjian Yang, Zihe Wang, Pengmin Dong, Dawei Li, Wenbin Liu, Jie Wu.
- Efficient GPU-Implementation for Integer Sorting Based on Histogram and Prefix-Sums. Seiya Kozakai, Noriyuki Fujimoto, Koichi Wada.
- Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li.
- Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors. Zonghao Feng, Qiong Luo.
- Matryoshka: A Coalesced Delta Sequence Prefetcher. Shizhi Jiang, Yiwei Ci, Qiusong Yang, Mingshu Li.
- BGPQ: A Heap-Based Priority Queue Design for GPUs. Yan Hao Chen, Fei Hua, Yuwei Jin, Eddy Z. Zhang.
- Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling. Zhenwei Zhang, Qiang Qi, Ruitao Shang, Li Chen, Fei Xu.
- Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization. Shaoshuai Zhang, Panruo Wu.
- Efficient Complete Event Trend Detection over High-Velocity Streams. Huiyao Mei, Hanhua Chen, Hai Jin, Qiang-Sheng Hua, Bing Zhou.
- Sparker: Efficient Reduction for More Scalable Machine Learning with Spark. Bowen Yu, Huanqi Cao, Tianyi Shan, Haojie Wang, Xiongchao Tang, Wenguang Chen.
- Receiver-Driven Congestion Control for InfiniBand. Yiran Zhang, Kun Qian, Fengyuan Ren.
- Multi-Agent Reinforcement Learning based Distributed Renewable Energy Matching for Datacenters. Haoyu Wang, Haiying Shen, Jiechao Gao, Kevin Zheng, Xiaoying Li.
- Context-aware Data Operation Strategies in Edge Systems for High Application Performance. Tanmoy Sen, Haiying Shen.
- A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization. Yizhi Huang, Yanlong Yin, Yan Liu, Shuibing He, Yang Bai, Renfa Li.
- An Edge-Fencing Strategy for Optimizing SSSP Computations on Large-Scale Graphs. Huashan Yu, Xiaolin Wang, Yingwei Luo.
- Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing. Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck.
- IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. Aymen Al Saadi, Dario Alfè, Yadu N. Babuji, Agastya Bhati, Ben Blaiszik, Alexander Brace, Thomas S. Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Peter V. Coveney, Ian T. Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Dieter Kranzlmüller, Thorsten Kurth, Hyungro Lee, Zhuozhao Li, Heng Ma, Gerald Mathias, André Merzky, Alexander Partin, Arvind Ramanathan, Ashka Shah, Abraham C. Stern, Rick Stevens, Li Tan, Mikhail Titov, Anda Trifan, Aristeidis Tsaris, Matteo Turilli, Huub J. J. Van Dam, Shunzhou Wan, David Wifling, Junqi Yin.
- Fast and Consistent Remote Direct Access to Non-volatile Memory. Jingwen Du, Fang Wang, Dan Feng, Weiguang Li, Fan Li.
- Optimizing Winograd-Based Convolution with Tensor Cores. Junhong Liu, Dongxu Yang, Junjie Lai.
- ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction. Yang Zhou, Fang Wang, Dan Feng.
- Teddy: An Efficient SIMD-based Literal Matching Engine for Scalable Deep Packet Inspection. Kun Qiu, Harry Chang, Yang Hong, Wenjun Zhu, Xiang Wang, Baoqian Li.
- Paratick: Reducing Timer Overhead in Virtual Machines. Stijn Schildermans, Kris Aerts, Jianchen Shan, Xiaoning Ding.
- Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems. Qiang Fu, H. Howie Huang.
- Using Vectorized Execution to Improve SQL Query Performance on Spark. Yijie Shen, Jin Xiong, Dejun Jiang.
- SPMFS: A Scalable Persistent Memory File System on Optane Persistent Memory. Yang Yang, Qiang Cao, Jie Yao, Yuanyuan Dong, Weikang Kong.
- A Graph-Assisted Out-of-Place Update Scheme for Erasure Coded Storage Systems. Haiwei Deng, Ranhao Jia, Chentao Wu.
- CuART - a CUDA-based, scalable Radix-Tree lookup and update engine. Martin Koppehel, Tobias Groth, Sven Groppe, Thilo Pionteck.
- Enabling Efficient SIMD Acceleration for Virtual Radio Access Network. Jianda Wang, Yang Hu.
- Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training. Xiangyu Ye, Zhiquan Lai, Shengwei Li, Lei Cai, Ding Sun, Linbo Qiao, Dongsheng Li.
- Parallel Tucker Decomposition with Numerically Accurate SVD. Zitong Li, Qiming Fang, Grey Ballard.
- Accelerated Device Placement Optimization with Contrastive Learning. Hao Lan, Li Chen, Baochun Li.
- HDNH: a read-efficient and write-optimized hashing scheme for hybrid DRAM-NVM memory. Junhao Zhu, Kaixin Huang, Xiaomin Zou, Chenglong Huang, Nuo Xu, Liang Fang.
- Optimizing Massively Parallel Winograd Convolution on ARM Processor. Dongsheng Li, Dan Huang, Zhiguang Chen, Yutong Lu.
- Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs. Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin.
- AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency. Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li.
- Combining Dynamic Concurrency Throttling with Voltage and Frequency Scaling on Task-based Programming Models. Antoni Navarro Muñoz, Arthur Francisco Lorenzon, Eduard Ayguadé Parra, Vicenç Beltran Querol.
- Multi-level Forwarding and Scheduling Repair Technique in Heterogeneous Network for Erasure-coded Clusters. Hai Zhou, Dan Feng, Yuchong Hu.
- Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose. Viviana Arrigoni, Filippo Maggioli, Annalisa Massini, Emanuele Rodolà.
- Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme. Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura.
- CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation. Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao.
- FastPSO: Towards Efficient Swarm Intelligence Algorithm on GPUs. Hanfeng Liu, Zeyi Wen, Wei Cai.
- BitX: Empower Versatile Inference with Hardware Runtime Pruning. Hongyan Li, Hang Lu, Jiawen Huang, Wenxu Wang, Mingzhe Zhang, Wei Chen, Liang Chang, Xiaowei Li.
- CNN+LSTM Accelerated Turbulent Flow Simulation with Link-Wise Artificial Compressibility Method. Sijiang Fan, Jiawei Fei, Xiao-Wei Guo, Canqun Yang, Alistair Revell.
- Joint Optimization of DNN Partition and Scheduling for Mobile Cloud Computing. Yubin Duan, Jie Wu.
- Crash-Consistency-Aware Encryption for Non-Volatile Memories. Mengya Lei, Fang Wang, Dan Feng, Fan Li, Xueliang Wei.
- PREP: Predicting Job Runtime with Job Running Path on Supercomputers. Longfang Zhou, Xiaorong Zhang, Wenxiang Yang, Yongguo Han, Fang Wang, Yadong Wu, Jie Yu.
- A Fast, General System for Buffered Persistent Data Structures. Haosen Wen, Wentao Cai, Mingzhe Du, Louis Jenkins, Benjamin Valpey, Michael L. Scott.
- Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. Mingzhen Li, Yi Liu, Hailong Yang, Yongmin Hu, Qingxiao Sun, Bangduo Chen, Xin You, Xiaoyan Liu, Zhongzhi Luan, Depei Qian.
- Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement. Xiang Fei, Youhui Zhang.
- Scaling Generalized N-Body Problems, A Case Study from Genomics. Marquita Ellis, Aydin Buluç, Katherine A. Yelick.
- HiPa: Hierarchical Partitioning for Fast PageRank on NUMA Multicore Systems. Yuang Chen, Yeh-Ching Chung.
- ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics. Md. Muhib Khan, Weikuan Yu.
- ComputeCOVID19+: Accelerating COVID-19 Diagnosis and Monitoring via High-Performance Deep Learning on CT Images. Garvit Goel, Atharva Gondhalekar, Jingyuan Qi, Zhicheng Zhang, Guohua Cao, Wu Feng.
- GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation. Ali Eker, David Timmerman, Barry Williams, Kenneth Chiu, Dmitry Ponomarev.
- Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs. Jun Li, Minjun Li, Zhigang Cai, François Trahay, Mohamed Wahib, Balazs Gerofi, Zhiming Liu, Min Huang, Jianwei Liao.
- Fourth-Order Exhaustive Epistasis Detection for the xPU Era. Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa.
- Generalized Skyline Interval Coloring and Dynamic Geometric Bin Packing Problems. Runtian Ren, Xueyan Tang.
- NoStop: A Novel Configuration Optimization Scheme for Spark Streaming. Qianwen Ye, Wuji Liu, Chase Q. Wu.
- FedCav: Contribution-aware Model Aggregation on Distributed Heterogeneous Data in Federated Learning. Hui Zeng, Tongqing Zhou, Yeting Guo, Zhiping Cai, Fang Liu.
- Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization. Jan-Patrick Lehr, Christian H. Bischof, Florian Dewald, Heiko Mantel, Mohammad Norouzi, Felix Wolf.
- LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs. Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang.
- Best VM Selection for Big Data Applications across Multiple Frameworks by Transfer Learning. Yuewen Wu, Heng Wu, Yuanjia Xu, Yi Hu, Wenbo Zhang, Hua Zhong, Tao Huang.
- Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs. Lin Zhu, Qiang-Sheng Hua, Hai Jin.
- FIFL: A Fair Incentive Mechanism for Federated Learning. Liang Gao, Li Li, Yingwen Chen, Wenli Zheng, Chengzhong Xu, Ming Xu.
- Efficient Parallel Algorithms for String Comparison. Nikita Mishin, Daniil Berezun, Alexander Tiskin.
- sRouting: Towards a Better Flow Size Estimation Performance through Routing and Sketch Configuration. Yang Shi, Mei Wen.
- Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth. Christoph Klein, Robert Strzodka.
- Interferences between Communications and Computations in Distributed HPC Systems. Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher.
- Efficient Modeling of Random Sampling-Based LRU. Junyao Yang, Yuchen Wang, Zhenlin Wang.
- Exploiting system level heterogeneity to improve the performance of a GeoStatistics multi-phase task-based application. Lucas Leandro Nesi, Arnaud Legrand, Lucas Mello Schnorr.
- An Evaluation of Task-Parallel Frameworks for Sparse Solvers on Multicore and Manycore CPU Architectures. Abdullah Alperen, Md. Afibuzzaman, Fazlay Rabbi, M. Yusuf Özkaya, Ümit V. Çatalyürek, Hasan Metin Aktulga.
- Progressive Memory Adjustment with Performance Guarantee in Virtualized Systems. Lulu Yao, Yongkun Li, Jiawei Li, Weijie Wu, Yinlong Xu.
- Coupling Right-Provisioned Cold Storage Data Centers with Deduplication. Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Zhongjie Wu.
- Optimizing Work Stealing Communication with Structured Atomic Operations. Hannah Cartier, James Dinan, D. Brian Larkins.
- CERES: Container-Based Elastic Resource Management System for Mixed Workloads. Jinyu Yu, Dan Feng, Wei Tong, Pengze Lv, Yufei Xiong.