- Stream processing with dependency-guided synchronization. Konstantinos Kallas, Filip Niksic, Caleb Stanford, Rajeev Alur. 1-16
- CASE: a compiler-assisted SchEduling framework for multi-GPU systems. Chao Chen, Chris Porter, Santosh Pande. 17-31
- Dopia: online parallelism management for integrated CPU/GPU architectures. Younghyun Cho, Jiyeon Park, Florian Negele, Changyeon Jo, Thomas R. Gross, Bernhard Egger. 32-45
- Mashup: making serverless computing useful for HPC workflows via hybrid execution. Rohan Basu Roy, Tirthak Patel, Vijay Gadepally, Devesh Tiwari. 46-60
- Parallel block-delayed sequences. Sam Westrick, Mike Rainey, Daniel Anderson, Guy E. Blelloch. 61-75
- RTNN: accelerating neighbor search using hardware ray tracing. Yuhao Zhu. 76-89
- TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs. Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, Weifeng Liu. 90-106
- QGTC: accelerating quantized graph neural networks via GPU tensor core. Yuke Wang, Boyuan Feng, Yufei Ding. 107-119
- FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models. Jiaao He, Jidong Zhai, Tiago Antunes, Haojie Wang, Fuwen Luo, Shangfeng Shi, Qin Li. 120-134
- Near-optimal sparse allreduce for distributed deep learning. Shigang Li, Torsten Hoefler. 135-149
- Vapro: performance variance detection and diagnosis for production-run parallel applications. Liyan Zheng, Jidong Zhai, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen. 150-162
- Interference relation-guided SMT solving for multi-threaded program verification. Hongyu Fan, Weiting Liu, Fei He. 163-176
- PerFlow: a domain specific framework for automatic performance analysis of parallel applications. Yuyang Jin, Haojie Wang, Runxin Zhong, Chen Zhang, Jidong Zhai. 177-191
- BaGuaLu: targeting brain scale pretrained models with over 37 million cores. Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, Yuanwei Wang, Zhenbo Sun, Liyan Zheng, Haojie Wang, Shizhi Tang, Tianyu Zheng, Junyang Lin, Guanyu Feng, Zeqiang Huang, Jie Gao, Aohan Zeng, Jianwei Zhang, Runxin Zhong, Tianhui Shi, Sha Liu, Weimin Zheng, Jie Tang, Hongxia Yang, Xin Liu, Jidong Zhai, Wenguang Chen. 192-204
- Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms. Zhuoqiang Guo, Denghui Lu, Yujin Yan, Siyu Hu, Rongrong Liu, Guangming Tan, Ninghui Sun, Wanrun Jiang, Lijun Liu, Yixiao Chen, Linfeng Zhang, MoHan Chen, Han Wang, Weile Jia. 205-218
- LOTUS: locality optimizing triangle counting. Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck. 219-233
- Scaling graph traversal to 281 trillion edges with 40 million cores. Huanqi Cao, Yuanwei Wang, Haojie Wang, Heng Lin, Zixuan Ma, Wanwang Yin, Wenguang Chen. 234-245
- Deadlock-free asynchronous message reordering in Rust with multiparty session types. Zak Cutner, Nobuko Yoshida, Martin Vassor. 246-261
- Detectable recovery of lock-free data structures. Hagit Attiya, Ohad Ben-Baruch, Panagiota Fatourou, Danny Hendler, Eleftherios Kosmas. 262-277
- Lock-free locks revisited. Naama Ben-David, Guy E. Blelloch, Yuanhao Wei. 278-293
- Asymmetry-aware scalable locking. Nian Liu, Jinyu Gu, Dahai Tang, Kenli Li, Binyu Zang, Haibo Chen. 294-308
- FliT: a library for simple and efficient persistent algorithms. Yuanhao Wei, Naama Ben-David, Michal Friedman, Guy E. Blelloch, Erez Petrank. 309-321
- Understanding and detecting deep memory persistency bugs in NVM programs with DeepMC. Benjamin Reidys, Jian Huang. 322-336
- The performance power of software combining in persistence. Panagiota Fatourou, Nikolaos D. Kallimanis, Eleftherios Kosmas. 337-352
- Multi-queues can be state-of-the-art priority schedulers. Anastasiia Postnikova, Nikita Koval, Giorgi Nadiradze, Dan Alistarh. 353-367
- Bundling linked data structures for linearizable range queries. Jacob Nelson-Slivon, Ahmed Hassan, Roberto Palmieri. 368-384
- PathCAS: an efficient middle ground for concurrent search data structures. Trevor Brown, William Sigouin, Dan Alistarh. 385-399
- Jiffy: a lock-free skip list with batch updates and snapshots. Tadeusz Kobus, Maciej Kokocinski, Pawel T. Wojciechowski. 400-415
- Elimination (a, b)-trees with fast, durable updates. Anubhav Srivastava, Trevor Brown. 416-430
- Automatic synthesis of parallel Unix commands and pipelines with KumQuat. Jiasi Shen, Martin Rinard, Nikos Vasilakis. 431-432
- Towards OmpSs-2 and OpenACC interoperation. Orestis Korakitis, Simon Garcia De Gonzalo, Nicolas Guidotti, João Pedro Barreto, José C. Monteiro, Antonio J. Peña. 433-434
- LB-HM: load balance-aware data placement on heterogeneous memory for task-parallel HPC applications. Zhen Xie, Jie Liu, Sam Ma, Jiajia Li, Dong Li. 435-436
- Hardening selective protection across multiple program inputs for HPC applications. Yafan Huang, Shengjian Guo, Sheng Di, Guanpeng Li, Franck Cappello. 437-438
- A parallel branch-and-bound algorithm with history-based domination. Taspon Gonggiatgul, Ghassan Shobaki, Pinar Muyan-Özçelik. 439-440
- Remote OpenMP offloading. Atmn Patel, Johannes Doerfert. 441-442
- High performance GPU concurrent B+tree. Weihua Zhang, Chuanlei Zhao, Lu Peng, Yuzhe Lin, Fengzhe Zhang, Jinhu Jiang. 443-444
- The problem-based benchmark suite (PBBS), V2. Daniel Anderson, Guy E. Blelloch, Laxman Dhulipala, Magdalen Dobson, Yihan Sun. 445-447
- An LLVM-based open-source compiler for NVIDIA GPUs. Da Yan, Wei Wang, Xiaowen Chu. 448-449
- ParGeo: a library for parallel computational geometry. Yiqiu Wang, Shangdi Yu, Laxman Dhulipala, Yan Gu, Julian Shun. 450-452
- Parallel algorithms for masked sparse matrix-matrix products. Srdan Milakovic, Oguz Selvitopi, Israt Nisa, Zoran Budimlic, Aydin Buluç. 453-454
- Rethinking graph data placement for graph neural network training on multiple GPUs. Shihui Song, Peng Jiang. 455-456
- Optimizing consistency for partially replicated data stores. Ivan Kuraj, Armando Solar-Lezama, Nadia Polikarpova. 457-458
- Optimizing sparse computations jointly. Kazem Cheshmi, Michelle Mills Strout, Maryam Mehri Dehnavi. 459-460
- wCQ: a fast wait-free queue with bounded memory usage. Ruslan Nikolaev, Binoy Ravindran. 461-462
- Automatic differentiation of parallel loops with formal methods. Jan Hückelheim, Laurent Hascoët. 463-464
- A W-cycle algorithm for efficient batched SVD on GPUs. Junmin Xiao, Qing Xue, Hui Ma, Xiaoyang Zhang, Guangming Tan. 465-466