- Kite: efficient and available release consistency for the datacenter. Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan, Boris Grot, Arpit Joshi. 1-16
- Oak: a scalable off-heap allocated key-value map. Hagar Meir, Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Idit Keidar, Eran Meir, Gali Sheffi, Yoav Zuriel. 17-31
- Optimizing batched winograd convolution on GPUs. Da Yan, Wei Wang, Xiaowen Chu. 32-44
- Taming unbalanced training workloads in deep learning with partial collective operations. Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler. 45-61
- Scalable top-k retrieval with Sparta. Gali Sheffi, Dmitry Basin, Edward Bortnikov, David Carmel, Idit Keidar. 62-73
- waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data. Jiannan Tian, Sheng Di, Chengming Zhang, Xin Liang, Sian Jin, Dazhao Cheng, Dingwen Tao, Franck Cappello. 74-88
- Scaling concurrent queues by using HTM to profit from failed atomic operations. Or Ostrovsky, Adam Morrison. 89-101
- A wait-free universal construction for large objects. Andreia Correia, Pedro Ramalhete, Pascal Felber. 102-116
- Fast concurrent data sketches. Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky. 117-129
- Universal wait-free memory reclamation. Ruslan Nikolaev, Binoy Ravindran. 130-143
- Using sample-based time series data for automated diagnosis of scalability losses in parallel programs. Lai Wei, John M. Mellor-Crummey. 144-159
- Scaling out speculative execution of finite-state machines with parallel merge. Yang Xia, Peng Jiang, Gagan Agrawal. 160-172
- On the fly MHP analysis. Sonali Saha, V. Krishna Nandivada. 173-186
- Detecting and reproducing error-code propagation bugs in MPI implementations. Daniel DeFreez, Antara Bhowmick, Ignacio Laguna, Cindy Rubio-González. 187-201
- Parallel and distributed bounded model checking of multi-threaded programs. Omar Inverso, Catia Trubiani. 202-216
- Parallel determinacy race detection for futures. Yifan Xu, Kyle Singer, I-Ting Angelina Lee. 217-231
- Practical parallel hypergraph algorithms. Julian Shun. 232-249
- A supernodal all-pairs shortest path algorithm. Piyush Sao, Ramakrishnan Kannan, Prasun Gera, Richard W. Vuduc. 250-261
- Increasing the parallelism of graph coloring via shortcutting. Ghadeer Alabandi, Evan Powers, Martin Burtscher. 262-275
- Non-blocking interpolation search trees with doubly-logarithmic running time. Trevor Brown, Aleksandar Prokopec, Dan Alistarh. 276-291
- YewPar: skeletons for exact combinatorial search. Blair Archibald, Patrick Maier, Rob Stewart, Phil Trinder. 292-307
- XIndex: a scalable learned index for multicore data storage. Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, Haibo Chen. 308-320
- Overlapping host-to-device copy and computation using hidden unified memory. Jaehoon Jung, Daeyoung Park, Youngdong Do, Jungho Park, Jaejin Lee. 321-335
- GPU initiated OpenSHMEM: correct and efficient intra-kernel networking for dGPUs. Khaled Hamidouche, Michael LeBeane. 336-347
- No barrier in the road: a comprehensive study and optimization of ARM barriers. Nian Liu, Binyu Zang, Haibo Chen. 348-361
- spECK: accelerating GPU sparse matrix-matrix multiplication through lightweight analysis. Mathias Parger, Martin Winter, Daniel Mlakar, Markus Steinberger. 362-375
- A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs. Peng Jiang, Changwan Hong, Gagan Agrawal. 376-388
- MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation. Bangtian Liu, Kazem Cheshmi, Saeed Soori, Michelle Mills Strout, Maryam Mehri Dehnavi. 389-402
- A parallel sparse tensor benchmark suite on CPUs and GPUs. Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin J. Barker. 403-404
- Nesting and composition in transactional data structure libraries. Gal Assa, Hagar Meir, Guy Golan-Gueta, Idit Keidar, Alexander Spiegelman. 405-406
- ELDA: LDA made efficient via algorithm-system codesign submission. Shilong Wang, Da Li, Hengyong Yu, Hang Liu. 407-408
- Identifying scalability bottlenecks for large-scale parallel programs with graph analysis. Yuyang Jin, Haojie Wang, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai. 409-410
- Revisiting linpack algorithm on large-scale CPU-GPU heterogeneous systems. Chaoyang Shui, Xianzhi Yu, Yujin Yan, YinShan Wang, Ke Meng, Guangming Tan. 411-412
- Neighbor-list-free molecular dynamics on sunway TaihuLight supercomputer. Xiaohui Duan, Ping Gao, Meng Zhang, Tingjian Zhang, Hongsong Meng, Yuxuan Li, Bertil Schmidt, Haohuan Fu, Lin Gan, Wei Xue, Guangwen Yang, Weiguo Liu. 413-414
- A tool for top-down performance analysis of GPU-accelerated applications. Keren Zhou, Mark Krentel, John M. Mellor-Crummey. 415-416
- Functional faults. Gali Sheffi, Erez Petrank. 417-418
- Breaking master-slave model between host and FPGAs. Jaume Bosch, Miquel Vidal, Antonio Filgueras, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé. 419-420
- Understanding and optimizing persistent memory allocation. Wentao Cai, Haosen Wen, H. Alan Beadle, Mohammad Hedayati, Michael L. Scott. 421-422
- Testing concurrency on the JVM with lincheck. Nikita Koval, Maria Sokolova, Alexander Fedorov, Dan Alistarh, Dmitry Tsitelov. 423-424
- ArcherGear: data race equivalencing for expeditious HPC debugging. Samuel Thayer, Ganesh Gopalakrishnan, Ian Briggs, Michael Bentley, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee. 425-426
- Reflector: a fine-grained I/O tracker for HPC systems. Abdullah Al-Mamun, Jialin Liu, Tonglin Li, Quincey Koziol, Zhongyi Zhai, Junyan Qian, Haoting Shen, Dongfang Zhao. 427-428
- Nonblocking persistent software transactional memory. H. Alan Beadle, Wentao Cai, Haosen Wen, Michael L. Scott. 429-430
- Optimizing GPU programs by partial evaluation. Aleksey Tyurin, Daniil Berezun, Semyon Grigorev. 431-432
- Restricted memory-friendly lock-free bounded queues. Nikita Koval, Vitaly Aksenov. 433-434
- Understand the overheads of storage data structures on persistent memory. Abdullah Al Raqibul Islam, Dong Dai. 435-436
- PLUM: static parallel program locality analysis under uniform multiplexing. Fangzhou Liu, Dong Chen, Wesley Smith, Chen Ding. 437-438