- Beyond human-level accuracy: computational challenges in deep learning. Joel Hestness, Newsha Ardalani, Gregory F. Diamos. 1-14
- S-EnKF: co-designing for scalable ensemble Kalman filter. Junmin Xiao, Shijie Wang, Weiqiang Wan, Xuehai Hong, Guangming Tan. 15-26
- Throughput-oriented GPU memory allocation. Isaac Gelado, Michael Garland. 27-37
- SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU. Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, Xiaodong Zhang. 38-52
- Incremental flattening for nested data parallelism. Troels Henriksen, Frederik Thorøe, Martin Elsman, Cosmin E. Oancea. 53-67
- Adaptive sparse matrix-matrix multiplication on the GPU. Martin Winter, Daniel Mlakar, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger. 68-81
- Modular transactions: bounding mixed races in space and time. Brijesh Dongol, Radha Jagadeesan, James Riely. 82-93
- Leveraging hardware TM in Haskell. Ryan Yates, Michael L. Scott. 94-106
- Stretching the capacity of hardware transactional memory in IBM POWER architectures. Ricardo Filipe, Shady Issa, Paolo Romano, João Pedro Barreto. 107-119
- Processing transactions in a predefined order. Mohamed M. Saad, Masoomeh Javidi Kishi, Shihao Jing, Sandeep Hans, Roberto Palmieri. 120-132
- Harmonia: a high throughput B+tree for GPUs. Zhaofeng Yan, Yuzhe Lin, Lu Peng, Weihua Zhang. 133-144
- Engineering a high-performance GPU B-Tree. Muhammad A. Awad, Saman Ashkiani, Rob Johnson, Martin Farach-Colton, John D. Owens. 145-157
- QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology. Xiaokang Hu, Changzheng Wei, Jian Li, Brian Will, Ping Yu, Lu Gong, Haibing Guan. 158-172
- Data-flow/dependence profiling for structured transformations. Fabian Gruber, Manuel Selva, Diogo Sampaio, Christophe Guillon, Antoine Moynault, Louis-Noël Pouchet, Fabrice Rastello. 173-185
- Lightweight hardware transactional memory profiling. Qingsen Wang, Pengfei Su, Milind Chabbi, Xu Liu. 186-200
- A pattern based algorithmic autotuner for graph processing on GPUs. Ke Meng, Jiajia Li, Guangming Tan, Ninghui Sun. 201-213
- Provably and practically efficient granularity control. Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, Mike Rainey. 214-228
- A coordinated tiling and batching framework for efficient GEMM on GPUs. Xiuhong Li, Yun Liang, Shengen Yan, Liancheng Jia, Yinghan Li. 229-241
- Semantics-aware scheduling policies for synchronization determinism. Qi Zhao, Zhengyi Qiu, Guoliang Jin. 242-256
- Proactive work stealing for futures. Kyle Singer, Yifan Xu, I-Ting Angelina Lee. 257-271
- A round-efficient distributed betweenness centrality algorithm. Loc Hoang, Matteo Pontecorvi, Roshan Dathathri, Gurbinder Gill, Bozhi You, Keshav Pingali, Vijaya Ramachandran. 272-286
- Corrected trees for reliable group communication. Martin Küttler, Maksym Planeta, Jan Bierbaum, Carsten Weinhold, Hermann Härtig, Amnon Barak, Torsten Hoefler. 287-299
- Adaptive sparse tiling for sparse matrix multiplication. Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, P. Sadayappan. 300-314
- Encapsulated open nesting for STM: fine-grained higher-level conflict detection. Martin Bättig, Thomas R. Gross. 315-326
- A specialized B-tree for concurrent datalog evaluation. Herbert Jordan, Pavle Subotic, David Zhao, Bernhard Scholz. 327-339
- Efficient race detection with futures. Robert Utterback, Kunal Agrawal, Jeremy T. Fineman, I-Ting Angelina Lee. 340-354
- Verifying C11 programs operationally. Simon Doherty, Brijesh Dongol, Heike Wehrheim, John Derrick. 355-365
- Checking linearizability using hitting families. Burcu Kulahcioglu Ozkan, Rupak Majumdar, Filip Niksic. 366-377
- Transitive joins: a sound and efficient online deadlock-avoidance policy. Caleb Voss, Tiago Cogumbreiro, Vivek Sarkar. 378-390
- VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing. Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos. 391-392
- GPOP: a cache and memory-efficient framework for graph processing over partitions. Kartik Lakhotia, Rajgopal Kannan, Sourav Pati, Viktor K. Prasanna. 393-394
- Optimizing graph processing on GPUs using approximate computing: poster. Somesh Singh, Rupesh Nasre. 395-396
- A GPU memory efficient speed-up scheme for training ultra-deep neural networks: poster. Jinrong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li. 397-398
- Profiling based out-of-core hybrid method for large neural networks: poster. Yuki Ito, Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo. 399-400
- Exploiting the input sparsity to accelerate deep neural networks: poster. Xiao Dong, Lei Liu, Guangli Li, Jiansong Li, Peng Zhao, Xueying Wang, Xiaobing Feng. 401-402
- Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: poster. Peng Jiang, Gagan Agrawal. 403-404
- Optimizing GPU programs by register demotion: poster. Putt Sakdhnagool, Amit Sabne, Rudolf Eigenmann. 405-406
- A distributed hypervisor for resource aggregation: poster. Yubin Chen, Zhuocheng Ding, Jin Zhang, Yun Wang, Zhengwei Qi, Haibing Guan. 407-408
- Scheduling HPC workloads on heterogeneous-ISA architectures: poster. Mohamed L. Karaoui, Anthony Carno, Robert Lyerly, Sang-Hoon Kim, Pierre Olivier, Changwoo Min, Binoy Ravindran. 409-410
- T-thinker: a task-centric distributed framework for compute-intensive divide-and-conquer algorithms. Da Yan, Guimu Guo, Md Mashiur Rahman Chowdhury, M. Tamer Özsu, John C. S. Lui, Weida Tan. 411-412
- Toward efficient architecture-independent algorithms for dynamic programs: poster. Mohammad Mahdi Javanmard, Pramod Ganapathi, Rathish Das, Zafar Ahmad, Stephen L. Tschudi, Rezaul Chowdhury. 413-414
- Optimizing computation-communication overlap in asynchronous task-based programs: poster. Emilio Castillo, Nikhil Jain, Marc Casas, Miquel Moretó, Martin Schulz, Ramón Beivide, Mateo Valero, Abhinav Bhatele. 415-416
- Lock-free channels for programming via communicating sequential processes: poster. Nikita Koval, Dan Alistarh, Roman Elizarov. 417-418
- Making concurrent algorithms detectable: poster. Naama Ben-David, Guy E. Blelloch, Michal Friedman, Yuanhao Wei. 419-420
- GPU-based 3D cryo-EM reconstruction with key-value streams: poster. Kunpeng Wang, Shizhen Xu, Hongkun Yu, Haohuan Fu, Guangwen Yang. 421-422
- BASMAT: bottleneck-aware sparse matrix-vector multiplication auto-tuning on GPGPUs. Athena Elafrou, Georgios I. Goumas, Nectarios Koziris. 423-424
- LOFT: lock-free transactional data structures. Avner Elizarov, Guy Golan-Gueta, Erez Petrank. 425-426
- Automated multi-dimensional elasticity for streaming runtimes: poster. Xiang Ni, Scott Schneider, Raju Pavuluri, Jonathan Kaus, Kun-Lung Wu. 427-428
- Compiler-assisted adaptive program scheduling in big.LITTLE systems: poster. Marcelo Novaes, Vinicius Petrucci, Abdoulaye Gamatié, Fernando Magno Quintão Pereira. 429-430
- GOPipe: a granularity-oblivious programming framework for pipelined stencil executions on GPU. Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, Youngmin Yi. 431-432
- High-throughput image alignment for connectomics using frugal snap judgments: poster. Tim Kaler, Brian Wheatman, Sarah Wooders. 433-434
- CuLDA_CGS: solving large-scale LDA problems on GPUs. Xiaolong Xie, Yun Liang, Xiuhong Li, Wei Tan. 435-436
- Managing application parallelism via parallel efficiency regulation: poster. Sharanyan Srikanthan, Princeton Ferro, Sayak Chakraborti, Sandhya Dwarkadas. 437-438
- Blockchain abstract data type: poster. Emmanuelle Anceaume, Antonella Del Pozzo, Romaric Ludinard, Maria Potop-Butucaru, Sara Tucci Piergiovanni. 439-440
- Creating repeatable, reusable experimentation pipelines with popper: tutorial. Ivo Jimenez, Jay F. Lofstead, Carlos Maltzahn. 441-442
- Building parallel programming language constructs in the AbleC extensible C compiler framework: a PPoPP tutorial. Travis Carlson, Eric Van Wyk. 443-446
- Implementing parallel and concurrent tree structures. Yihan Sun, Guy E. Blelloch. 447-450
- Programming quantum computers: a primer with IBM Q and D-Wave exercises. Frank Mueller, Greg Byrd, Patrick Dreher. 451
- High performance distributed deep learning: a beginner's guide. Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni. 452-454
- Performance portable C++ programming with RAJA. David Beckingsale, Richard D. Hornung, Tom Scogland, Arturo Vargas. 455-456