- It's Time for a New Old Language. Guy L. Steele Jr. 1
- EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU. Guoyang Chen, Yue Zhao, Xipeng Shen, Huiyang Zhou. 3-16
- Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications. Nachshon Cohen, Arie Tal, Erez Petrank. 17-29
- Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning. Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Ke-ren Zhou, Mingyu Chen. 31-43
- Checking Concurrent Data Structures Under the C/C++11 Memory Model. Peizhao Ou, Brian Demsky. 45-59
- An Efficient Abortable-locking Protocol for Multi-level NUMA Systems. Milind Chabbi, Abdelhalim Amer, Shasha Wen, Xu Liu. 61-74
- Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism. Umut A. Acar, Naama Ben-David, Mike Rainey. 75-88
- Noise Injection Techniques to Expose Subtle and Unintended Message Races. Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz, Christopher M. Chambreau. 89-101
- Thread Data Sharing in Cache: Theory and Measurement. Hao Luo, Pengcheng Li, Chen Ding. 103-115
- Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs. Bin Ren, Sriram Krishnamoorthy, Kunal Agrawal, Milind Kulkarni. 117-130
- Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf. 131-143
- Processor-Oblivious Record and Replay. Robert Utterback, Kunal Agrawal, I-Ting Angelina Lee, Milind Kulkarni. 145-161
- Simple, Accurate, Analytical Time Modeling and Optimal Tile Size Selection for GPGPU Stencils. Nirmal Prajapati, Waruna Ranasinghe, Sanjay V. Rajopadhye, Rumen Andonov, Hristo Djidjev, Tobias Grosser. 163-177
- Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation. Peng Jiang, Gagan Agrawal. 179-191
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda. 193-205
- Model-based Iterative CT Image Reconstruction on GPUs. Amit Sabne, Xiao Wang, Sherman J. Kisner, Charles A. Bouman, Anand Raghunathan, Samuel P. Midkiff. 207-220
- Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks. Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers. 221-234
- Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations. Tal Ben-Nun, Michael Sutton, Sreepathi Pai, Keshav Pingali. 235-248
- Tapir: Embedding Fork-Join Parallelism into LLVM's Intermediate Representation. Tao B. Schardl, William S. Moses, Charles E. Leiserson. 249-265
- A Multicore Path to Connectomics-on-Demand. Alexander Matveev, Yaron Meirovitch, Hayk Saribekyan, Wiktor Jakubiuk, Tim Kaler, Gergely Ódor, David Budden, Aleksandar Zlateski, Nir Shavit. 267-281
- SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap. Michael Vollmer, Ryan G. Scott, Madanlal Musuvathi, Ryan R. Newton. 283-298
- Synchronized-by-Default Concurrency for Shared-Memory Systems. Martin Bättig, Thomas R. Gross. 299-312
- Function Call Re-Vectorization. Rubens E. A. Moreira, Sylvain Collange, Fernando Magno Quintão Pereira. 313-326
- Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis. Samyam Rajbhandari, Fabrice Rastello, Karol Kowalski, Sriram Krishnamoorthy, P. Sadayappan. 327-340
- Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions. Guy L. Steele Jr., Jean-Baptiste Tristan. 341-355
- KiWi: A Key-Value Map for Scalable Real-Time Analytics. Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Guy Golan-Gueta, Eshcar Hillel, Idit Keidar, Moshe Sulamy. 357-369
- Grammar-aware Parallelization for Scalable XPath Querying. Lin Jiang, Zhijia Zhao. 371-383
- Eunomia: Scaling Concurrent Search Trees under Contention Using HTM. Xin Wang, Weihua Zhang, Zhaoguo Wang, Ziyun Wei, Haibo Chen, Wenyun Zhao. 385-399
- Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL. Xiongchao Tang, Jidong Zhai, Bowen Yu, Wenguang Chen, Weimin Zheng. 401-413
- Silent Data Corruption Resilient Two-sided Matrix Factorizations. Panruo Wu, Nathan DeBardeleben, Qiang Guan, Sean Blanchard, Jieyang Chen, Dingwen Tao, Xin Liang, Kaiming Ouyang, Zizhong Chen. 415-427
- POSTER: Reuse, don't Recycle: Transforming Algorithms that Throw Away Descriptors. Maya Arbel-Raviv, Trevor Brown. 429-430
- POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization. Vignesh Balaji, Dhruva Tirumala, Brandon Lucia. 431-432
- POSTER: HythTM: Extending the Applicability of Intel TSX Hardware Transactional Support. Arnamoy Bhattacharyya, Mike Dai Wang, Mihai Burcea, Yi Ding, Allen Deng, Sai Varikooty, Shafaaf Hossain, Cristiana Amza. 433-434
- POSTER: Provably Efficient Scheduling of Cache-Oblivious Wavefront Algorithms. Rezaul Chowdhury, Pramod Ganapathi, Yuan Tang, Jesmin Jahan Tithi. 435-436
- POSTER: State Teleportation via Hardware Transactional Memory. Nachshon Cohen, Maurice Herlihy, Erez Petrank, Elias Wald. 437-438
- POSTER: IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases. Dong Dai, Wei Zhang, Yong Chen. 439-440
- POSTER: Distributed Control: The Benefits of Eliminating Global Synchronization via Effective Scheduling. Jesun Sahariar Firoz, Thejaka Amila Kanewala, Marcin Zalewski, Martina Barnas, Andrew Lumsdaine. 441-442
- POSTER: MAPA: An Automatic Memory Access Pattern Analyzer for GPU Applications. Gangwon Jo, Jaehoon Jung, Jiyoung Park, Jaejin Lee. 443-444
- POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures. Shigang Li, Yunquan Zhang, Torsten Hoefler. 445-446
- POSTER: Automated Load Balancer Selection Based on Application Characteristics. Harshitha Menon, Kavitha Chandrasekar, Laxmikant V. Kalé. 447-448
- POSTER: A GPU-Friendly Skiplist Algorithm. Nurit Moscovici, Nachshon Cohen, Erez Petrank. 449-450
- POSTER: Poor Man's URCU. Pedro Ramalhete, Andreia Correia. 451-452
- POSTER: A Wait-Free Queue with Wait-Free Memory Reclamation. Pedro Ramalhete, Andreia Correia. 453-454
- POSTER: STAR (Space-Time Adaptive and Reductive) Algorithms for Real-World Space-Time Optimality. Yuan Tang, Ronghui You. 455-456
- POSTER: Recovering Performance for Vector-based Machine Learning on Managed Runtime. Mingyu Wu, Haibing Guan, Binyu Zang, Haibo Chen. 457-458
- POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models. Minjia Zhang, Swarnendu Biswas, Michael D. Bond. 459-460
- POSTER: An Infrastructure for HPC Knowledge Sharing and Reuse. Yue Zhao, Chunhua Liao, Xipeng Shen. 461-462