- Phase Aware Warp Scheduling: Mitigating Effects of Phase Behavior in GPGPU Applications. Mihir Awatramani, Xian Zhu, Joseph Zambreno, Diane T. Rover. 1-12
- NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. Jie Zhang, David Donofrio, John Shalf, Mahmut T. Kandemir, Myoungsoo Jung. 13-24
- Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance. Rachata Ausavarungnirun, Saugata Ghose, Onur Kayiran, Gabriel H. Loh, Chita R. Das, Mahmut T. Kandemir, Onur Mutlu. 25-38
- Scalable SIMD-Efficient Graph Processing on GPUs. Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan. 39-50
- Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures. Adam McLaughlin, Duane Merrill, Michael Garland, David A. Bader. 51-62
- Stadium Hashing: Scalable and Flexible Hashing on GPUs. Farzad Khorasani, Mehmet E. Belviranli, Rajiv Gupta, Laxmi N. Bhuyan. 63-74
- TSXProf: Profiling Hardware Transactions. Yujie Liu, Justin Gottschlich, Gilles Pokam, Michael F. Spear. 75-86
- ALEA: Fine-Grain Energy Profiling with Basic Block Sampling. Lev Mukhanov, Dimitrios S. Nikolopoulos, Bronis R. de Supinski. 87-98
- Towards General-Purpose Neural Network Computing. Schuyler Eldridge, Amos Waterland, Margo Seltzer, Jonathan Appavoo, Ajay Joshi. 99-112
- Practical Near-Data Processing for In-Memory Analytics Frameworks. Mingyu Gao, Grant Ayers, Christos Kozyrakis. 113-124
- Scalable Task Scheduling and Synchronization Using Hierarchical Effects. Stephen T. Heumann, Alexandros Tzannes, Vikram S. Adve. 125-137
- PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming. Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, Michael Kruse, Chandan Reddy, Sven Verdoolaege, Adam Betts, Alastair F. Donaldson, Jeroen Ketema, Javed Absar, Sven van Haastregt, Alexey Kravets, Anton Lokhmotov, Robert David, Elnar Hajiyev. 138-149
- Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems. Karthik Murthy, John M. Mellor-Crummey. 150-162
- Exploiting Program Semantics to Place Data in Hybrid Memory. Wei Wei, Dejun Jiang, Sally A. McKee, Jin Xiong, Mingyu Chen. 163-173
- Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM. Donghyuk Lee, Lavanya Subramanian, Rachata Ausavarungnirun, Jongmoo Choi, Onur Mutlu. 174-187
- A Software-Managed Approach to Die-Stacked DRAM. Mark Oskin, Gabriel H. Loh. 188-200
- An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms. Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger. 201-212
- Polyhedral Optimizations of Explicitly Parallel Programs. Prasanth Chatarasi, Jun Shirako, Vivek Sarkar. 213-226
- Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory. Xiangyao Yu, Srinivas Devadas. 227-240
- BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models. Joo Hwan Lee, Jaewoong Sim, Hyesoon Kim. 241-252
- Brain-Inspired Computing. Dharmendra S. Modha. 253
- Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations. Shasha Wen, Xu Liu, Milind Chabbi. 254-265
- Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring. Michelle L. Goodstein, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry. 266-279
- Compiler Assisted Load Balancing on Large Clusters. Vinit Deodhar, Hrushit Parikh, Ada Gavrilovska, Santosh Pande. 280-291
- RC3: Consistency Directed Cache Coherence for x86-64 with RC Extensions. Marco Elver, Vijay Nagarajan. 292-304
- Fine Grain Cache Partitioning Using Per-Instruction Working Blocks. Jason Jong Kyu Park, Yongjun Park, Scott A. Mahlke. 305-316
- An Efficient, Self-Contained, On-chip Directory: DIR1-SISD. Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras. 317-330
- Dealing with the Unknown: Resilience to Prediction Errors. Subrata Mitra, Greg Bronevetsky, Suhas Javagal, Saurabh Bagchi. 331-342
- Exploiting Staleness for Approximating Loads on CMPs. Prasanna Venkatesh Rengasamy, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das. 343-354
- Orchestrating Multiple Data-Parallel Kernels on Multiple Devices. Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke. 355-366
- AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance. Muneeb Khan, Michael A. Laurenzano, Jason Mars, Erik Hagersten, David Black-Schaffer. 367-378
- Runtime-Guided Management of Scratchpad Memories in Multicore Architectures. Lluc Alvarez, Miquel Moretó, Marc Casas, Emilio Castillo, Xavier Martorell, Jesús Labarta, Eduard Ayguadé, Mateo Valero. 379-391
- OSPREY: Implementation of Memory Consistency Models for Cache Coherence Protocols involving Invalidation-Free Data Access. George Kurian, Qingchuan Shi, Srinivas Devadas, Omer Khan. 392-405
- Cosmology and Computers: HACCing the Universe. Salman Habib. 406
- Vector Parallelism in JavaScript: Language and Compiler Support for SIMD. Ivan Jibaja, Peter Jensen, Ningxin Hu, Mohammad R. Haghighat, John McCutchan, Dan Gohman, Stephen M. Blackburn, Kathryn S. McKinley. 407-418
- Compiling and Optimizing Java 8 Programs for GPU Execution. Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, Vivek Sarkar. 419-431
- Throttling Automatic Vectorization: When Less is More. Vasileios Porpodas, Timothy M. Jones. 432-444
- Evaluating the Cost of Atomic Operations on Modern Architectures. Hermann Schweizer, Maciej Besta, Torsten Hoefler. 445-456
- MeToo: Stochastic Modeling of Memory Traffic Timing Behavior. Yipeng Wang, Ganesh Balakrishnan, Yan Solihin. 457-467
- Using Compiler Techniques to Improve Automatic Performance Modeling. Arnamoy Bhattacharyya, Grzegorz Kwasniewski, Torsten Hoefler. 468-479
- Using Hybrid Schedules to Safely Outperform Classical Polyhedral Schedules. Tian Jin. 480-481
- Unified Identification of Multiple Forms of Parallelism in Embedded Applications. Miguel Angel Aguilar, Rainer Leupers. 482-483
- An Optimization of Resource Arrangement for Network-on-Chip using Genetic Algorithm. Daichi Murakami, Kei Hiraki. 484-485
- Load Balancing in Decoupled Look-ahead: A Do-It-Yourself (DIY) Approach. Raj Parihar, Michael C. Huang. 486-487
- An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs. Shixiong Xu, David Gregg. 488-489
- Extending Polyhedral Model for Analysis and Transformation of OpenMP Programs. Prasanth Chatarasi, Vivek Sarkar. 490-491
- Energy-Efficient Hybrid DRAM/NVM Main Memory. Ahmad Hassan, Hans Vandierendonck, Dimitrios S. Nikolopoulos. 492-493
- DVFS-Aware Consolidation for Energy-Efficient Clouds. Patricia Arroba, José Manuel Moya, José L. Ayala, Rajkumar Buyya. 494-495
- Integrating 3D Resistive Memory Cache into GPGPU for Energy-Efficient Data Processing. Jie Zhang, David Donofrio, John Shalf, Myoungsoo Jung. 496-497
- Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain? Narges Shahidi, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das. 498-499