- Big Data Analytics on Flash Storage with Accelerators. Arvind. 1
- Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. Jingweijia Tan, Shuaiwen Leon Song, Kaige Yan, Xin Fu, Andrès Márquez, Darren J. Kerbyson. 3-15
- μC-States: Fine-grained GPU Datapath Power Management. Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, Chita R. Das. 17-30
- Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Chita R. Das. 31-44
- OAWS: Memory Occlusion Aware Warp Scheduling. Bin Wang, Yue Zhu, Weikuan Yu. 45-55
- Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding. Bruno Bodin, Luigi Nardi, M. Zeeshan Zia, Harry Wagstaff, Govind Sreekar Shenoy, Murali Krishna Emani, John Mawer, Christos Kotselidis, Andy Nisbet, Mikel Luján, Björn Franke, Paul H. J. Kelly, Michael F. P. O'Boyle. 57-69
- Fusion of Parallel Array Operations. Mads Ruben Burgdorff Kristensen, Simon Andreas Frimann Lund, Troels Blum, James Avery. 71-85
- Reduction Drawing: Language Constructs and Polyhedral Compilation for Reductions on GPU. Chandan Reddy, Michael Kruse, Albert Cohen. 87-97
- Resource Conscious Reuse-Driven Tiling for GPUs. Prashant Singh Rawat, Changwan Hong, Mahesh Ravishankar, Vinod Grover, Louis-Noël Pouchet, Atanas Rountev, P. Sadayappan. 99-111
- Accelerating Linked-list Traversal Through Near-Data Processing. Byungchul Hong, Gwangsun Kim, Jung Ho Ahn, Yongkee Kwon, Hongsik Kim, John Kim. 113-124
- Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach. 125-137
- A Static Cut-off for Task Parallel Programs. Shintaro Iwasaki, Kenjiro Taura. 139-150
- Greater Performance and Better Efficiency: Predicated Execution has shown us the way. Yale N. Patt. 151
- WearCore: A Core for Wearable Workloads. Sanyam Mehta, Josep Torrellas. 153-164
- Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs. Sudarsun Kannan, Moinuddin K. Qureshi, Ada Gavrilovska, Karsten Schwan. 165-177
- Power Tuning HPC Jobs on Power-Constrained Systems. Neha Gholkar, Frank Mueller, Barry Rountree. 179-191
- Online Scalability Characterization of Data-Parallel Programs on Many Cores. Younghyun Cho, Surim Oh, Bernhard Egger. 191-205
- Speculatively Exploiting Cross-Invocation Parallelism. Jialu Huang, Prakash Prabhu, Thomas B. Jablin, Soumyadeep Ghosh, Sotiris Apostolakis, Jae W. Lee, David I. August. 207-221
- MicroSpec: Speculation-Centric Fine-Grained Parallelization for FSM Computations. Junqiao Qiu, Zhijia Zhao, Bin Ren. 221-233
- Hash Map Inlining. Dibakar Gope, Mikko H. Lipasti. 235-246
- Sparso: Context-driven Optimizations of Sparse Linear Algebra. Hongbo Rong, JongSoo Park, Lingxiang Xiang, Todd A. Anderson, Mikhail Smelyanskiy. 247-259
- Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models. Xiangyao Yu, Hongzhe Liu, Ethan Zou, Srinivas Devadas. 261-274
- Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling. Paul Caheny, Marc Casas, Miquel Moretó, Hervé Gloaguen, Maxime Saintes, Eduard Ayguadé, Jesús Labarta, Mateo Valero. 275-286
- Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference. Yong Zhao, Jia Rao, Qing Yi. 287-297
- Optimizing Indirect Memory References with milk. Vladimir Kiriansky, Yunming Zhang, Saman P. Amarasinghe. 299-312
- Scaling Data Analytics with Moore's Law. Kunle Olukotun. 313
- Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing: Think Big, See Small. Mingcong Song, Yang Hu, Yunlong Xu, Chao Li, Huixiang Chen, Jingling Yuan, Tao Li. 315-326
- A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs. Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula. 327-338
- Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs. Gwangsun Kim, Jiyun Jeong, John Kim, Mark Stephenson. 341-352
- CAF: Core to Core Communication Acceleration Framework. Yipeng Wang, Ren Wang, Andrew Herdrich, James Tsai, Yan Solihin. 351-362
- Vectorization of Multibyte Floating Point Data Formats. Andrew Anderson, David Gregg. 363-372
- Rinnegan: Efficient Resource Use in Heterogeneous Architectures. Sankaralingam Panneerselvam, Michael M. Swift. 373-386
- Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading. Zhen Jia, Chao Xue, Guancheng Chen, Jianfeng Zhan, Lixin Zhang, Yonghua Lin, Peter Hofstee. 387-400
- EXCITE-VM: Extending the Virtual Memory System to Support Snapshot Isolation Transactions. Heiner Litz, Benjamin Braun, David R. Cheriton. 401-412
- POSTER: Fly-Over: A Light-Weight Distributed Power-Gating Mechanism For Energy-Efficient Networks-on-Chip. Rahul Boyapati, Jiayi Huang, Ningyuan Wang, Kyung-Hoon Kim, Ki Hwan Yum, Eun Jung Kim. 413-414
- POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Sofware. Kallia Chronaki, Miquel Moretó, Marc Casas, Alejandro Rico, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta, Mateo Valero. 415-417
- POSTER: Easy PRAM-based High-Performance Parallel Programming with ICE. Fady Ghanim, Rajeev Barua, Uzi Vishkin. 419-420
- POSTER: Fault-tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support. Florian Haas, Sebastian Weis, Theo Ungerer, Gilles Pokam, Youfeng Wu. 421-422
- POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers. Guray Ozen, Eduard Ayguadé, Jesús Labarta. 423-424
- POSTER: Firestorm: Operating Systems for Power-Constrained Architectures. Sankaralingam Panneerselvam, Michael M. Swift. 425-427
- POSTER: ξ-TAO: A Cache-centric Execution Model and Runtime for Deep Parallel Multicore Topologies. Miquel Pericàs. 429-431
- POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics. Alberto Ros, Carl Leonardsson, Christos Sakalis, Stefanos Kaxiras. 433-434
- POSTER: SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization. Jee Ho Ryoo, Mitesh R. Meswani, Reena Panda, Lizy K. John. 435-437
- POSTER: Hybrid Data Dependence Analysis for Loop Transformations. Diogo Nunes Sampaio, Alain Ketterlin, Louis-Noël Pouchet, Fabrice Rastello. 439-440
- POSTER: An Optimization of Dataflow Architectures for Scientific Applications. Xiaowei Shen, Xiaochun Ye, Xu Tan, Da Wang, Zhimin Zhang, Dongrui Fan, Zhimin Tang. 441-442
- POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems. Prakalp Srivastava, Maria Kotsifakou, Matthew D. Sinclair, Rakesh Komuravelli, Vikram S. Adve, Sarita V. Adve. 443-445
- POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core. Milan Stanic, Oscar Palomar, Timothy Hayes, Ivan Ratkovic, Osman S. Unsal, Adrián Cristal, Mateo Valero. 447-448
- POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism. Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers. 449-450
- Student Research Poster: Slack-Aware Shared Bandwidth Management in GPUs. Saumay Dublish. 451-452
- Student Research Poster: From Processing-in-Memory to Processing-in-Storage. Roman Kaplan. 453
- Student Research Poster: Network Controller Emulation on a Sidecore for Unmodified Virtual Machines. Arthur Kiyanovski. 454
- Student Research Poster: A Low Complexity Cache Sharing Mechanism to Address System Fairness. Vicent Selfa, Julio Sahuquillo, Salvador Petit, María Engracia Gómez. 455
- Student Research Poster: A Scalable General Purpose System for Large-Scale Graph Processing. Jiawen Sun. 456
- Student Research Poster: Compiling Boolean Circuits to Non-deterministic Branching Programs to be Implemented by Light Switching Circuits. Vladislav Tartakovsky. 457
- Student Research Poster: Software Out-of-Order Execution for In-Order Architectures. Kim-Anh Tran. 458