- A Unified Scheduler for Recursive and Task Dataflow Parallelism. Hans Vandierendonck, George Tzenakis, Dimitrios S. Nikolopoulos. 1-11
- No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs. Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan. 12-21
- Dynamic Fine-Grain Scheduling of Pipeline Parallelism. Daniel Sanchez, David Lo, Richard M. Yoo, Jeremy Sugerman, Christos Kozyrakis. 22-32
- SPATL: Honey, I Shrunk the Coherence Directory. Hongzhou Zhao, Arrvindh Shriraman, Sandhya Dwarkadas, Vijayalakshmi Srinivasan. 33-44
- POPS: Coherence Protocol Optimization for Both Private and Shared Data. Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang. 45-55
- An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence. Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, Jaejin Lee. 56-67
- Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries. Bin Ren, Gagan Agrawal. 68-77
- Efficient Parallel Graph Exploration on Multi-Core CPU and GPU. Sungpack Hong, Tayo Oguntebi, Kunle Olukotun. 78-88
- A Heterogeneous Parallel Framework for Domain-Specific Languages. Kevin J. Brown, Arvind K. Sujeeth, HyoukJoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, Kunle Olukotun. 89-100
- PEPSC: A Power-Efficient Processor for Scientific Computing. Ganesh S. Dasika, Ankit Sethia, Trevor N. Mudge, Scott A. Mahlke. 101-110
- Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling. Jungseob Lee, Vijay Sathisha, Michael J. Schulte, Katherine Compton, Nam Sung Kim. 111-120
- Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores. Rance Rodrigues, Arunachalam Annamalai, Israel Koren, Sandip Kundu, Omer Khan. 121-130
- Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer. Nikolas Ioannou, Michael Kauschke, Matthias Gries, Marcelo Cintra. 131-142
- Optimizing Data Layouts for Parallel Computation on Multicores. Yuanrui Zhang, Wei Ding, Jun Liu, Mahmut T. Kandemir. 143-154
- DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism. Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, Nima Honarmand, Sarita V. Adve, Vikram S. Adve, Nicholas P. Carter, Ching-Tsun Chou. 155-166
- A Hierarchical Approach to Maximizing MapReduce Efficiency. Zhiwei Xiao, Haibo Chen, Binyu Zang. 167-168
- Building Retargetable and Efficient Compilers for Multimedia Instruction Sets. Serge Guelton, Adrien Guinet, Ronan Keryell. 169-170
- Compiler Directed Data Locality Optimization for Multicore Architectures. Wei Ding, Jithendra Srinivas, Mahmut T. Kandemir, Mustafa Karaköy. 171-172
- CriticalFault: Amplifying Soft Error Effect Using Vulnerability-Driven Injection. Xin Xu, Man-Lap Li. 173-174
- Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures. Gagandeep S. Sachdev, Kshitij Sudan, Mary W. Hall, Rajeev Balasubramonian. 175-176
- Exploiting Mutual Awareness between Prefetchers and On-chip Networks in Multi-cores. Junghoon Lee, Minjeong Shin, Hanjoon Kim, John Kim, Jaehyuk Huh. 177-178
- Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution. Neal Clayton Crago, Sanjay J. Patel. 179-180
- Parameterized Micro-benchmarking: An Auto-tuning Approach for Complex Applications. Wenjing Ma, Sriram Krishnamoorthy, Gagan Agrawal. 181-182
- Prediction Based DRAM Row-Buffer Management in the Many-Core Era. Manu Awasthi, David W. Nellans, Rajeev Balasubramonian, Al Davis. 183-184
- Program Interferometry. Zhe Wang, Daniel A. Jiménez. 185-186
- Regulating Locality vs. Parallelism Tradeoffs in Multiple Memory Controller Environments. Syed Minhaj Hassan, Dhruv Choudhary, Mitchelle Rasquinha, Sudhakar Yalamanchili. 187-188
- Row-Buffer Reorganization: Simultaneously Improving Performance and Reducing Energy in DRAMs. Nagendra Dwarakanath Gulur, R. Manikantan, R. Govindarajan, Mahesh Mehendale. 189-190
- Scalable Proximity-Aware Cache Replication in Chip Multiprocessors. Chongmin Li, Haixia Wang, Yibo Xue, Dongsheng Wang, Jian Li. 191-192
- Scalable and Efficient Bounds Checking for Large-Scale CMP Environments. Baik Song An, Ki Hwan Yum, Eun Jung Kim. 193-194
- An Alternative Memory Access Scheduling in Manycore Accelerators. Yonggon Kim, Hyunseok Lee, John Kim. 195-196
- Beforehand Migration on D-NUCA Caches. Javier Lira, Timothy M. Jones, Carlos Molina, Antonio González. 197-198
- SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory. Gulay Yalcin, Osman S. Unsal, Adrián Cristal, Ibrahim Hur, Mateo Valero. 199-200
- rPRAM: Exploring Redundancy Techniques to Improve Lifetime of PCM-based Main Memory. Jie Chen, Zachary Winter, Guru Venkataramani, H. Howie Huang. 201-202
- Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory. Anurag Negi, Per Stenström, J. Rubén Titos Gil, Manuel E. Acacio, José M. García. 203-204
- MCFQ: Leveraging Memory-level Parallelism and Application's Cache Friendliness for Efficient Management of Quasi-partitioned Last-level Caches. Dimitris Kaseridis, Muhammad Faisal Iqbal, Jeffrey Stuecheli, Lizy Kurian John. 205-206
- MRAC: A Memristor-based Reconfigurable Framework for Adaptive Cache Replacement. Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen. 207-208
- Sampling Temporal Touch Hint (STTH) Inclusive Cache Management Policy. Yingying Tian, Daniel A. Jiménez. 209
- Exploiting Rank Idle Time for Scheduling Last-Level Cache Writeback. Zhe Wang, Daniel A. Jiménez. 210
- TIDeFlow: A Parallel Execution Model for High Performance Computing Programs. Daniel A. Orozco. 211
- Decoupled Cache Segmentation: Mutable Policy with Automated Bypass. Samira Manabi Khan, Daniel A. Jiménez. 212
- A Software-Managed Coherent Memory Architecture for Manycores. Jung-Ho Park, Choonki Jang, Jaejin Lee. 213
- Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour. Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato. 214
- A Compiler-assisted Runtime-prefetching Scheme for Heterogenous Platforms. Baojiang Shou, Xionghui Hou, Li Chen. 215
- Improving Run-Time Scheduling for General-Purpose Parallel Code. Alexandros Tzannes, Rajeev Barua, Uzi Vishkin. 216
- Collaborative Caching for Unknown Cache Sizes. Xiaoming Gu. 217
- Programming Strategies for GPUs and their Power Consumption. Sayan Ghosh, Barbara M. Chapman. 218
- An Architecture to Enable Lifetime Full Chip Testability in Chip Multiprocessors. Rance Rodrigues, Israel Koren, Sandip Kundu. 219
- Probabilistic Models Towards Optimal Speculation of DFA Applications. Zhijia Zhao, Bo Wu. 220
- STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems. Gokcen Kestor, Roberto Gioiosa, Tim Harris, Osman S. Unsal, Adrián Cristal, Ibrahim Hur, Mateo Valero. 221-231
- Making STMs Cache Friendly with Compiler Transformations. Sandya Mannarswamy, Ramaswamy Govindarajan. 232-242
- Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control. Bo Wu, Eddy Z. Zhang, Xipeng Shen. 243-252
- SFMalloc: A Lock-Free and Mostly Synchronization-Free Dynamic Memory Allocator for Manycores. Sangmin Seo, Junghyun Kim, Jaejin Lee. 253-263
- Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs. Meng-Ju Wu, Donald Yeung. 264-275
- StVEC: A Vector Instruction Extension for High Performance Stencil Computation. Naser Sedaghati, Renji Thomas, Louis-Noël Pouchet, Radu Teodorescu, P. Sadayappan. 276-287
- OpenMDSP: Extending OpenMP to Program Multi-Core DSP. Jiangzhou He, Wenguang Chen, Guangri Chen, Weimin Zheng, Zhizhong Tang, Handong Ye. 288-297
- ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment. Konstantinos Aisopos, Andrew DeOrio, Li-Shiuan Peh, Valeria Bertacco. 298-309
- Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU. Ziyu Guo, Eddy Zheng Zhang, Xipeng Shen. 310-319
- Divergence Analysis and Optimizations. Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, Wagner Meira Jr. 320-329
- Large Scale Verification of MPI Programs Using Lamport Clocks with Lazy Update. Anh Vo, Ganesh Gopalakrishnan, Robert M. Kirby, Bronis R. de Supinski, Martin Schulz, Greg Bronevetsky. 330-339
- DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory. Carlos Villavieja, Vasileios Karakostas, Lluís Vilanova, Yoav Etsion, Alex Ramírez, Avi Mendelson, Nacho Navarro, Adrián Cristal, Osman S. Unsal. 340-349
- Linear-time Modeling of Program Working Set in Shared Cache. Xiaoya Xiang, Bin Bao, Chen Ding, Yaoqing Gao. 350-360
- Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory. Adrià Armejach, Azam Seyedi, J. Rubén Titos Gil, Ibrahim Hur, Adrián Cristal, Osman S. Unsal, Mateo Valero. 361-371
- An Evaluation of Vectorizing Compilers. Saeed Maleki, Yaoqing Gao, María Jesús Garzarán, Tommy Wong, David A. Padua. 372-382
- Modeling and Performance Evaluation of TSO-Preserving Binary Optimization. Cheng Wang, Youfeng Wu. 383-392
- Exploiting Task Order Information for Optimizing Sequentially Consistent Java Programs. Christoph Angerer, Thomas R. Gross. 393-402
- Memory Architecture for Integrating Emerging Memory Technologies. Kun Fang, Long Chen, Zhao Zhang, Zhichun Zhu. 403-412
- Speculative Parallelization in Decoupled Look-ahead. Alok Garg, Raj Parihar, Michael C. Huang. 413-423
- Optimizing Regular Expression Matching with SR-NFA on Multi-Core Systems. Yi-Hua E. Yang, Viktor K. Prasanna. 424-433