- Message from the program chairs. André Seznec, François Bodin.
- General chairs' welcome message. Michael O'Boyle, Christian Fensch.
- Keynote talk: A comprehensive approach to HW/SW codesign. David J. Kuck. 1
- Keynote talk: Parallel programming for mobile computing. Calin Cascaval. 3
- Keynote talk: Towards automatic resource management in parallel architectures. Per Stenström. 5
- INSPIRE: The insieme parallel intermediate representation. Herbert Jordan, Simone Pellegrini, Peter Thoman, Klaus Kofler, Thomas Fahringer. 7-17
- Parallel flow-sensitive pointer analysis by graph-rewriting. Vaivaswatha Nagaraj, R. Govindarajan. 19-28
- Interprocedural strength reduction of critical sections in explicitly-parallel programs. Rajkishore Barik, Jisheng Zhao, Vivek Sarkar. 29-40
- ThermOS: System support for dynamic thermal management of chip multi-processors. Filippo Sironi, Martina Maggio, Riccardo Cattaneo, Giovanni F. Del Nero, Donatella Sciuto, Marco D. Santambrogio. 41-50
- Coordinated power-performance optimization in manycores. Hiroshi Sasaki, Satoshi Imamura, Koji Inoue. 51-61
- An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs. Arunachalam Annamalai, Rance Rodrigues, Israel Koren, Sandip Kundu. 63-72
- APOGEE: Adaptive prefetching on GPUs for energy efficiency. Ankit Sethia, Ganesh S. Dasika, Mehrzad Samadi, Scott A. Mahlke. 73-82
- Parallel frame rendering: Trading responsiveness for energy on a mobile GPU. Jose-Maria Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis. 83-92
- Exploring hybrid memory for GPU energy efficiency through software-hardware co-design. Bin Wang, Bo Wu, Dong Li, Xipeng Shen, Weikuan Yu, Yizheng Jiao, Jeffrey S. Vetter. 93-102
- S-CAVE: Effective SSD caching to improve virtual machine storage performance. Tian Luo, Siyuan Ma, Rubao Lee, Xiaodong Zhang, Deng Liu, Li Zhou. 103-112
- Writeback-aware bandwidth partitioning for multi-core systems with PCM. Miao Zhou, Yu Du, Bruce R. Childers, Rami G. Melhem, Daniel Mossé. 113-122
- L1-bandwidth aware thread allocation in multicore SMT processors. Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato. 123-132
- A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors. Sandeep Navada, Niket K. Choudhary, Salil V. Wadhavkar, Eric Rotenberg. 133-144
- Memory-centric system interconnect design with Hybrid Memory Cubes. Gwangsun Kim, John Kim, Jung Ho Ahn, Jaeha Kim. 145-155
- Neither more nor less: Optimizing thread-level parallelism for GPGPUs. Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das. 157-166
- SMT-centric power-aware thread placement in chip multiprocessors. Augusto Vega, Alper Buyuktosunoglu, Pradip Bose. 167-176
- Fairness-aware scheduling on single-ISA heterogeneous multi-cores. Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, Lieven Eeckhout. 177-187
- DANBI: Dynamic scheduling of irregular stream programs for many-core systems. Changwoo Min, Young Ik Eom. 189-200
- An empirical model for predicting cross-core performance interference on multicore processors. Jiacheng Zhao, Xiaobing Feng, Huimin Cui, Youliang Yan, Jingling Xue, Wensen Yang. 201-212
- Jigsaw: Scalable software-defined caches. Nathan Beckmann, Daniel Sanchez. 213-224
- Managing shared last-level cache in a heterogeneous multicore processor. Vineeth Mekkat, Anup Holey, Pen-Chung Yew, Antonia Zhai. 225-234
- Reshaping cache misses to improve row-buffer locality in multicore systems. Wei Ding, Jun Liu, Mahmut T. Kandemir, Mary Jane Irwin. 235-244
- Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems. Janghaeng Lee, Mehrzad Samadi, Yongjun Park, Scott A. Mahlke. 245-255
- Starchart: Hardware and software optimization using recursive partitioning regression trees. Wenhao Jia, Kelly A. Shaw, Margaret Martonosi. 257-267
- RSVM: A Region-based Software Virtual Memory for GPU. Feng Ji, Heshan Lin, Xiaosong Ma. 269-278
- The case for a scalable coherence protocol for complex on-chip cache hierarchies in many-core systems. Lucia G. Menezo, Valentin Puente, José-Ángel Gregorio. 279-288
- Meeting midway: Improving CMP performance with memory-side prefetching. Praveen Yedlapalli, Jagadish Kotra, Emre Kultursay, Mahmut T. Kandemir, Chita R. Das, Anand Sivasubramaniam. 289-298
- Building expressive, area-efficient coherence directories. Lei Fang, Peng Liu, Qi Hu, Michael C. Huang, Guofan Jiang. 299-308
- Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect. Jungju Oh, Alenka G. Zajic, Milos Prvulovic. 309-318
- McRouter: Multicast within a router for high performance network-on-chips. Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura. 319-329
- Concurrent predicates: A debugging technique for every parallel programmer. Justin Emile Gottschlich, Gilles Pokam, Cristiano Pereira, Youfeng Wu. 331-340
- Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG. Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam. 341-351
- Vectorization past dependent branches through speculation. Majedul Haque Sujon, R. Clint Whaley, Qing Yi. 353-362
- Automatic vectorization of tree traversals. Youngjoon Jo, Michael Goldfarb, Milind Kulkarni. 363-374
- Generating efficient data movement code for heterogeneous architectures with distributed-memory. Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula. 375-386
- Automatic OpenCL work-group size selection for multicore CPUs. Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee. 387-397
- TCPT - Thread criticality-driven prefetcher throttling. Biswabandan Panda, Shankar Balachandran. 399
- Do inputs matter? Using data-dependence profiling to evaluate thread level speculation in BG/Q. Arnamoy Bhattacharyya. 401
- Can lock-free and combining techniques co-exist? A novel approach on concurrent queue. Changwoo Min, Young Ik Eom. 403
- Task sampling: Computer architecture simulation in the many-core era. Thomas Grass. 405
- PS-cache: An energy-efficient cache design for chip multiprocessors. Joan J. Valls, Alberto Ros, Julio Sahuquillo, María Engracia Gómez. 407
- Dynamic memory access monitoring based on tagged memory. Mikhail Gorelov, Lev Mukhanov. 409
- Exposing ILP in custom hardware with a dataflow compiler IR. Ali Mustafa Zaidi. 411