- The changing role of supercomputing. Peter J. Ungaro. 1-2
- Power-aware multi-core simulation for early design stage hardware/software co-optimization. Wim Heirman, Souradip Sarkar, Trevor E. Carlson, Ibrahim Hur, Lieven Eeckhout. 3-12
- PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs. Kai Ma, Xiaorui Wang. 13-22
- Power-efficient time-sensitive mapping in heterogeneous systems. Cong Liu, Jian Li, Wei Huang, Juan Rubio, Evan Speight, Xiaozhu Lin. 23-32
- Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil. 33-42
- Riposte: a trace-driven compiler and parallel VM for vector code in R. Justin Talbot, Zachary Devito, Pat Hanrahan. 43-52
- Auto-parallelizing stateful distributed streaming applications. Scott Schneider, Martin Hirzel, Bugra Gedik, Kun-Lung Wu. 53-64
- PEPON: performance-aware hierarchical power budgeting for NoC based multicores. Akbar Sharifi, Asit K. Mishra, Shekhar Srikantaiah, Mahmut T. Kandemir, Chita R. Das. 65-74
- XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems. Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Ross Pinckney, Sudhir Satpathy, David Blaauw, Dennis Sylvester, Trevor N. Mudge. 75-86
- APCR: an adaptive physical channel regulator for on-chip interconnects. Lei Wang, Poornachandran Kumar, Ki Hwan Yum, Eun Jung Kim 0001. 87-96
- Pointy: a hybrid pointer prefetcher for managed runtime systems. Ioana Burcea, Livio Soares, Andreas Moshovos. 97-106
- Scalability-based manycore partitioning. Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura. 107-116
- Optimizing datacenter power with memory system levers for guaranteed quality-of-service. Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi Iyer. 117-126
- Evaluation of Blue Gene/Q hardware support for transactional memories. Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raúl Silvera, Maged M. Michael. 127-136
- Making data prefetch smarter: adaptive prefetching on POWER7. Victor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell. 137-146
- Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics. Ashay Rane, James Browne. 147-156
- Compiling to avoid communication. Kathy Yelick. 157-158
- Visualizing transactional memory. Justin Emile Gottschlich, Maurice Herlihy, Gilles Pokam, Jeremy G. Siek. 159-170
- Sandboxing transactional memory. Luke Dalessandro, Michael L. Scott. 171-180
- Transactional prefetching: narrowing the window of contention in hardware transactional memory. Anurag Negi, Adrià Armejach, Adrián Cristal, Osman S. Unsal, Per Stenström. 181-190
- RISE: improving the streaming processors reliability against soft errors in GPGPUs. Jingweijia Tan, Xin Fu. 191-200
- Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring. Michelle L. Goodstein, Shimin Chen, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry. 201-212
- Probabilistic diagnosis of performance faults in large-scale parallel applications. Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin. 213-222
- Top500 versus sustained performance: the top problems with the top500 list - and what to do about them. William T. C. Kramer. 223-230
- Practically private: enabling high performance CMPs through compiler-assisted data classification. Yong Li 0009, Rami G. Melhem, Alex K. Jones. 231-240
- Complexity-effective multicore coherence. Alberto Ros, Stefanos Kaxiras. 241-252
- HaLock: hardware-assisted lock contention detection in multithreaded applications. Yongbing Huang, Zehan Cui, Licheng Chen, Wenli Zhang, Yungang Bao, Mingyu Chen. 253-262
- Runtime detection and optimization of collective communication patterns. Torsten Hoefler, Timo Schneider. 263-272
- Coalition threading: combining traditional and non-traditional parallelism to maximize scalability. Md Kamruzzaman, Steven Swanson, Dean M. Tullsen. 273-282
- Shared memory multiplexing: a novel way to improve GPGPU throughput. Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, Huiyang Zhou. 283-292
- Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches. Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subramoney, Joseph Nuzman. 293-304
- Efficient techniques for predicting cache sharing and throughput. Andreas Sandberg, David Black-Schaffer, Erik Hagersten. 305-314
- Optimal bypass monitor for high performance last-level caches. Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng. 315-324
- Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. Vijay Sathisha, Michael J. Schulte, Nam Sung Kim. 325-334
- Multi2Sim: a simulation framework for CPU-GPU computing. Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, David R. Kaeli. 335-344
- A yoke of oxen and a thousand chickens for heavy lifting graph processing. Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu. 345-354
- The evicted-address filter: a unified mechanism to address both cache pollution and thrashing. Vivek Seshadri, Onur Mutlu, Michael A. Kozuch, Todd C. Mowry. 355-366
- A software memory partition approach for eliminating bank-level interference in multicore systems. Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu. 367-376
- Base-delta-immediate compression: practical data compression for on-chip caches. Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry. 377-388
- Hardware acceleration in the IBM PowerEN processor: architecture and performance. Anil Krishna, Timothy Heil, Nicholas Lindberg, Farnaz Toussi, Steven VanderWiel. 389-400
- Workload and power budget partitioning for single-chip heterogeneous processors. Hao Wang, Vijay Sathish, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim. 401-410
- Database analytics acceleration using FPGAs. Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, Sameh W. Asaad. 411-420
- LumiNOC: a power-efficient, high-performance, photonic network-on-chip for future parallel architectures. Cheng Li, Mark Browning, Paul V. Gratz, Samuel Palermo. 421-422
- Acceleration of bulk memory operations in a heterogeneous multicore architecture. Jong Hyuk Lee, Ziyi Liu, Xiaonan Tian, Dong Hyuk Woo, Weidong Shi, Dainis Boumber, YongHong Yan, Kyeong-An Kwon. 423-424
- Integrating nanophotonics in GPU microarchitecture. Nilanjan Goswami, Zhongqi Li, Ajit Verma, Ramkumar Shankar, Tao Li. 425-426
- Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications. John Sartori, Rakesh Kumar. 427-428
- Layout-oblivious optimization for matrix computations. Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng 0002. 429-430
- Boost.SIMD: generic programming for portable SIMDization. Pierre Esterie, Mathias Gaunard, Joel Falcou, Jean-Thierry Lapresté, Brigitte Rozoy. 431-432
- Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications. Zhijia Zhao, Bo Wu, Xipeng Shen. 433-434
- Supporting stateful tasks in a dataflow graph. Vladimir Gajinov, Srdjan Stipic, Osman S. Unsal, Tim Harris, Eduard Ayguadé, Adrián Cristal. 435-436
- MaSiF: machine learning guided auto-tuning of parallel skeletons. Alexander Collins, Christian Fensch, Hugh Leather. 437-438
- TMNOC: a case of HTM and NoC co-design for increased energy efficiency and concurrency. Lihang Zhao, Woojin Choi, Jeffrey T. Draper. 439-440
- Application-aware prefetch prioritization in on-chip networks. Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Onur Mutlu, Chita R. Das. 441-442
- ReCaP: a region-based cure for the common cold cache. Jason Zebchuk, Harold W. Cain, Vijayalakshmi Srinivasan, Andreas Moshovos. 443-444
- Power-efficient computing for compute-intensive GPGPU applications. Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte. 445-446
- Off-chip access localization for NoC-based multicores. Wei Ding, Mahmut T. Kandemir, Yuanrui Zhang, Emre Kultursay. 447-448
- Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Huiyang Zhou. 449-450
- PS-Dir: a scalable two-level directory cache. Joan J. Valls, Alberto Ros, Julio Sahuquillo, María Engracia Gómez, José Duato. 451-452
- Inference and declaration of independence: impact on deterministic task parallelism. Foivos S. Zakkak, Dimitrios Chasapis, Polyvios Pratikakis, Angelos Bilas, Dimitrios S. Nikolopoulos. 453-454
- Application-to-core mapping policies to reduce memory interference in multi-core systems. Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, Mani Azimi. 455-456
- Bandwidth bandit: quantitative characterization of memory contention. David Eklov, Nikos Nikoleris, David Black-Schaffer, Erik Hagersten. 457-458
- Speculative dynamic vectorization for HW/SW co-designed processors. Rakesh Kumar, Alejandro Martínez, Antonio González. 459-460
- Fine-grained parallel traversals of irregular data structures. Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte. 461-462
- High-performance analysis of filtered semantic graphs. Aydin Buluç, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams. 463-464
- Energy-efficient cache partitioning for future CMPs. Karthik T. Sundararajan, Timothy M. Jones, Nigel P. Topham. 465-466
- A low-overhead dynamic optimization framework for multicores. Christopher W. Fletcher, Rachael Harding, Omer Khan, Srinivas Devadas. 467-468
- Making it practical and effective: fast and precise may-happen-in-parallel analysis. Congming Chen, Wei Huo, Xiaobing Feng 0002. 469-470
- Mileage-based contention management in transactional memory. Woojin Choi, Lihang Zhao, Jeff Draper. 471-472
- System-level power-performance efficiency modeling for emergent GPU architectures. Shuaiwen Song, Kirk W. Cameron. 473-474
- Transactional event profiling in a best-effort hardware transactional memory system. Matthew Gaudet, José Nelson Amaral. 475-476
- Transparent runtime deadlock elimination. Hari K. Pyla, Srinidhi Varadarajan. 477-478
- Design of a storage processing unit. Peng Li, Kevin Gomez, David J. Lilja. 479-480
- SkipCache: miss-rate aware cache management. Raghavendra K, Tripti Warrier, Madhu Mutyam. 481-482
- Using combined profiling to decide when thread level speculation is profitable. Arnamoy Bhattacharyya. 483-484
- Hardware prefetchers for emerging parallel applications. Biswabandan Panda, Shankar Balachandran. 485-486
- Strategies based on green policies to the grid resource allocation. Fábio Coutinho, Luís Alfredo V. de Carvalho. 487-488
- Linearly compressed pages: a main memory compression framework with low complexity and low latency. Gennady Pekhimenko, Todd C. Mowry, Onur Mutlu. 489-490
- Energy-efficient workload mapping in heterogeneous systems with multiple types of resources. Cong Liu. 491-492
- Phase-based scheduling and thread migration for heterogeneous multicore processors. Lina Sawalha, Ronald D. Barnes. 493-494