- Build Watson: an overview of DeepQA for the Jeopardy! challenge. David Ferrucci. 1-2
- Towards a science of parallel programming. Keshav Pingali. 3-4
- Raising the level of many-core programming with compiler technology: meeting a grand challenge. Wen-mei Hwu. 5-6
- Power and thermal characterization of POWER6 system. Victor Jiménez, Francisco J. Cazorla, Roberto Gioiosa, Mateo Valero, Carlos Boneti, Eren Kursun, Chen-Yong Cher, Canturk Isci, Alper Buyuktosunoglu, Pradip Bose. 7-18
- System-level max power (SYMPO): a systematic approach for escalating system-level power consumption using synthetic benchmarks. Karthik Ganesan, Jungho Jo, W. Lloyd Bircher, Dimitris Kaseridis, Zhibin Yu, Lizy K. John. 19-28
- Scalable thread scheduling and global power management for heterogeneous many-core architectures. Jonathan A. Winter, David H. Albonesi, Christine A. Shoemaker. 29-40
- Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors. Matthew A. Watkins, David H. Albonesi. 41-52
- Accelerating multicore reuse distance analysis with sampling and parallelization. Derek L. Schuff, Milind Kulkarni, Vijay S. Pai. 53-64
- Simple and fast biased locks. Nalini Vasudevan, Kedar S. Namjoshi, Stephen A. Edwards. 65-74
- Avoiding deadlock avoidance. Hari K. Pyla, Srinidhi Varadarajan. 75-86
- DAFT: decoupled acyclic fault tolerance. Yun Zhang, Jae W. Lee, Nick P. Johnson, David I. August. 87-98
- WAYPOINT: scaling coherence to thousand-core architectures. John H. Kelm, Matthew R. Johnson, Steven S. Lumetta, Sanjay J. Patel. 99-110
- Subspace snooping: filtering snoops with operating system support. Daehoon Kim, Jeongseob Ahn, Jae-Hong Kim, Jaehyuk Huh. 111-122
- Proximity coherence for chip multiprocessors. Nick Barrow-Williams, Christian Fensch, Simon W. Moore. 123-134
- SPACE: sharing pattern-based directory coherence for multicore scalability. Hongzhou Zhao, Arrvindh Shriraman, Sandhya Dwarkadas. 135-146
- Feedback-directed pipeline parallelism. M. Aater Suleman, Moinuddin K. Qureshi, Khubaib, Yale N. Patt. 147-156
- Scalable hardware support for conditional parallelization. Zheng Li, Olivier Certner, Jose Duato, Olivier Temam. 157-168
- Reducing task creation and termination overhead in explicitly parallel programs. Jisheng Zhao, Jun Shirako, V. Krishna Nandivada, Vivek Sarkar. 169-180
- MEDICS: ultra-portable processing for medical image reconstruction. Ganesh S. Dasika, Ankit Sethia, Vincentius Robby, Trevor N. Mudge, Scott A. Mahlke. 181-192
- An OpenCL framework for heterogeneous multicores with local memory. Jaejin Lee, Jungwon Kim, Sangmin Seo, Seungkyun Kim, Jung-Ho Park, Honggyu Kim, Thanh Tuan Dao, Yongjin Cho, Sung Jong Seo, Seung Hak Lee, Seung Mo Cho, Hyo Jung Song, Sang-Bum Suh, Jong-Deok Choi. 193-204
- Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors. Jayanth Gummaraju, Laurent Morichetti, Michael Houston, Ben Sander, Benedict R. Gaster, Bixia Zheng. 205-216
- MapCG: writing parallel program portable between CPU and GPU. Chuntao Hong, Dehao Chen, Wenguang Chen, Weimin Zheng, Haibo Lin. 217-226
- Adaptive spatiotemporal node selection in dynamic networks. Pradip Hari, John B. P. McCabe, Jonathan Banafato, Marcus Henry, Kevin Ko, Emmanouil Koukoumidis, Ulrich Kremer, Margaret Martonosi, Li-Shiuan Peh. 227-236
- On mitigating memory bandwidth contention through bandwidth-aware scheduling. Di Xu, Chenggang Wu, Pen-Chung Yew. 237-248
- AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems. Sergey Zhuravlev, Sergey Blagodurov, Alexandra Fedorova. 249-260
- Criticality-driven superscalar design space exploration. Sandeep Navada, Niket Kumar Choudhary, Eric Rotenberg. 261-272
- A programmable parallel accelerator for learning and classification. Srihari Cadambi, Abhinandan Majumdar, Michela Becchi, Srimat T. Chakradhar, Hans Peter Graf. 273-284
- Discovering and understanding performance bottlenecks in transactional applications. Ferad Zyulkyarov, Srdjan Stipic, Tim Harris, Osman S. Unsal, Adrián Cristal, Ibrahim Hur, Mateo Valero. 285-294
- Efficient sequential consistency using conditional fences. Changhui Lin, Vijay Nagarajan, Rajiv Gupta. 295-306
- Partitioning streaming parallelism for multi-cores: a machine learning based approach. Zheng Wang, Michael F. P. O'Boyle. 307-318
- Handling the problems and opportunities posed by multiple on-chip memory controllers. Manu Awasthi, David W. Nellans, Kshitij Sudan, Rajeev Balasubramonian, Al Davis. 319-330
- Design and implementation of the PLUG architecture for programmable and efficient network lookups. Amit Kumar, Lorenzo De Carli, Sung Jin Kim, Marc de Kruijf, Karthikeyan Sankaralingam, Cristian Estan, Somesh Jha. 331-342
- A model for fusion and code motion in an automatic parallelizing compiler. Uday Bondhugula, Oktay Günlük, Sanjeeb Dash, Lakshminarayanan Renganarayanan. 343-352
- Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. Gregory Frederick Diamos, Andrew Kerr, Sudhakar Yalamanchili, Nathan Clark. 353-364
- An empirical characterization of stream programs and its implications for language and compiler design. William Thies, Saman P. Amarasinghe. 365-376
- Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information. Georgios Tournavitis, Björn Franke. 377-388
- The Paralax infrastructure: automatic parallelization with a helping hand. Hans Vandierendonck, Sean Rul, Koen De Bosschere. 389-400
- AM++: a generalized active message framework. Jeremiah Willcock, Torsten Hoefler, Nicholas Gerard Edmonds, Andrew Lumsdaine. 401-410
- Using memory mapping to support cactus stacks in work-stealing runtime systems. I-Ting Angelina Lee, Silas Boyd-Wickizer, Zhiyi Huang, Charles E. Leiserson. 411-420
- Speculative-aware execution: a simple and efficient technique for utilizing multi-cores to improve single-thread performance. Rania H. Mameesh, Manoj Franklin. 421-430
- The potential of using dynamic information flow analysis in data value prediction. Walid J. Ghandour, Haitham Akkary, Wes Masri. 431-442
- Efficient runahead threads. Tanausú Ramírez, Alex Pajuelo, Oliverio J. Santana, Onur Mutlu, Mateo Valero. 443-452
- Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems. Yangchun Luo, Venkatesan Packirisamy, Wei-Chung Hsu, Antonia Zhai. 453-464
- SWEL: hardware cache coherence protocols to map shared data onto shared caches. Seth H. Pugsley, Josef B. Spjut, David W. Nellans, Rajeev Balasubramonian. 465-476
- ATAC: a 1000-core cache-coherent processor with on-chip optical network. George Kurian, Jason E. Miller, James Psota, Jonathan Eastep, Jifeng Liu, Jürgen Michel, Lionel C. Kimerling, Anant Agarwal. 477-488
- Using dead blocks as a virtual victim cache. Samira Manabi Khan, Daniel A. Jiménez, Doug Burger, Babak Falsafi. 489-500
- Compiler-assisted data distribution for chip multiprocessors. Yong Li, Ahmed Abousamra, Rami G. Melhem, Alex K. Jones. 501-512
- Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. I-Jui Sung, John A. Stratton, Wen-mei W. Hwu. 513-522
- Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. Rong Chen, Haibo Chen, Binyu Zang. 523-534
- On-chip network design considerations for compute accelerators. Ali Bakhoda, John Kim, Tor M. Aamodt. 535-536
- Believe it or not!: mult-core CPUs can match GPU performance for a FLOP-intensive application! Rajesh Bordawekar, Uday Bondhugula, Ravi Rao. 537-538
- Ordered and unordered algorithms for parallel breadth first search. Muhammad Amber Hassaan, Martin Burtscher, Keshav Pingali. 539-540
- Moths: mobile threads for on-chip networks. Matthew Misler, Natalie D. Enright Jerger. 541-542
- Improving speculative loop parallelization via selective squash and speculation reuse. Santhosh Sharma Ananthramu, Deepak Majeti, Sanjeev Kumar Aggarwal, Mainak Chaudhuri. 543-544
- Revisiting sorting for GPGPU stream architectures. Duane Merrill, Andrew S. Grimshaw. 545-546
- Analyzing cache performance bottlenecks of STM applications and addressing them with compiler's help. Sandya S. Mannarswamy, Ramaswamy Govindarajan. 547-548
- An intra-tile cache set balancing scheme. Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem. 549-550
- StatCC: a statistical cache contention model. David Eklov, David Black-Schaffer, Erik Hagersten. 551-552
- An integer programming framework for optimizing shared memory use on GPUs. Wenjing Ma, Gagan Agrawal. 553-554
- Exploiting subtrace-level parallelism in clustered processors. Rafael Ubal, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato. 555-556
- A case for NUMA-aware contention management on multicore systems. Sergey Blagodurov, Sergey Zhuravlev, Alexandra Fedorova, Ali Kamali. 557-558
- DMATiler: revisiting loop tiling for direct memory access. Haibo Lin, Tao Liu, Huoding Li, Tong Chen, Lakshminarayanan Renganarayanan, Kevin O'Brien, Ling Shao. 559-560
- Scaling of the PARSEC benchmark inputs. Christian Bienia, Kai Li. 561-562
- Online cache modeling for commodity multicore processors. Richard West, Puneet Zaroo, Carl A. Waldspurger, Xiao Zhang. 563-564
- NoC-aware cache design for chip multiprocessors. Ahmed Abousamra, Rami G. Melhem, Alex K. Jones. 565-566
- A software-SVM-based transactional memory for multicore accelerator architectures with local memory. Jun Lee, Sangmin Seo, Jaejin Lee. 567-568
- NUcache: a multicore cache organization based on next-use distance. R. Manikantan, Kaushik Rajan, R. Govindarajan. 569-570
- CoreGenesis: erasing core boundaries for robust and configurable performance. Shantanu Gupta, Shuguang Feng, Amin Ansari, Ganesh S. Dasika, Scott A. Mahlke. 571-572
- Automatic vector instruction selection for dynamic compilation. Rajkishore Barik, Jisheng Zhao, Vivek Sarkar. 573-574
- Approximating age-based arbitration in on-chip networks. Michael M. Lee, John Kim, Dennis Abts, Michael R. Marty, Jae W. Lee. 575-576