- Internet of mobile things: challenges and opportunities. Klara Nahrstedt. 1-2
- Virtues and limitations of commodity hardware transactional memory. Nuno Diegues, Paolo Romano, Luís Rodrigues. 3-14
- Cooperative cache scrubbing. Jennifer B. Sartor, Wim Heirman, Stephen M. Blackburn, Lieven Eeckhout, Kathryn S. McKinley. 15-26
- KLA: a new algorithmic paradigm for parallel graph computations. Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger. 27-38
- Tiling and optimizing time-iterated computations on periodic domains. Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache. 39-50
- ATCache: reducing DRAM cache latency via a small SRAM tag cache. Cheng-Chieh Huang, Vijay Nagarajan. 51-60
- SpongeDirectory: flexible sparse directories utilizing multi-level memristors. Lunkai Zhang, Dmitri B. Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin. 61-74
- EFetch: optimizing instruction fetch for event-driven web applications. Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy. 75-86
- XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs. Biswabandan Panda, Shankar Balachandran. 87-98
- What is the cost of weak determinism? Cedomir Segulja, Tarek S. Abdelrahman. 99-112
- ILP and TLP in shared memory applications: a limit study. Ehsan Fatehi, Paul Gratz. 113-126
- Versatile and scalable parallel histogram construction. Wookeun Jung, JongSoo Park, Jaejin Lee. 127-138
- Bitwise data parallelism in regular expression matching. Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin. 139-150
- Adaptive heterogeneous scheduling for integrated GPUs. Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali. 151-162
- Warp-aware trace scheduling for GPUs. James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy. 163-174
- CAWS: criticality-aware warp scheduling for GPGPU workloads. Shin-Ying Lee, Carole-Jean Wu. 175-186
- Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy. 187-200
- Consolidated conflict detection for hardware transactional memory. Lihang Zhao, Jeffrey T. Draper. 201-212
- DeSTM: harnessing determinism in STMs for application development. Kaushik Ravichandran, Ada Gavrilovska, Santosh Pande. 213-224
- PATS: pattern aware scheduling and power gating for GPGPUs. Qiumin Xu, Murali Annavaram. 225-236
- Heterogeneous microarchitectures trump voltage scaling for low-power cores. Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald G. Dreslinski, Thomas F. Wenisch, Scott A. Mahlke. 237-250
- RCS: runtime resource and core scaling for power-constrained multi-core processors. Hamid Reza Ghasemi, Nam Sung Kim. 251-262
- Realm: an event-based low-level runtime for distributed memory architectures. Sean Treichler, Michael Bauer, Alex Aiken. 263-276
- kMAF: automatic kernel-level management of thread and data affinity. Matthias Diener, Eduardo Henrique Molina da Cruz, Philippe Olivier Alexandre Navaux, Anselm Busse, Hans-Ulrich Heiß. 277-288
- Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems. Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan. 289-300
- Domain-specific models for innovation in analytics. Bob Blainey. 301-302
- OpenTuner: an extensible framework for program autotuning. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, Saman P. Amarasinghe. 303-316
- Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs. Rahul Garg, Laurie J. Hendren. 317-330
- Memory scheduling towards high-throughput cooperative heterogeneous computing. Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim. 331-342
- Bounded memory scheduling of dynamic task graphs. Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar. 343-356
- Trading cache hit rate for memory performance. Wei Ding, Mahmut T. Kandemir, Diana Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli. 357-368
- Compiler support for selective page migration in NUMA architectures. Guilherme Piccoli, Henrique N. Santos, Raphael E. Rodrigues, Christiane Pousa, Edson Borin, Fernando M. Quintão Pereira. 369-380
- COLORIS: a dynamic cache partitioning system using page coloring. Ying Ye, Richard West, Zhuoqun Cheng, Ye Li. 381-392
- PEMOGEN: automatic adaptive performance modeling during program runtime. Arnamoy Bhattacharyya, Torsten Hoefler. 393-404
- ArrayTool: a lightweight profiler to guide array regrouping. Xu Liu, Kamal Sharma, John M. Mellor-Crummey. 405-416
- Design for scalability in enterprise SSDs. Arash Tavakkol, Mohammad Arjomand, Hamid Sarbazi-Azad. 417-430
- 2MA: accelerating coarse-grained data transfer for GPUs. Davoud Anoushe Jamshidi, Mehrzad Samadi, Scott A. Mahlke. 431-442
- VAST: the illusion of a large memory space for GPUs. Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke. 443-454
- Automatic optimization of thread-coarsening for graphics processors. Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle. 455-466
- Automatic execution of single-GPU computations across multiple GPUs. Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu. 467-468
- LCA: a memory link and cache-aware co-scheduling approach for CMPs. Alexandros-Herodotos Haritatos, Georgios I. Goumas, Nikos Anastopoulos, Konstantinos Nikas, Kornilios Kourtis, Nectarios Koziris. 469-470
- A run-time power manager exploiting software parallelism. Simon Holmbacka, Sébastien Lafond, Johan Lilius. 471-472
- Graph-based performance accounting for chip multiprocessor memory systems. Magnus Jahre. 473-474
- SQRL: hardware accelerator for collecting software data structures. Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, Jordon Phillips. 475-476
- Optimizing stencil code via locality of computation. Yulong Luo, Guangming Tan. 477-478
- ADHA: automatic data layout framework for heterogeneous architectures. Deepak Majeti, Kuldeep S. Meel, Rajkishore Barik, Vivek Sarkar. 479-480
- Active learning accelerated automatic heuristic construction for parallel program mapping. William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather. 481-482
- Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil. 483-484
- Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors. Xiang Pan, Radu Teodorescu. 485-486
- Protection and utilization in shared cache through rationing. Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang. 487-488
- Automatic parallelism through macro dataflow in high-level array languages. Pushkar Ratnalikar, Arun Chauhan. 489-490
- A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency. Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, Rance Rodrigues, Sandip Kundu. 491-492
- Rollback-free value prediction with approximate loads. Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, Todd C. Mowry. 493-494
- Measuring flexibility in single-ISA heterogeneous processors. Erik Tomusk, Christophe Dubach, Michael F. P. O'Boyle. 495-496
- SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling. Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey S. Vetter. 497-498
- An event-based language for dynamic binary translation frameworks. Serguei Makarov, Angela Demke Brown, Ashvin Goel. 499-500
- Improving performance of streaming applications with filtering and control messages. Peng Li, Jeremy Buhler. 501-502
- Stratified sampling for even workload partitioning. Jeeva Paudel, José Nelson Amaral. 503-504
- Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters. Tejaswi Agarwal, Michela Becchi. 505-506
- Data remapping for an energy efficient burst chop in DRAM memory systems. Sudharsan Jagathrakshakan, Venkata Kalyan Tavva, Madhu Mutyam. 507-508
- Data-reuse optimizations for pipelined tiling with parametric tile sizes. Alexandre Isoard. 509-510
- From petascale to the pocket: adaptively scaling parallel programs for mobile SoCs. Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger. 511-512
- Coarrays in GNU Fortran. Alessandro Fanfarillo, Tobias Burnus, Valeria Cardellini, Salvatore Filippone, Dan Nagle, Damian W. I. Rouson. 513-514
- Locality-aware memory association for multi-target worksharing in OpenMP. Thomas R. W. Scogland, Wu-chun Feng. 515-516
- Processing big data graphs on memory-restricted systems. Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger. 517-518