International Conference on Parallel Architectures and Compilation, PACT '14, Edmonton, AB, Canada, August 24-27, 2014

José Nelson Amaral, Josep Torrellas, editors, International Conference on Parallel Architectures and Compilation, PACT '14, Edmonton, AB, Canada, August 24-27, 2014. ACM, 2014.

Conference: IEEEpact2014

Internet of mobile things: challenges and opportunities. Klara Nahrstedt. 1-2

Virtues and limitations of commodity hardware transactional memory. Nuno Diegues, Paolo Romano, Luís Rodrigues. 3-14

Cooperative cache scrubbing. Jennifer B. Sartor, Wim Heirman, Stephen M. Blackburn, Lieven Eeckhout, Kathryn S. McKinley. 15-26

KLA: a new algorithmic paradigm for parallel graph computations. Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger. 27-38

Tiling and optimizing time-iterated computations on periodic domains. Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache. 39-50

ATCache: reducing DRAM cache latency via a small SRAM tag cache. Cheng-Chieh Huang, Vijay Nagarajan. 51-60

SpongeDirectory: flexible sparse directories utilizing multi-level memristors. Lunkai Zhang, Dmitri B. Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin. 61-74

EFetch: optimizing instruction fetch for event-driven web applications. Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy. 75-86

XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs. Biswabandan Panda, Shankar Balachandran. 87-98

What is the cost of weak determinism? Cedomir Segulja, Tarek S. Abdelrahman. 99-112

ILP and TLP in shared memory applications: a limit study. Ehsan Fatehi, Paul Gratz. 113-126

Versatile and scalable parallel histogram construction. Wookeun Jung, JongSoo Park, Jaejin Lee. 127-138

Bitwise data parallelism in regular expression matching. Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin. 139-150

Adaptive heterogeneous scheduling for integrated GPUs. Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali. 151-162

Warp-aware trace scheduling for GPUs. James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy. 163-174

CAWS: criticality-aware warp scheduling for GPGPU workloads. Shin-Ying Lee, Carole-Jean Wu. 175-186

Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy. 187-200

Consolidated conflict detection for hardware transactional memory. Lihang Zhao, Jeffrey T. Draper. 201-212

DeSTM: harnessing determinism in STMs for application development. Kaushik Ravichandran, Ada Gavrilovska, Santosh Pande. 213-224

PATS: pattern aware scheduling and power gating for GPGPUs. Qiumin Xu, Murali Annavaram. 225-236

Heterogeneous microarchitectures trump voltage scaling for low-power cores. Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald G. Dreslinski, Thomas F. Wenisch, Scott A. Mahlke. 237-250

RCS: runtime resource and core scaling for power-constrained multi-core processors. Hamid Reza Ghasemi, Nam Sung Kim. 251-262

Realm: an event-based low-level runtime for distributed memory architectures. Sean Treichler, Michael Bauer, Alex Aiken. 263-276

kMAF: automatic kernel-level management of thread and data affinity. Matthias Diener, Eduardo Henrique Molina da Cruz, Philippe Olivier Alexandre Navaux, Anselm Busse, Hans-Ulrich Heiß. 277-288

Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems. Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan. 289-300

Domain-specific models for innovation in analytics. Bob Blainey. 301-302

OpenTuner: an extensible framework for program autotuning. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, Saman P. Amarasinghe. 303-316

Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs. Rahul Garg, Laurie J. Hendren. 317-330

Memory scheduling towards high-throughput cooperative heterogeneous computing. Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim. 331-342

Bounded memory scheduling of dynamic task graphs. Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar. 343-356

Trading cache hit rate for memory performance. Wei Ding, Mahmut T. Kandemir, Diana Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli. 357-368

Compiler support for selective page migration in NUMA architectures. Guilherme Piccoli, Henrique N. Santos, Raphael E. Rodrigues, Christiane Pousa, Edson Borin, Fernando M. Quintão Pereira. 369-380

COLORIS: a dynamic cache partitioning system using page coloring. Ying Ye, Richard West, Zhuoqun Cheng, Ye Li. 381-392

PEMOGEN: automatic adaptive performance modeling during program runtime. Arnamoy Bhattacharyya, Torsten Hoefler. 393-404

ArrayTool: a lightweight profiler to guide array regrouping. Xu Liu, Kamal Sharma, John M. Mellor-Crummey. 405-416

Design for scalability in enterprise SSDs. Arash Tavakkol, Mohammad Arjomand, Hamid Sarbazi-Azad. 417-430

2MA: accelerating coarse-grained data transfer for GPUs. Davoud Anoushe Jamshidi, Mehrzad Samadi, Scott A. Mahlke. 431-442

VAST: the illusion of a large memory space for GPUs. Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke. 443-454

Automatic optimization of thread-coarsening for graphics processors. Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle. 455-466

Automatic execution of single-GPU computations across multiple GPUs. Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu. 467-468

LCA: a memory link and cache-aware co-scheduling approach for CMPs. Alexandros-Herodotos Haritatos, Georgios I. Goumas, Nikos Anastopoulos, Konstantinos Nikas, Kornilios Kourtis, Nectarios Koziris. 469-470

A run-time power manager exploiting software parallelism. Simon Holmbacka, Sébastien Lafond, Johan Lilius. 471-472

Graph-based performance accounting for chip multiprocessor memory systems. Magnus Jahre. 473-474

SQRL: hardware accelerator for collecting software data structures. Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, Jordon Phillips. 475-476

Optimizing stencil code via locality of computation. Yulong Luo, Guangming Tan. 477-478

ADHA: automatic data layout framework for heterogeneous architectures. Deepak Majeti, Kuldeep S. Meel, Rajkishore Barik, Vivek Sarkar. 479-480

Active learning accelerated automatic heuristic construction for parallel program mapping. William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather. 481-482

Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil. 483-484

Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors. Xiang Pan, Radu Teodorescu. 485-486

Protection and utilization in shared cache through rationing. Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang. 487-488

Automatic parallelism through macro dataflow in high-level array languages. Pushkar Ratnalikar, Arun Chauhan. 489-490

A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency. Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, Rance Rodrigues, Sandip Kundu. 491-492

Rollback-free value prediction with approximate loads. Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, Todd C. Mowry. 493-494

Measuring flexibility in single-ISA heterogeneous processors. Erik Tomusk, Christophe Dubach, Michael F. P. O'Boyle. 495-496

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling. Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey S. Vetter. 497-498

An event-based language for dynamic binary translation frameworks. Serguei Makarov, Angela Demke Brown, Ashvin Goel. 499-500

Improving performance of streaming applications with filtering and control messages. Peng Li, Jeremy Buhler. 501-502

Stratified sampling for even workload partitioning. Jeeva Paudel, José Nelson Amaral. 503-504

Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters. Tejaswi Agarwal, Michela Becchi. 505-506

Data remapping for an energy efficient burst chop in DRAM memory systems. Sudharsan Jagathrakshakan, Venkata Kalyan Tavva, Madhu Mutyam. 507-508

Data-reuse optimizations for pipelined tiling with parametric tile sizes. Alexandre Isoard. 509-510

From petascale to the pocket: Adaptively scaling parallel programs for mobile SoCs. Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger. 511-512

Coarrays in GNU Fortran. Alessandro Fanfarillo, Tobias Burnus, Valeria Cardellini, Salvatore Filippone, Dan Nagle, Damian W. I. Rouson. 513-514

Locality-aware memory association for multi-target worksharing in OpenMP. Thomas R. W. Scogland, Wu-chun Feng. 515-516

Processing big data graphs on memory-restricted systems. Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger. 517-518
