24 | -- | 0 | Bart Coppens, Bjorn De Sutter, Jonas Maebe. Feedback-driven binary code diversification |
25 | -- | 0 | Jeremy Fowers, Greg Brown, John Robert Wernsing, Greg Stitt. A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors |
26 | -- | 0 | Erven Rohou, Kevin Williams, David Yuste. Vectorization technology to improve interpreter performance |
27 | -- | 0 | Jimmy Cleary, Owen Callanan, Mark Purcell, David Gregg. Fast asymmetric thread synchronization |
28 | -- | 0 | Yong Li, Rami G. Melhem, Alex K. Jones. PS-TLB: Leveraging page classification information for fast, scalable and efficient translation for future CMPs |
29 | -- | 0 | Kristof Du Bois, Stijn Eyerman, Lieven Eeckhout. Per-thread cycle accounting in multicore processors |
30 | -- | 0 | Christian Wimmer, Michael Haupt, Michael L. Van de Vanter, Mick J. Jordan, Laurent Daynès, Doug Simon. Maxine: An approachable virtual machine for, and in, Java |
31 | -- | 0 | Malik Murtaza Khan, Protonu Basu, Gabe Rudy, Mary W. Hall, Chun Chen, Jacqueline Chame. A script-based autotuning compiler system to generate high-performance CUDA code |
32 | -- | 0 | Kenzo Van Craeynest, Lieven Eeckhout. Understanding fundamental design choices in single-ISA heterogeneous multicore architectures |
33 | -- | 0 | Samuel Antao, Leonel Sousa. The CRNS framework and its application to programmable and reconfigurable cryptography |
34 | -- | 0 | Boubacar Diouf, Can Hantas, Albert Cohen, Özcan Özturk, Jens Palsberg. A decoupled local memory allocator |
35 | -- | 0 | Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng. Layout-oblivious compiler optimization for matrix computations |
36 | -- | 0 | Stephen Dolan, Servesh Muralidharan, David Gregg. Compiler support for lightweight context switching |
37 | -- | 0 | Pablo Abad, Valentin Puente, José-Ángel Gregorio. LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors |
38 | -- | 0 | Jorge Albericio, Pablo Ibáñez, Víctor Viñals, José María Llabería. Exploiting reuse locality on inclusive shared last-level caches |
39 | -- | 0 | Paraskevas Yiapanis, Demian Rosas-Ham, Gavin Brown, Mikel Luján. Optimizing software runtime systems for speculative parallelization |
40 | -- | 0 | Cedric Nugteren, Pieter Custers, Henk Corporaal. Algorithmic species: A classification of affine loop nests for parallel programming |
41 | -- | 0 | Marco Gerards, Jan Kuper. Optimal DPM and DVFS for frame-based real-time systems |
42 | -- | 0 | Zhichao Yan, Hong Jiang, Yujuan Tan, Dan Feng. An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory |
43 | -- | 0 | Doris Chen, Deshanand P. Singh. Profile-guided floating- to fixed-point conversion for hybrid FPGA-processor applications |
44 | -- | 0 | Yan Cui, Yingxin Wang, Yu Chen, Yuanchun Shi. Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems |
45 | -- | 0 | Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan. ADAPT: A framework for coscheduling multithreaded programs |
46 | -- | 0 | Michele Tartara, Stefano Crespi-Reghizzi. Continuous learning of compiler heuristics |
47 | -- | 0 | Grigorios Chrysos, Panagiotis Dagritzikos, Ioannis Papaefstathiou, Apostolos Dollas. HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system |
48 | -- | 0 | Jongwon Lee, Yohan Ko, Kyoungwoo Lee, Jonghee M. Youn, Yunheung Paek. Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures |
49 | -- | 0 | Fabien Coelho, François Irigoin. API compilation for image hardware accelerators |
50 | -- | 0 | Carlos Luque, Miquel Moretó, Francisco J. Cazorla, Mateo Valero. Fair CPU time accounting in CMP+SMT processors |
51 | -- | 0 | Pavlos M. Mattheakis, Ioannis Papaefstathiou. Significantly reducing MPI intercommunication latency and power overhead in both embedded and HPC systems |
52 | -- | 0 | Riyadh Baghdadi, Albert Cohen, Sven Verdoolaege, Konrad Trifunovic. Improved loop tiling based on the removal of spurious false dependences |
53 | -- | 0 | Antoniu Pop, Albert Cohen. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs |
54 | -- | 0 | Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, Francky Catthoor. Polyhedral parallel code generation for CUDA |
55 | -- | 0 | Yu Du, Miao Zhou, Bruce R. Childers, Rami G. Melhem, Daniel Mossé. Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory |
56 | -- | 0 | Suresh Purini, Lakshya Jain. Finding good optimization sequences covering program space |
57 | -- | 0 | Mehmet E. Belviranli, Laxmi N. Bhuyan, Rajiv Gupta. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures |
58 | -- | 0 | Anurag Negi, J. Rubén Titos Gil. SCIN-cache: Fast speculative versioning in multithreaded cores |
59 | -- | 0 | Thibaut Lutz, Christian Fensch, Murray Cole. PARTANS: An autotuning framework for stencil computation on multi-GPU systems |
60 | -- | 0 | Chunhua Xiao, M.-C. Frank Chang, Jason Cong, Michael Gill, Zhangqin Huang, Chunyue Liu, Glenn Reinman, Hao Wu. Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects |