Journal: TACO

Volume 10, Issue 4

22 -- 0 Michael R. Jantz, Prasad A. Kulkarni. 1
23 -- 0 Xiangyu Dong, Norman P. Jouppi, Yuan Xie. A circuit-architecture co-optimization framework for exploring nonvolatile memory hierarchies
24 -- 0 Jishen Zhao, Guangyu Sun, Gabriel H. Loh, Yuan Xie. Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface
25 -- 0 Chien-Chi Chen, Sheng-De Wang. An efficient multicharacter transition string-matching engine based on the Aho-Corasick algorithm
26 -- 0 Yangchun Luo, Wei-Chung Hsu, Antonia Zhai. The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution
27 -- 0 Dyer Rolán, Basilio B. Fraguela, Ramon Doallo. 1
28 -- 0 Samantika Subramaniam, Simon C. Steely Jr., William Hasenplaugh, Aamer Jaleel, Carl J. Beckmann, Tryggve Fossum, Joel S. Emer. Using in-flight chains to build a scalable cache coherence protocol
29 -- 0 Daniel Sánchez, Yiannakis Sazeides, Juan M. Cebrian, José M. García 0001, Juan L. Aragón. Modeling the impact of permanent faults in caches
30 -- 0 Sanghoon Lee 0006, James Tuck. Automatic parallelization of fine-grained metafunctions on a chip multiprocessor
31 -- 0 Christophe Dubach, Timothy M. Jones, Edwin V. Bonilla. Dynamic microarchitectural adaptation using machine learning
32 -- 0 Long Chen, Yanan Cao, Zhao Zhang. E3CC: A memory error protection scheme with novel address mapping for subranked and low-power memories
33 -- 0 Yingying Tian, Samira Manabi Khan, Daniel A. Jiménez. Temporal-based multilevel correlating inclusive cache replacement
34 -- 0 Qixiao Liu, Miquel Moretó, Victor Jiménez, Jaume Abella, Francisco J. Cazorla, Mateo Valero. Hardware support for accurate per-task energy metering in multicore systems
35 -- 0 Sanyam Mehta, Gautham Beeraka, Pen-Chung Yew. Tile size selection revisited
36 -- 0 Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler. Fast pattern-specific routing for fat tree networks
37 -- 0 Maximilien Breughe, Lieven Eeckhout. Selecting representative benchmark inputs for exploring microprocessor design spaces
38 -- 0 Christoph Kerschbaumer, Eric Hennigan, Per Larsen, Stefan Brunthaler, Michael Franz. Information flow tracking meets just-in-time compilation
39 -- 0 Rupesh Nasre. Time- and space-efficient flow-sensitive points-to analysis
40 -- 0 Wenjia Ruan, Yujie Liu, Michael F. Spear. Boosting timestamp-based transactional memory by exploiting hardware cycle counters
41 -- 0 Tanima Dey, Wei Wang, Jack W. Davidson, Mary Lou Soffa. ReSense: Mapping dynamic workloads of colocated multithreaded applications using resource sensitivity
42 -- 0 Adrià Armejach, J. Rubén Titos Gil, Anurag Negi, Osman S. Unsal, Adrián Cristal. Techniques to improve performance in requester-wins hardware transactional memory
43 -- 0 Myeongjae Jeon, Conglong Li, Alan L. Cox, Scott Rixner. Reducing DRAM row activations with eager read/write clustering
44 -- 0 Zhijia Zhao, Michael Bebenita, Dave Herman, Jianhua Sun, Xipeng Shen. HPar: A practical parallel parser for HTML - taming HTML complexities for parallel parsing
45 -- 0 Ehsan Totoni, Mert Dikmen, María Jesús Garzarán. Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures
46 -- 0 Viacheslav V. Fedorov, Sheng Qiu, A. L. Narasimha Reddy, Paul V. Gratz. ARI: Adaptive LLC-memory traffic management
47 -- 0 Cecilia González-Alvarez, Jennifer B. Sartor, Carlos Alvarez, Daniel Jiménez-González, Lieven Eeckhout. Accelerating an application domain with specialized functional units
48 -- 0 Xiaolin Wang, Lingmei Weng, Zhenlin Wang, Yingwei Luo. Revisiting memory management on virtualized environments
49 -- 0 Chuntao Jiang, Zhibin Yu, Hai Jin, Cheng-Zhong Xu, Lieven Eeckhout, Wim Heirman, Trevor E. Carlson, Xiaofei Liao. PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
50 -- 0 Srdan Stipic, Vesna Smiljkovic, Osman S. Unsal, Adrián Cristal, Mateo Valero. Profile-guided transaction coalescing - lowering transactional overheads by merging transactions
51 -- 0 Zhe Wang, Shuchang Shan, Ting Cao, Junli Gu, Yi Xu, Shuai Mu, Yuan Xie 0001, Daniel A. Jiménez. WADE: Writeback-aware dynamic cache management for NVM-based main memory system
52 -- 0 Yong Li 0009, Yaojun Zhang, Hai Li, Yiran Chen, Alex K. Jones. C1C: A configurable, compiler-guided STT-RAM L1 cache
53 -- 0 Naznin Fauzia, Venmugil Elango, Mahesh Ravishankar, J. Ramanujam, Fabrice Rastello, Atanas Rountev, Louis-Noël Pouchet, P. Sadayappan. Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
54 -- 0 Alen Bardizbanyan, Magnus Själander, David B. Whalley, Per Larsson-Edefors. Designing a practical data filter cache to improve both energy efficiency and performance
55 -- 0 Andrei Hagiescu, Bing Liu 0013, R. Ramanathan, Sucheendra K. Palaniappan, Zheng Cui, Bipasa Chattopadhyay, P. S. Thiagarajan, Weng-Fai Wong. GPU code generation for ODE-based applications with phased shared-data access patterns
56 -- 0 JungHee Lee, Chrysostomos Nicopoulos, Hyung Gyu Lee, Jongman Kim. TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
57 -- 0 Christos Strydis, Robert M. Seepers, Pedro Peris-Lopez, Dimitrios Siskos, Ioannis Sourdis. A system architecture, processor, and communication protocol for secure implants
58 -- 0 Wonsub Kim, Yoonseo Choi, Haewoo Park. Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures
59 -- 0 Dorit Nuzman, Revital Eres, Sergei Dyshel, Marcel Zalmanovici, Jose Castanos. JIT technology with C/C++: Feedback-directed dynamic recompilation for statically compiled languages
60 -- 0 Thejas Ramashekar, Uday Bondhugula. Automatic data allocation and buffer management for multi-GPU machines
61 -- 0 Hans Vandierendonck, George Tzenakis, Dimitrios S. Nikolopoulos. Analysis of dependence tracking algorithms for task dataflow execution
62 -- 0 Yeonghun Jeong, Seongseok Seo, Jongeun Lee. Evaluator-executor transformation for efficient pipelining of loops with conditionals
63 -- 0 Rajkishore Barik, Jisheng Zhao, Vivek Sarkar. A decoupled non-SSA global register allocation using bipartite liveness graphs
64 -- 0 Peter Gavin, David B. Whalley, Magnus Själander. Reducing instruction fetch energy in multi-issue processors

Volume 10, Issue 3

9 -- 0 TACO Reviewers 2012
10 -- 0 Eran Shifer, Shlomo Weiss. Low-latency adaptive mode transitions and hierarchical power management in asymmetric clustered cores
11 -- 0 Yosi Ben-Asher, Nadav Rotem. Hybrid type legalization for a sparse SIMD instruction set
12 -- 0 Yuanwu Lei, Yong Dou, Lei Guo, Jinbo Xu, Jie Zhou, Yazhuo Dong, Hongjian Li. VLIW coprocessor for IEEE-754 quadruple-precision elementary functions
13 -- 0 Motohiro Kawahito, Hideaki Komatsu, Takao Moriyama, Hiroshi Inoue, Toshio Nakatani. Idiom recognition framework using topological embedding
14 -- 0 Ghassan Shobaki, Maxim Shawabkeh, Najm Eldeen Abu Rmaileh. Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach
15 -- 0 Dongrui She, Yifan He, Henk Corporaal. An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA
16 -- 0 V. Krishna Nandivada, Rajkishore Barik. Improved bitwidth-aware variable packing
17 -- 0 Jung Ho Ahn, Young Hoon Son, John Kim. Scalable high-radix router microarchitecture using a network switch organization
18 -- 0 Libo Huang, Zhiying Wang, Nong Xiao, Yongwen Wang, Qiang Dou. Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
19 -- 0 Avinash Malik, David Gregg. Orchestrating stream graphs using model checking
20 -- 0 Zheng Wang, Michael F. P. O'Boyle. Using machine learning to partition streaming programs
21 -- 0 Ali Bakhoda, John Kim, Tor M. Aamodt. Designing on-chip networks for throughput accelerators

Volume 10, Issue 2

6 -- 0 Angeliki Kritikakou, Francky Catthoor, George Athanasiou, Vasilios I. Kelefouras, Costas E. Goutis. Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints
7 -- 0 Lei Jiang, Yu Du, Bo Zhao, Youtao Zhang, Bruce R. Childers, Jun Yang. Hardware-Assisted Cooperative Integration of Wear-Leveling and Salvaging for Phase Change Memory
8 -- 0 Kyuseung Han, Junwhan Ahn, Kiyoung Choi. Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA
9 -- 0 Chao Wang, Xi Li, Junneng Zhang, Xuehai Zhou, Xiaoning Nie. MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs

Volume 10, Issue 1

1 -- 0 Yunji Chen, Tianshi Chen, Ling Li, Ruiyang Wu, Dao-Fu Liu, Weiwu Hu. Deterministic Replay Using Global Clock
2 -- 0 Daniel Lustig, Abhishek Bhattacharjee, Margaret Martonosi. TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs
3 -- 0 Rong Chen, Haibo Chen. Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
4 -- 0 Michela Becchi, Patrick Crowley. A-DFA: A Time- and Space-Efficient DFA Compression Algorithm for Fast Regular Expression Evaluation
5 -- 0 Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, Norman P. Jouppi. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing