Journal: TACO

Volume 16, Issue 4

0 -- 0Arun Thangamani, V. Krishna Nandivada. Optimizing Remote Communication in X10
0 -- 0Ian Briggs, Arnab Das, Mark Baranowski, Vishal Chandra Sharma, Sriram Krishnamoorthy, Zvonimir Rakamaric, Ganesh Gopalakrishnan. FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation
0 -- 0Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, Donald Yeung. Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors
0 -- 0Wenbin Jiang, Yang Ma, Bo Liu, Haikun Liu, Bing Bing Zhou, Jian Zhu, Song Wu 0001, Hai Jin 0001. Layup: Layer-adaptive and Multi-type Intermediate-oriented Memory Optimization for GPU-based CNNs
0 -- 0Mostafa Koraei, Omid Fatemi, Magnus Jahre. DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs
0 -- 0Salonik Resch, S. Karen Khatamifard, Zamshed Iqbal Chowdhury, Masoud Zabihi, Zhengyang Zhao, Jianping Wang 0006, Sachin S. Sapatnekar, Ulya R. Karpuzcu. PIMBALL: Binary Neural Networks in Spintronic Memory
0 -- 0Jie Zhao 0002, Albert Cohen 0001. Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation
0 -- 0Reem Elkhouly, Mohammad A. Alshboul, Akihiro Hayashi, Yan Solihin, Keiji Kimura. Compiler-support for Critical Data Persistence in NVM
0 -- 0Ahmad Yasin, Jawad Haj-Yahya, Yosi Ben-Asher, Avi Mendelson. A Metric-Guided Method for Discovering Impactful Features and Architectural Insights for Skylake-Based Processors
0 -- 0Manuel Selva, Fabian Gruber, Diogo Sampaio, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello. Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable
0 -- 0Leeor Peled, Uri C. Weiser, Yoav Etsion. A Neural Network Prefetcher for Arbitrary Memory Access Patterns
0 -- 0Aristeidis Mastoras, Thomas R. Gross. Chunking for Dynamic Linear Pipelines
0 -- 0Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart S. P. Parkin, Jerónimo Castrillón. ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0
0 -- 0Kyle Daruwalla, Heng Zhuo, Rohit Shukla, Mikko H. Lipasti. BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing
0 -- 0Zhen Hang Jiang, Yunsi Fei, David R. Kaeli. Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs
0 -- 0Michiel A. van der Vlag, Georgios Smaragdos, Zaid Al-Ars, Christos Strydis. Exploring Complex Brain-Simulation Workloads on Multi-GPU Deployments
0 -- 0Sriseshan Srikanth, Anirudh Jain, Joseph M. Lennon, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook. MetaStrider: Architectures for Scalable Memory-centric Reduction of Sparse Data Streams
0 -- 0Lorenzo Chelini, Oleksandr Zinenko, Tobias Grosser, Henk Corporaal. Declarative Loop Tactics for Domain-specific Optimization
0 -- 0Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary Devito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen 0001. The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically
0 -- 0Khalid Ahmad, Hari Sundar, Mary W. Hall. Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs
0 -- 0Sergi Siso, Wes Armour, Jeyarajan Thiyagalingam. Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information
0 -- 0Larisa Stoltzfus, Bastian Hagedorn, Michel Steuwer, Sergei Gorlatch, Christophe Dubach. Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift
0 -- 0Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng 0002, Jingling Xue. DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing

Volume 16, Issue 3

0 -- 0Zhen Lin, Hongwen Dai, Michael Mantor, Huiyang Zhou. Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution
0 -- 0Keryan Didier, Dumitru Potop-Butucaru, Guillaume Iooss, Albert Cohen 0001, Jean Souyris, Philippe Baufreton, Amaury Graillat. Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware
0 -- 0Ram Srivatsa Kannan, Michael Laurenzano, Jeongseob Ahn, Jason Mars, Lingjia Tang. Caliper: Interference Estimator for Multi-tenant Environments Sharing Architectural Resources
0 -- 0Mohammad Sadegh Sadeghi, Siavash Bayat Sarmadi, Shaahin Hessabi. Toward On-chip Network Security Using Runtime Isolation Mapping
0 -- 0Artem Chikin, Taylor Lloyd, José Nelson Amaral, Ettore Tiotto, Muhammad Usman. Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops
0 -- 0Jakob Leben, George Tzanetakis. Polyhedral Compilation for Multi-dimensional Stream Processing
0 -- 0Jungwoo Park, Myoungjun Lee, Soontae Kim, Minho Ju, Jeongkyu Hong. MH Cache: A Multi-retention STT-RAM-based Low-power Last-level Cache for Mobile Hardware Rendering Systems
0 -- 0Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, Michael F. Spear. Simplifying Transactional Memory Support in C++
0 -- 0Chao Luo, Yunsi Fei, David R. Kaeli. Side-channel Timing Attack of RSA on a GPU
0 -- 0Stephen I. Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis. The Power-optimised Software Envelope
0 -- 0Sanghoon Cha, Bokyeong Kim, Chang-Hyun Park, Jaehyuk Huh. Morphable DRAM Cache Design for Hybrid Memory Systems
0 -- 0Bingchao Li, Jizeng Wei, Jizhou Sun, Murali Annavaram, Nam Sung Kim. An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns
0 -- 0Liang Yuan, Chen Ding, Wesley Smith, Peter J. Denning, Yunquan Zhang. A Relational Theory of Locality
0 -- 0Stéphane Louise. A First Step Toward Using Quantum Computing for Low-level WCETs Estimations

Volume 16, Issue 2

0 -- 0Xiaoyuan Wang, Haikun Liu, Xiaofei Liao, Ji Chen, Hai Jin 0001, Yu Zhang 0027, Long Zheng 0003, Bingsheng He, Song Jiang. Supporting Superpages and Lightweight Page Migration in Hybrid Memory Systems
0 -- 0Aristeidis Mastoras, Thomas R. Gross. Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines
0 -- 0Sahar Sargaran, Naser MohammadZadeh. SAQIP: A Scalable Architecture for Quantum Information Processors
0 -- 0Tae Jun Ham, Juan L. Aragón, Margaret Martonosi. Efficient Data Supply for Parallel Heterogeneous Architectures
0 -- 0Mohammad A. Alshboul, Hussein Elnawawy, Reem Elkhouly, Keiji Kimura, James Tuck, Yan Solihin. Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory
0 -- 0Savvas Sioutas, Sander Stuijk, Luc Waeijen, Twan Basten, Henk Corporaal, Lou J. Somers. Schedule Synthesis for Halide Pipelines through Reuse Analysis
0 -- 0Prerna Budhkar, Ildar Absalyamov, Vasileios Zois, Skyler Windh, Walid A. Najjar, Vassilis J. Tsotras. Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads
0 -- 0Zacharias Hadjilambrou, Marios Kleanthous, Georgia Antoniou, Antoni Portero, Yiannakis Sazeides. Comprehensive Characterization of an Open Source Document Search Engine
0 -- 0Pedro Yébenes, Jose Rocher-Gonzalez, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, Francisco J. Quiles 0001, Crispín Gómez Requena, José Duato. Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies
0 -- 0Heinrich Riebler, Gavin Vaz, Tobias Kenter, Christian Plessl. Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL
0 -- 0Xun Gong, Xiang Gong, Leiming Yu, David R. Kaeli. HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution
0 -- 0Yang Song 0006, Olivier Alavoine, Bill Lin. A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets
0 -- 0Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao. SketchDLC: A Sketch on Distributed Deep Learning Communication via Trace Capturing

Volume 16, Issue 1

0 -- 0Leonid Azriel, Lukas Humbel, Reto Achermann, Alex Richardson, Moritz Hoffmann, Avi Mendelson, Timothy Roscoe, Robert N. M. Watson, Paolo Faraboschi, Dejan S. Milojicic. Memory-Side Protection With a Capability Enforcement Co-Processor
0 -- 0Yu-Ping Liu, Ding-Yong Hong, Jan-Jan Wu, Sheng-Yu Fu, Wei-Chung Hsu. Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation
0 -- 0Halit Dogan, Masab Ahmad, Brian Kahne, Omer Khan. Accelerating Synchronization Using Moving Compute to Data Model at 1,000-core Multicore Scale
0 -- 0Ghassan Shobaki, Austin Kerbow, Christopher Pulido, William Dobson. Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling
0 -- 0Aamer Jaleel, Eiman Ebrahimi, Sam Duncan. DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems
0 -- 0Mohammad Sadrosadati, Seyed Borna Ehsani, Hajar Falahati, Rachata Ausavarungnirun, Arash Tavakkol, Mojtaba Abaee, Lois Orosa, Yaohua Wang, Hamid Sarbazi-Azad, Onur Mutlu. ITAP: Idle-Time-Aware Power Management for GPU Execution Units