Journal: TACO

Volume 20, Issue 4

0 -- 0Ziaul Choudhury, Anish Gulati, Suresh Purini. FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler
0 -- 0Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang. DxPU: Large-scale Disaggregated GPU Pools in the Datacenter
0 -- 0Petros Anastasiadis, Nikela Papadopoulou, Georgios I. Goumas, Nectarios Koziris, Dennis Hoppe, Li Zhong. PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems
0 -- 0Hui Yu, Yu Zhang 0027, Jin Zhao 0003, Yujian Liao, Zhiying Huang, Donghao He, Lin Gu 0002, Hai Jin 0001, Xiaofei Liao, Haikun Liu, Bingsheng He, Jianhui Yue. RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network
0 -- 0Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John. ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks
0 -- 0Donglei Wu, Weihao Yang, Xiangyu Zou, Wen Xia, Shiyi Li, Zhenbo Hu, Weizhe Zhang, Binxing Fang. Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference
0 -- 0Victor Ferrari, Rafael Cardoso Fernandes Sousa, Márcio Machado Pereira, Joao P. L. de Carvalho, José Nelson Amaral, José E. Moreira, Guido Araujo. Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions
0 -- 0Jia Wei, Xingjun Zhang, Longxiang Wang, Zheng Wei. Fastensor: Optimise the Tensor I/O Path from SSD to GPU for Deep Learning Training
0 -- 0Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout. Characterizing Multi-Chip GPU Data Sharing
0 -- 0Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang 0001, Peng Chen 0035, Aleksandr Drozd, Satoshi Matsuoka. At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads
0 -- 0Miao Yu, Tingting Xiang, Venkata Pavan Kumar Miriyala, Trevor E. Carlson. Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator
0 -- 0Syed Salauddin Mohammad Tariq, Lance Menard, Pengfei Su 0001, Probir Roy. MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications
0 -- 0Christian Menard, Marten Lohstroh, Soroush Bateni, Matthew Chorlian, Arthur Deng, Peter Donovan, Clément Fournier, Shaokai Lin, Felix Suchert, Tassilo Tanneberger, Hokeun Kim, Jerónimo Castrillón, Edward A. Lee. High-performance Deterministic Concurrency Using Lingua Franca
0 -- 0Shiyi Li, Qiang Cao 0001, Shenggang Wan, Wen Xia, Changsheng Xie. gPPM: A Generalized Matrix Operation and Parallel Algorithm to Accelerate the Encoding/Decoding Process of Erasure Codes
0 -- 0Satya Jaswanth Badri, Mukesh Saini, Neeraj Goel. Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing
0 -- 0Hai Jin 0001, Bo Lei, Haikun Liu, Xiaofei Liao, Zhuohui Duan, Chencheng Ye, Yu Zhang 0027. A Compilation Tool for Computation Offloading in ReRAM-based CIM Architectures
0 -- 0Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu. Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs

Volume 20, Issue 3

0 -- 0Yuwen Zhao, Fangfang Liu, Wenjing Ma, Huiyuan Li, Yuanchi Peng, Cui Wang. MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT Framework
0 -- 0Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, Yuan Xie 0008. MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing
0 -- 0Weizhi Xu 0001, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu. Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs
0 -- 0Dong Huang, Dan Feng 0001, Qiankun Liu, Bo Ding, Wei Zhao 0034, Xueliang Wei, Wei Tong 0001. SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs
0 -- 0Abdul Rasheed Sahni, Hamza Omar, Usman Ali, Omer Khan. ASM: An Adaptive Secure Multicore for Co-located Mutually Distrusting Processes
0 -- 0Muhammad Waqar Azhar, Madhavan Manivannan, Per Stenström. Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints
0 -- 0Yufeng Zhou, Alan L. Cox, Sandhya Dwarkadas, Xiaowan Dong. The Impact of Page Size and Microarchitecture on Instruction Address Translation Overhead
0 -- 0Sooraj Puthoor, Mikko H. Lipasti. Turn-based Spatiotemporal Coherence for GPUs
0 -- 0Ruobing Chen 0002, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu 0001, Gang Wang 0001. Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters
0 -- 0Jiazhi Jiang, Zijiang Huang, Dan Huang, Jiangsu Du, Lin Chen, Ziguan Chen, Yutong Lu. Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure
0 -- 0Gokul Subramanian Ravi, Tushar Krishna, Mikko H. Lipasti. TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency
0 -- 0Jin Zhao 0003, Yu Zhang 0027, Ligang He, Qikun Li, Xiang Zhang, Xinyu Jiang, Hui Yu, Xiaofei Liao, Hai Jin 0001, Lin Gu 0002, Haikun Liu, Bingsheng He, Ji Zhang, Xianzheng Song, Lin Wang, Jun Zhou 0011. GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing
0 -- 0Benjamin Reber, Matthew Gould, Alexander H. Kneipp, Fangzhou Liu, Ian Prechtl, Chen Ding 0001, Linlin Chen, Dorin Patru. Cache Programming for Scientific Loops Using Leases
0 -- 0Alexander Krolik, Clark Verbrugge, Laurie J. Hendren. rNdN: Fast Query Compilation for NVIDIA GPUs

Volume 20, Issue 2

0 -- 0Jingwen Du, Fang Wang 0001, Dan Feng 0001, Changchen Gan, Yuchao Cao, Xiaomin Zou, Fan Li. Fast One-Sided RDMA-Based State Machine Replication for Disaggregated Memory
0 -- 0Ahmet Caner Yüzügüler, Canberk Sönmez, Mario Drumond, Yunho Oh, Babak Falsafi, Pascal Frossard. Scale-out Systolic Arrays
0 -- 0Nicolas Tollenaere, Guillaume Iooss, Stéphane Pouget, Hugo Brunie, Christophe Guillon, Albert Cohen 0001, P. Sadayappan, Fabrice Rastello. Autotuning Convolutions Is Easier Than You Think
0 -- 0Vinicius Espindola, Luciano Zago, Hervé Yviquel, Guido Araujo. Source Matching and Rewriting for MLIR Using String-Based Automata
0 -- 0Dongwei Chen, Dong Tong 0001, Chun Yang, Jiangfang Yi, Xu Cheng 0001. FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers
0 -- 0Chandra Sekhar Mummidi, Sandip Kundu. ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures
0 -- 0Sarabjeet Singh, Neelam Surana, Kailash Prasad, Pranjali Jain, Joycee Mekie, Manu Awasthi. HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy
0 -- 0Wenjing Ma, Fangfang Liu, Daokun Chen, Qinglin Lu, Yi Hu, Hongsen Wang, Xinhui Yuan. An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform
0 -- 0Qiaoyi Liu, Jeff Setter, Dillon Huff, Maxwell Strange, Kathleen Feng, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad. Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators
0 -- 0Victor Perez, Lukas Sommer, Victor Lomüller, Kumudha Narasimhan, Mehdi Goli. User-driven Online Kernel Fusion for SYCL
0 -- 0Hadjer Benmeziane, Hamza Ouarnoughi, Kaoutar El Maghraoui, Smaïl Niar. Multi-objective Hardware-aware Neural Architecture Search with Pareto Rank-preserving Surrogate Models
0 -- 0Francesco Minervini, Oscar Palomar, Osman S. Unsal, Enrico Reggiani, Josue V. Quiroga, Joan Marimon, Carlos Rojas, Roger Figueras, Abraham Ruiz, Alberto González 0004, Jonnatan Mendoza, Iván Vargas, César Hernández, Joan Cabre, Lina Khoirunisya, Mustapha Bouhali, Julian Pavon, Francesc Moll, Mauro Olivieri, Mario Kovac, Mate Kovac, Leon Dragic, Mateo Valero, Adrián Cristal. Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications

Volume 20, Issue 1

0 -- 0Qiang Zhang, Lei Xu 0003, Baowen Xu. RegCPython: A Register-based Python Interpreter for Better Performance
0 -- 0Hai Jin 0001, Zhuo He, Weizhong Qiang. SpecTerminator: Blocking Speculative Side Channels Based on Instruction Classes on RISC-V
0 -- 0Nilesh Rajendra Shah, Ashitabh Misra, Antoine Miné, Rakesh Venkat, Ramakrishna Upadrasta. BullsEye: Scalable and Accurate Approximation Framework for Cache Miss Calculation
0 -- 0Zhangyu Chen, Yu Hua 0001, Luochangqi Ding, Bo Ding, Pengfei Zuo, Xue Liu 0001. Lock-Free High-performance Hashing for Persistent Memory via PM-aware Holistic Optimization
0 -- 0Mitali Soni, Asmita Pal, Joshua San Miguel. As-Is Approximate Computing
0 -- 0Yi Liang, Shaokang Zeng, Lei Wang 0004. Quantifying Resource Contention of Co-located Workloads with the System-level Entropy
0 -- 0Ivan Korostelev, Joao P. L. de Carvalho, José E. Moreira, José Nelson Amaral. YaConv: Convolution with Low Cache Footprint
0 -- 0Christos Sakalis, Stefanos Kaxiras, Magnus Själander. Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks
0 -- 0Suyeon Hur, Seongmin Na, Dongup Kwon, Joonsung Kim, Andrew Boutros, Eriko Nurvitadhi, Jangwoo Kim. A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks
0 -- 0Aristeidis Mastoras, Sotiris Anagnostidis, Albert-Jan Nicholas Yzelman. Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance
0 -- 0Ashish Gondimalla, Jianqiao Liu, Mithuna Thottethodi, T. N. Vijaykumar. Occam: Optimal Data Reuse for Convolutional Neural Networks
0 -- 0Manuela Schuler, Richard Membarth, Philipp Slusallek. XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments
0 -- 0Bo Peng, Yaozu Dong, Jianguo Yao, Fengguang Wu, Haibing Guan. FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations
0 -- 0Furkan Eris, Marcia S. Louis, Kubra Eris, José Luis Abellán Miguel, Ajay Joshi. Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy
0 -- 0Parth Shah, Ranjal Gautham Shenoy, Vaidyanathan Srinivasan, Pradip Bose, Alper Buyuktosunoglu. TokenSmart: Distributed, Scalable Power Management in the Many-core Era
0 -- 0Yemao Xu, Dezun Dong, Dongsheng Wang, Shi Xu, Enda Yu, Weixia Xu, Xiangke Liao. SSD-SGD: Communication Sparsification for Distributed Deep Learning Training
0 -- 0Tuowen Zhao, Tobi Popoola, Mary W. Hall, Catherine Olschanowsky, Michelle Strout. Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration
0 -- 0Thomas Luinaud, J. M. Pierre Langlois, Yvon Savaria. Symbolic Analysis for Data Plane Programs Specialization
0 -- 0Ataberk Olgun, Juan Gómez-Luna, Konstantinos Kanellopoulos, Behzad Salami 0001, Hasan Hassan, Oguz Ergin, Onur Mutlu. PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM