0 | -- | 0 | Ziaul Choudhury, Anish Gulati, Suresh Purini. FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler |
0 | -- | 0 | Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang. DxPU: Large-scale Disaggregated GPU Pools in the Datacenter |
0 | -- | 0 | Petros Anastasiadis, Nikela Papadopoulou, Georgios I. Goumas, Nectarios Koziris, Dennis Hoppe, Li Zhong. PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems |
0 | -- | 0 | Hui Yu, Yu Zhang, Jin Zhao, Yujian Liao, Zhiying Huang, Donghao He, Lin Gu, Hai Jin, Xiaofei Liao, Haikun Liu, Bingsheng He, Jianhui Yue. RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network |
0 | -- | 0 | Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John. ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks |
0 | -- | 0 | Donglei Wu, Weihao Yang, Xiangyu Zou, Wen Xia, Shiyi Li, Zhenbo Hu, Weizhe Zhang, Binxing Fang. Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference |
0 | -- | 0 | Victor Ferrari, Rafael Cardoso Fernandes Sousa, Márcio Machado Pereira, Joao P. L. de Carvalho, José Nelson Amaral, José E. Moreira, Guido Araujo. Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions |
0 | -- | 0 | Jia Wei, Xingjun Zhang, Longxiang Wang, Zheng Wei. Fastensor: Optimise the Tensor I/O Path from SSD to GPU for Deep Learning Training |
0 | -- | 0 | Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout. Characterizing Multi-Chip GPU Data Sharing |
0 | -- | 0 | Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka. At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads |
0 | -- | 0 | Miao Yu, Tingting Xiang, Venkata Pavan Kumar Miriyala, Trevor E. Carlson. Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator |
0 | -- | 0 | Syed Salauddin Mohammad Tariq, Lance Menard, Pengfei Su, Probir Roy. MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications |
0 | -- | 0 | Christian Menard, Marten Lohstroh, Soroush Bateni, Matthew Chorlian, Arthur Deng, Peter Donovan, Clément Fournier, Shaokai Lin, Felix Suchert, Tassilo Tanneberger, Hokeun Kim, Jerónimo Castrillón, Edward A. Lee. High-performance Deterministic Concurrency Using Lingua Franca |
0 | -- | 0 | Shiyi Li, Qiang Cao, Shenggang Wan, Wen Xia, Changsheng Xie. gPPM: A Generalized Matrix Operation and Parallel Algorithm to Accelerate the Encoding/Decoding Process of Erasure Codes |
0 | -- | 0 | Satya Jaswanth Badri, Mukesh Saini, Neeraj Goel. Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing |
0 | -- | 0 | Hai Jin, Bo Lei, Haikun Liu, Xiaofei Liao, Zhuohui Duan, Chencheng Ye, Yu Zhang. A Compilation Tool for Computation Offloading in ReRAM-based CIM Architectures |
0 | -- | 0 | Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu. Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs |