- PIM-DL: Boosting DNN Inference on Digital Processing In-Memory Architectures via Data Layout Optimizations. Minxuan Zhou, Guoyang Chen, Mohsen Imani, Saransh Gupta, Weifeng Zhang, Tajana Rosing. p. 1
- A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers. Phitchaya Mangpo Phothilimthana, Amit Sabne, Nikhil Sarda, Karthik Srinivasa Murthy, Yanqi Zhou, Christof Angermueller, Mike Burrows, Sudip Roy, Ketan Mandke, Rezsa Farahani, Yu Emma Wang, Berkin Ilbeyi, Blake A. Hechtman, Bjarke Roune, Shen Wang, Yuanzhong Xu, Samuel J. Kaufman. pp. 1-16
- PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning. Alexander Brauckmann, Andrés Goens, Jerónimo Castrillón. pp. 17-29
- Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators. Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna. pp. 30-44
- Polygeist: Raising C to Polyhedral MLIR. William S. Moses, Lorenzo Chelini, Ruizhe Zhao, Oleksandr Zinenko. pp. 45-59
- Program Lifting using Gray-Box Behavior. Bruce Collie, Michael F. P. O'Boyle. pp. 60-74
- NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models. Joonsung Kim, Suyeon Hur, Eunbok Lee, Seungho Lee, Jangwoo Kim. pp. 75-89
- HERTI: A Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems. Myeonggyun Han, Woongki Baek. pp. 90-102
- X-Layer: Building Composable Pipelined Dataflows for Low-Rank Convolutions. Naveen Vedula, Reza Hojabr, Ahmad Khonsari, Arrvindh Shriraman. pp. 103-115
- InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-Aware Inner Product Processing. Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, Jaehyuk Huh. pp. 116-128
- Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs. Maximilian Lam, Zachary Yedidia, Colby R. Banbury, Vijay Janapa Reddi. pp. 129-141
- AIBench Scenario: Scenario-Distilling AI Benchmarking. Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang. pp. 142-158
- Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks. Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu. pp. 159-172
- SEER: A Time Prediction Model for CNNs from GPU Kernel's View. Guodong Liu, Sa Wang, Yungang Bao. pp. 173-185
- Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing. Minxuan Zhou, Lingxi Wu, Muzhou Li, Niema Moshiri, Kevin Skadron, Tajana Rosing. pp. 199-212
- CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling. Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs. pp. 213-225
- Invalidate or Update? Revisiting Coherence for Tomorrow's Cache Hierarchies. Mingcan Zhu, Amna Shahab, Antonios Katsarakis, Boris Grot. pp. 226-241
- Write Prediction for Persistent Memory Systems. Suyash Mahar, Sihang Liu, Korakit Seemakhupt, Vinson Young, Samira Manabi Khan. pp. 242-257
- nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers. Akash Panda, Ashish Panwar, Arkaprava Basu. pp. 258-273
- CoPlace: Effectively Mitigating Cache Conflicts in Modern Clouds. Xiaowei Shang, Weiwei Jia, Jianchen Shan, Xiaoning Ding. pp. 274-288
- Dryadic: Flexible and Fast Graph Pattern Matching at Scale. Daniel Mawhirter, Sam Reinehr, Wei Han, Noah Fields, Miles Claver, Connor Holmes, Jedidiah McClurg, Tongping Liu, Bo Wu. pp. 289-303
- Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, Minyi Guo. pp. 304-317
- SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction. Chuangyi Gui, Xiaofei Liao, Long Zheng, Pengcheng Yao, Qinggang Wang, Hai Jin. pp. 318-330
- SURFNet: Super-Resolution of Turbulent Flows with Transfer Learning using Small Datasets. Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran. pp. 331-344
- Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-mei W. Hwu. pp. 345-355