- ReACT: Redundancy-Aware Code Generation for Tensor Expressions. Tong Zhou, Ruiqin Tian, Rizwan A. Ashraf, Roberto Gioiosa, Gokcen Kestor, Vivek Sarkar. 1-13
- Com-CAS: Effective Cache Apportioning under Compiler Guidance. Bodhisatwa Chatterjee, Sharjeel Khan, Santosh Pande. 14-27
- Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation. Perry Gibson, José Cano 0001. 28-39
- Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators. Mingi Yoo, Jaeyong Song, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee. 40-53
- GNNear: Accelerating Full-Batch Training of Graph Neural Networks with near-Memory Processing. Zhe Zhou, Cong Li, Xuechao Wei, Xiaoyang Wang 0006, Guangyu Sun 0003. 54-68
- T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture. Chengying Huan, Shuaiwen Leon Song, Yongchao Liu, Heng Zhang, Hang Liu 0001, Charles He, Kang Chen, Jinlei Jiang, Yongwei Wu. 69-82
- Optimizing Aggregate Computation of Graph Neural Networks with on-GPU Interpreter-Style Programming. Zhuoran Ji, Cho-Li Wang. 83-95
- FlatPack: Flexible Compaction of Compressed Memory. Albin Eldstål-Ahrens, Angelos Arelakis, Ioannis Sourdis. 96-108
- Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications. Han Jie Qiu, Sihang Liu 0001, Xinyang Song, Samira Manabi Khan, Gennady Pekhimenko. 109-123
- Efficient Atomic Durability on eADR-Enabled Persistent Memory. Taiyu Zhou, Yajuan Du, Fan Yang, Xiaojian Liao, Youyou Lu. 124-134
- Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs. Roberto L. Castro, Diego Andrade, Basilio B. Fraguela. 135-147
- Squaring the circle: Executing Sparse Matrix Computations on FlexTPU - A TPU-Like Processor. Xin He, Kuan-Yu Chen, Siying Feng, Hun-Seok Kim, David T. Blaauw, Ronald G. Dreslinski, Trevor N. Mudge. 148-159
- Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations. Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez 0001, Juan Touriño. 160-171
- Batched Graph Community Detection on GPUs. Han-Yi Chou, Sayan Ghosh. 172-184
- SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation. Peng Jiang 0004, Yihua Wei, Jiya Su, Rujia Wang, Bo Wu. 185-197
- Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing. Shinnung Jeong, Yongwoo Lee, Jaeho Lee, Heelim Choi, Seungbin Song, Jinho Lee, Youngsok Kim, Hanjun Kim 0001. 198-210
- Tiered Hashing: Revamping Hash Indexing under a Unified Memory-Storage Hierarchy. Jian Zhou, Jianfeng Wu, Weizhou Huang, You Zhou, Fei Wu, Liu Shi, Xiaoyi Zhang, Kun Wang, Feng Zhu, Shu Li. 211-222
- Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism. Qi Zhao 0003, Zhengyi Qiu, Shudi Shao, Xinning Hui, Hassan Ali Khan, Guoliang Jin. 223-238
- VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks. Sankeerth Durvasula, Raymond Kiguru, Samarth Mathur, Jenny Xu, Jimmy Lin, Nandita Vijaykumar. 239-251
- Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs. Yufan Xu, Qiwei Yuan, Erik Curtis Barton, Rui Li 0033, P. Sadayappan, Aravind Sukumaran-Rajam. 252-264
- High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs. Lizhi Xiang, P. Sadayappan, Aravind Sukumaran-Rajam. 265-278
- Weightless Neural Networks for Efficient Edge Inference. Zachary Susskind, Aman Arora, Igor D. S. Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John. 279-290
- Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition. Cheng Fu, Hanxian Huang, Bram Wasti, Chris Cummins, Riyadh Baghdadi, Kim M. Hazelwood, Yuandong Tian, Jishen Zhao, Hugh Leather. 291-303
- Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems. Leul Belayneh, Haojie Ye, Kuan-Yu Chen, David T. Blaauw, Trevor N. Mudge, Ronald G. Dreslinski, Nishil Talati. 304-316
- GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud. Xiaodan Serina Tan, Pavel Golikov, Nandita Vijaykumar, Gennady Pekhimenko. 317-332
- NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs. Yuhui Bao, Yifan Sun 0002, Zlatan Feric, Michael Tian Shen, Micah Weston, José L. Abellán, Trinayan Baruah, John Kim, Ajay Joshi, David R. Kaeli. 333-345
- mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL. Parmida Vahdatniya, Amirali Sharifian, Reza Hojabr, Arrvindh Shriraman. 346-358
- Athena: An Early-Fetch Architecture to Reduce on-Chip Page Walk Latencies. Seyed Armin Vakil-Ghahani, Soheil Khadirsharbiyani, Jagadish B. Kotra, Mahmut T. Kandemir. 359-371
- DSDP: Dual Stream Data Prefetcher. Mingjian He, Hua Wang, Ke Zhou, Kaichao Cui, Huabing Yan, Chang Guo, Rongfeng He. 372-383
- Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve. Oh-Kyoung Kwon, Ji Hoon Kang 0002, Seungchul Lee, Wonjung Kim, Junehwa Song. 384-397
- Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. Mahyar Emami, Endri Bezati, Jörn W. Janneck, James R. Larus. 398-411
- HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures. Xinyu Chen, Marco Minutoli, Jiannan Tian, Mahantesh Halappanavar, Ananth Kalyanaraman, Dingwen Tao. 412-425
- Optimizing Regular Expressions via Rewrite-Guided Synthesis. Jedidiah McClurg, Miles Claver, Jackson Garner, Jake Vossen, Jordan Schmerge, Mehmet E. Belviranli. 426-438
- Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. Bangtian Liu, Avery Laird, Wai-Hung Tsang, Bardia Mahjour, Maryam Mehri Dehnavi. 439-450
- Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions Atomically. Jie Zhao 0002, Cédric Bastoul, Yanzhi Yi, Jiahui Hu, Wang Nie, Renwei Zhang, Zhen Geng, Chong Li, Thibaut Tachon, Zhiliang Gan. 451-466
- GPU Adaptive In-situ Parallel Analytics (GAP). Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath. 467-480
- A GPU Multiversion B-Tree. Muhammad A. Awad, Serban D. Porumbescu, John D. Owens. 481-493
- Breaking the Vendor Lock: Performance Portable Programming through OpenMP as Target Independent Runtime Layer. Johannes Doerfert, Marc Jasper, Joseph Huber, Khaled Abdelaal, Giorgis Georgakoudis, Thomas Scogland, Konstantinos Parasyris. 494-504
- BenchPress: A Deep Active Benchmark Generator. Foivos Tsimpourlas, Pavlos Petoumenos, Min Xu, Chris Cummins, Kim M. Hazelwood, Ajitha Rajan, Hugh Leather. 505-516
- Collage: Seamless Integration of Deep Learning Backends with Automatic Placement. ByungSoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, TianQi Chen, Zhihao Jia. 517-529
- UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models. Anjia Wang, Xinyao Yi, Yonghong Yan 0001. 530-531
- FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. Dongwei Chen, Dong Tong 0001, Chun Yang, Jiangfang Yi, Xu Cheng 0001. 532-533
- Analysing Dataflow Programs with Causation Traces. Michail Boulasikis, Flavius Gruian, Gareth Callanan, Jörn W. Janneck. 534-535
- Massively Parallel Open Modification Spectral Library Searching with Hyperdimensional Computing. Jaeyoung Kang 0001, Weihong Xu, Wout Bittremieux, Tajana Rosing. 536-537
- Improving Convolution via Cache Hierarchy Tiling and Reduced Packing. Victor Ferrari, Rafael C. F. Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, Guido Araujo. 538-539
- A Thermal-Aware Data Replica Placement Strategy for Data-Intensive Data Centers. Jie Li, Yuhui Deng, Zhaorui Wu, Shujie Pang. 540-541
- Towards Supporting Semiring in MLIR-Based COMET Compiler. Luanzheng Guo, Rizwan A. Ashraf, Ryan D. Friese, Gokcen Kestor. 542-543
- MLIR Loop Optimizations for High-Level Synthesis: A Case Study. Serena Curzel, Sofija Jovic, Michele Fiorito, Antonino Tumeo, Fabrizio Ferrandi. 544-545
- An Architecture for Resilient Federated Learning through Parallel Recognition. Jeongeun Kim, Young Woo Jeong, Su-Yeon Jang, Seung Eun Lee. 546-547
- A Specialized BTB Organization for Servers. Truls Asheim, Boris Grot, Rakesh Kumar 0003. 548-549