- CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions. Sawan Singh, Josué Feliu, Manuel E. Acacio, Alexandra Jimborean, Alberto Ros. 1-13
- Automatic Code Generation for High-Performance Graph Algorithms. Zhen Peng, Rizwan A. Ashraf, Luanzheng Guo, Ruiqin Tian, Gokcen Kestor. 14-26
- UWOmppro: UWOmp++ with Point-to-Point Synchronization, Reduction and Schedules. Aditya Agrawal, V. Krishna Nandivada. 27-38
- mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR Using Program Synthesis. Alexander Brauckmann, Elizabeth Polgreen, Tobias Grosser, Michael F. P. O'Boyle. 39-50
- Drishyam: An Image is Worth a Data Prefetcher. Shubdeep Mohapatra, Biswabandan Panda. 51-61
- HugeGPT: Storing Guest Page Tables on Host Huge Pages to Accelerate Address Translation. Weiwei Jia 0001, Jiyuan Zhang 0003, Jianchen Shan, Yiming Du, Xiaoning Ding, Tianyin Xu. 62-73
- PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback. Hussein Elnawawy, James Tuck, Gregory T. Byrd. 74-85
- SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link. Hyokeun Lee, Kwanseok Choi, Hyuk-Jae Lee, Jaewoong Sim. 86-98
- SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory. Jinfan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, Onur Mutlu. 99-111
- Virtual PIM: Resource-Aware Dynamic DPU Allocation and Workload Scheduling Framework for Multi-DPU PIM Architecture. Donghyeon Kim, Taehoon Kim, Inyong Hwang, Taehyeong Park, Hanjun Kim 0001, Youngsok Kim, Yongjun Park. 112-123
- Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs. Diya Joseph, Juan L. Aragón, Joan-Manuel Parcerisa, Antonio González 0001. 124-136
- G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs. Yue Jin, Chengying Huan, Heng Zhang, Yongchao Liu, Shuaiwen Leon Song, Rui Zhao, Yao Zhang, Changhua He, Wenguang Chen. 137-149
- TSUNAMI: A GPU Implementation of the WFA Algorithm. Giulia Gerometta, Alberto Zeni, Marco D. Santambrogio. 150-161
- Parallelizing Maximal Clique Enumeration on GPUs. Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu. 162-175
- Accelerating Decision-Tree-Based Inference Through Adaptive Parallelization. Jan van Lunteren. 176-186
- Automatic Algorithm-Based Fault Tolerance (AABFT) of Stencil Computations. Louis Narmour, Steven Derrien, Sanjay V. Rajopadhye. 187-198
- Performance Characterization of Popular DNN Models on Out-of-Order CPUs. Pablo Prieto, Pablo Abad Fidalgo, José-Ángel Gregorio, Valentin Puente. 199-210
- GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs. Juelin Liu, Sandeep Polisetty, Hui Guan 0001, Marco Serafini. 211-224
- Barad-dur: Near-Storage Accelerator for Training Large Graph Neural Networks. Jiyoung An, Esmerald Aliaj, Sang-Woo Jun. 225-237
- A Silicon Photonic Multi-DNN Accelerator. Yuan Li 0029, Ahmed Louri, Avinash Karanth. 238-249
- Architecture-Aware Currying. Mahmut Taylan Kandemir, Gulsum Gudukbay Akbulut, Wonil Choi, Mustafa Karaköy. 250-264
- SpecCheck: A Tool for Systematic Identification of Vulnerable Transient Execution in gem5. Zack McKevitt, Ashutosh Trivedi, Tamara Silbergleit Lehman. 265-278
- Separating Mechanism from Policy in STM. Yaodong Sheng, Ahmed Hassan, Michael F. Spear. 279-296
- MBAPIS: Multi-Level Behavior Analysis Guided Program Interval Selection for Microarchitecture Studies. Hongwei Cui, Yujie Cui, Honglan Zhan, Shuhao Liang, Xianhua Liu 0001, Chun Yang, Xu Cheng 0001. 297-308
- INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core. Jae Seok Kwak, Myung Kuk Yoon, Ipoom Jeong, Seunghyun Jin, Won Woo Ro. 309-319
- A CPU-FPGA Holistic Source-To-Source Compilation Approach for Partitioning and Optimizing C/C++ Applications. Tiago Santos, João Bispo, João M. P. Cardoso. 320-322
- Dynamic Allocation of Processor Cores to Graph Applications on Commodity Servers. Lucia Pons, Julio Sahuquillo, Timothy M. Jones 0001. 323-324
- QeiHaN: An Energy-Efficient DNN Accelerator that Leverages Log Quantization in NDP Architectures. Bahareh Khabbazan, Marc Riera, Antonio González 0001. 325-326
- Quickloop: An Efficient, FPGA-Accelerated Exploration of Parameterized DNN Accelerators. Tayyeb Mahmood, Kashif Inayat, Jaeyong Chung. 327-328
- Retargeting Applications for Heterogeneous Systems with the Tribble Source-to-Source Framework. Luís Miguel Sousa, João Bispo, Nuno Paulino 0001. 329-331
- SLIDEX: Sliding Window Extension for Image Processing. Raúl Taranco, José María Arnau, Antonio González 0001. 332-334
- Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs. Marta Navarro, Josué Feliu, Salvador Petit, María Engracia Gómez, Julio Sahuquillo. 335-336
- SparseFT: Sparsity-aware Fault Tolerance for Reliable CNN Inference on GPUs. Gwangeun Byeon, Seungtae Lee, Seongwook Kim, Yongjun Kim, Prashant J. Nair, Seokin Hong. 337-338