- Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization. Prasanth Chatarasi, Alex Gatea, Wei Wang 0333, Chris Bowler, Shubham Jain 0004, Masoud Ataei Jaliseh, Nicole Khoun, Alberto Mannari, Bardia Mahjour, Viji Srinivasan, Swagath Venkataramani. 1-13
- GRANII: Selection and Ordering of Primitives in GRAph Neural Networks using Input Inspection. Damitha Lenadora, Vimarsh Sathia, Gerasimos Gerogiannis, Serif Yesil, Josep Torrellas, Charith Mendis. 14-27
- Fast Autoscheduling for Sparse ML Frameworks. Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad. 28-43
- Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators. Prasanth Chatarasi, Alex Gatea, Bardia Mahjour, Jintao Zhang, Alberto Mannari, Chris Bowler, Shubham Jain 0004, Masoud Ataei Jaliseh, Nicole Khoun, Kamlesh Kumar, Viji Srinivasan, Swagath Venkataramani. 44-56
- PriTran: Privacy-Preserving Inference for Transformer-Based Language Models under Fully Homomorphic Encryption. Yuechen Mu, Guangli Li, Shiping Chen 0001, Jingling Xue. 57-69
- FHEFusion: Enabling Operator Fusion in FHE Compilers for Depth-Efficient DNN Inference. Tianxiang Sui, Jianxin Lai, Long Li 0015, Peng Yuan, Yan Liu 0082, Qing Zhu 0008, Xiaojing Zhang, Linjie Xiao, Mingzhe Zhang, Jingling Xue. 70-83
- Towards Path-Aware Coverage-Guided Fuzzing. Giacomo Priamo, Daniele Cono D'Elia, Mathias Payer, Leonardo Querzoni. 84-97
- SecSwift, a Compiler-Based Framework for Software Countermeasures in Cybersecurity. François de Ferrière, Yves Janin, Sirine Mechmech. 98-108
- Partial-Evaluation Templates: Accelerating Partial Evaluation with Pre-compiled Templates. Florian Huemer, Aleksandar Prokopec, David Leopoldseder, Raphael Mosaner, Hanspeter Mössenböck. 109-122
- Pyls: Enabling Python Hardware Synthesis with Dynamic Polymorphism via LCRS Encoding. Bolei Tong, Yongyan Fang, Chaorui Wang, Qingan Li, Jingling Xue, Mengting Yuan 0001. 123-135
- SkeleShare: Algorithmic Skeletons and Equality Saturation for Hardware Resource Sharing. Jonathan Van der Cruysse, Tzung-Han Juang, Shakiba Bolbolian Khah, Christophe Dubach. 136-149
- Ember: A Compiler for Embedding Operations on Decoupled Access-Execute Architectures. Marco Siracusa, Olivia Hsu, Víctor Soria Pardos, Joshua Randall 0001, Arnaud Grasset, Eric Biscondi, Douglas J. Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, Adrià Armejach. 150-163
- Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference. Yeonoh Jeong, Taehyeong Park 0001, Yongjun Park 0001. 164-175
- VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis. Arjun H. Kumar, Bhavya Hirani, Hang Shao, Tobi Ajila, Vijay Sundaresan, Daryl Maier, Manas Thakur. 176-187
- FRUGAL: Pushing GPU Applications beyond Memory Limits. Lingqi Zhang 0001, Tengfei Wang, Jiajun Huang 0001, Chen Zhuang, Ivan R. Ivanov, Peng Chen 0035, Toshio Endo, Mohamed Wahib. 188-201
- Automatic Data Enumeration for Fast Collections. Tommy McMichen, Simone Campanoni. 202-215
- FORTE: Online DataFrame Query Optimizer. Yoonho Choi, Kyoungtae Lee, Minji Kim, Hyungsoo Jung 0003, Hyojin Sung. 216-227
- LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping. Amir Mohammad Tavakkoli, Cosmin E. Oancea, Mary Hall. 228-241
- Pushing Tensor Accelerators beyond MatMul in a User-Schedulable Language. Yihong Zhang, Derek K. Gerstmann, Andrew Adams, Maaz Bin Safeer Ahmad. 242-254
- Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References. Hongzheng Chen, Bin Fan, Alexander Collins, Bastian Hagedorn, Evghenii Gaburov, Masahiro Masuda, Matthew Brookhart, Chris Sullivan, Jason Knight, Zhiru Zhang, Vinod Grover. 255-267
- Dependence-Driven, Scalable Quantum Circuit Mapping with Affine Abstractions. Marouane Benbetka, Merwan Bekkar, Riyadh Baghdadi, Martin Kong. 268-280
- Space-Time Optimisations for Early Fault-Tolerant Quantum Computation. Sanaa Sharma, Prakash Murali. 281-294
- OpenQudit: Extensible and Accelerated Numerical Quantum Compilation via a JIT-Compiled DSL. Ed Younis. 295-305
- Selene: Cross-Level Barrier-Free Pipelining for Irregular Nested Loops in High-Level Synthesis. Sungwoo Yun, Seonyoung Cheon, Dongkwan Kim 0002, Heelim Choi, Kunmo Jeong, Chan Lee, Yongwoo Lee 0001, Hanjun Kim 0001. 306-318
- Enabling Automatic Compiler-Driven Vectorization of Transformers. Shreya Alladi, Alberto Ros 0001, Alexandra Jimborean. 319-333
- Unlocking Python Multithreading Capabilities using OpenMP-Based Programming with OMP4Py. César Piñeiro, Juan Carlos Pichel. 334-347
- The Parallel-Semantics Program Dependence Graph for Parallel Optimization. Yian Su, Brian Homerding, Haocheng Gao, Federico Sossai, Yebin Chon, David I. August, Simone Campanoni. 348-361
- From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization. Shuaijiang Li, Jiacheng Zhao, Ying Liu 0055, Shuoming Zhang, Lei Chen, Yijin Li, Yangyu Zhang, Zhicheng Li, Runyu Zhou, Xiyu Shi, Chunwei Xia, Yuan Wen, Xiaobing Feng 0002, Huimin Cui. 362-374
- Binary Diffing via Library Signatures. Andrei Rimsa, Anderson Faustino da Silva, Camilo Santana, Fernando Magno Quintão Pereira. 375-389
- BIT: Empowering Binary Analysis through the LLVM Toolchain. Puzhuo Liu, Peng Di, Jingling Xue, Yu Jiang 0001. 390-402
- Dr.avx: A Dynamic Compilation System for Seamlessly Executing Hardware-Unsupported Vectorization Instructions. Yue Tang, Mianzhi Wu, Yufeng Li, Haoyu Liao, Jianmei Guo, Bo Huang 0002. 403-415
- Practical: Are Abstract-Interpreter Baseline JITs Worth It? An Empirical Evaluation through Metacompilation. Nahuel Palumbo, Guillermo Polito, Stéphane Ducasse, Pablo Tesone. 416-426
- TPDE: A Fast Adaptable Compiler Back-End Framework. Tobias Schwarz, Tobias Kamm, Alexis Engelke. 427-439
- Synthesizing Instruction Selection Back-Ends from ISA Specifications Made Practical. Florian Drescher, Alexis Engelke. 440-452
- SparseX: Synergizing GPU Libraries for Sparse Matrix Multiplication on Heterogeneous Processors. Ruifeng Zhang 0008, Xiangwei Wang, Ang Li 0006, Xipeng Shen. 453-465
- Compilation of Generalized Matrix Chains with Symbolic Sizes. Francisco López, Lars Karlsson, Paolo Bientinesi. 466-478
- TRACE4J: A Lightweight, Flexible, and Insightful Performance Tracing Tool for Java. Haide He, Pengfei Su. 479-492
- Proton: Towards Multi-level, Adaptive Profiling for Triton. Keren Zhou 0001, Tianle Zhong, Hao Wu 0077, Jihyeong Lee, Yue Guan 0003, Yufei Ding 0001, Corbin Robeck, Yuanwei Fang, Jeff Niu, Philippe Tillet. 493-506
- On the Precision of Dynamic Program Fingerprints Based on Performance Counters. Anderson Faustino da Silva, Marcelo Borges Nogueira, Sérgio Queiroz de Medeiros, Jerónimo Castrillón, Fernando Magno Quintão Pereira. 507-519
- PASTA: A Modular Program Analysis Tool Framework for Accelerators. Mao Lin, Hyeran Jeon, Keren Zhou 0001. 520-534
- PIP: Making Andersen's Points-to Analysis Sound and Practical for Incomplete C Programs. Håvard Rognebakke Krogstie, Helge Bahmann, Magnus Själander, Nico Reissmann. 535-547
- Thinking Fast and Correct: Automated Rewriting of Numerical Code through Compiler Augmentation. Siyuan Brant Qian, Vimarsh Sathia, Ivan R. Ivanov, Jan Hückelheim, Paul Hovland, William S. Moses. 548-562
- PolyUFC: Polyhedral Compilation Meets Roofline Analysis for Uncore Frequency Capping. Nilesh Rajendra Shah, M. V. V. S. Manoj Kumar, Dhairya Baxi, Ramakrishna Upadrasta. 563-576
- Accelerating App Recompilation across Android System Updates by Code Reusing. Hongtao Wu, Yu Chen, Mengfei Xie, Futeng Yang, Jun Yan, Jiang Ma, Jianming Fu, Chun Jason Xue, Qingan Li. 577-588
- QIGen: A Kernel Generator for Inference on Nonuniformly Quantized Large Language Models. Tommaso Pegolotti, Dan Alistarh, Markus Püschel. 589-602
- DyPARS: Dynamic-Shape DNN Optimization via Pareto-Aware MCTS for Graph Variants. Hao Qian, Guangli Li, Qiuchu Yu, Xueying Wang 0003, Jingling Xue. 603-616
- Compiler-Runtime Co-operative Chain of Verification for LLM-Based Code Optimization. Hyunho Kwon, Sanggyu Shin, Ju Min Lee, Hoyun Youm, Seungbin Song, Seongho Kim, Hanwoong Jung, SeungWon Lee, Hanjun Kim 0001. 617-629
- Hexcute: A Compiler Framework for Automating Layout Synthesis in GPU Programs. Xiao Zhang, Yaoyao Ding, Bolin Sun, Yang Hu, Tatiana Shpeisman, Gennady Pekhimenko. 630-643
- Multidirectional Propagation of Sparsity Information across Tensor Slices. Kaio Henrique Andrade Ananias, Danila Seliayeu, José Nelson Amaral, Fernando Magno Quintão Pereira. 644-656
- Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions. Hamza Javed, Christophe Dubach. 657-669
- Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy. Fan Luo 0003, Guangli Li, Zhaoyang Hao, Xueying Wang 0003, Xiaobing Feng 0002, Huimin Cui, Jingling Xue. 670-682
- Tensor Program Superoptimization through Cost-Guided Symbolic Program Synthesis. Alexander Brauckmann, Aarsh Chaube, José Wesley de S. Magalhães, Elizabeth Polgreen, Michael F. P. O'Boyle. 683-695
- A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler. Mohammed Tirichine, Nassim Ameur, Nazim Bendib, Iheb Nassim Aouadj, Djad Bouchama, Rafik Bouloudene, Riyadh Baghdadi. 696-710
- Towards Threading the Needle of Debuggable Optimized Binaries. Cristian Assaiante, Simone Di Biasio, Snehasish Kumar, Giuseppe Antonio Di Luna, Daniele Cono D'Elia, Leonardo Querzoni. 711-725
- Compiler-Assisted Instruction Fusion. Ravikiran Ravindranath Reddy, Sawan Singh, Arthur Perais, Alberto Ros 0001, Alexandra Jimborean. 726-739
- LLM-VeriOpt: Verification-Guided Reinforcement Learning for LLM-Based Compiler Optimization. Xiangxin Fang, Jiaqin Kang, Rodrigo Rocha, Sam Ainsworth 0001, Lev Mukhanov. 740-755