- Direct Mind-Machine Teaming (Keynote). Abhishek Bhattacharjee. 1
- Language Models: The Most Important Compute Challenge of Our Time (Keynote). Bryan Catanzaro. 2
- ABNDP: Co-optimizing Data Access and Load Balance in Near-Data Processing. Boyu Tian, Qihang Chen, Mingyu Gao 0001. 3-17
- Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling. Toluwanimi O. Odemuyiwa, Hadi Asghari Moghaddam, Michael Pellauer, Kartik Hegde, Po-An Tsai, Neal Clayton Crago, Aamer Jaleel, John D. Owens, Edgar Solomonik, Joel S. Emer, Christopher W. Fletcher. 18-32
- APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis. Jackson Melchert, Kathleen Feng, Caleb Donovick, Ross Daly, Ritvik Sharma, Clark W. Barrett, Mark A. Horowitz, Pat Hanrahan, Priyanka Raina. 33-45
- Beyond Static Parallel Loops: Supporting Dynamic Task Parallelism on Manycore Architectures with Software-Managed Scratchpad Memories. Lin Cheng, Max Ruttenberg, Dai Cheol Jung, Dustin Richmond, Michael B. Taylor, Mark Oskin, Christopher Batten. 46-58
- CaQR: A Compiler-Assisted Approach for Qubit Reuse through Dynamic Circuit. Fei Hua, Yuwei Jin, Yan Hao Chen, Suhas Vittal, Kevin Krsulich, Lev S. Bishop, John Lapeyre, Ali Javadi-Abhari, Eddy Z. Zhang. 59-71
- CaT: A Solver-Aided Compiler for Packet-Processing Pipelines. Xiangyu Gao, Divya Raghunathan, Ruijie Fang, Tao Wang, Xiaotong Zhu, Anirudh Sivaraman, Srinivas Narayana, Aarti Gupta. 72-88
- Characterizing and Optimizing End-to-End Systems for Private Inference. Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen. 89-104
- Cohort: Software-Oriented Acceleration for Heterogeneous SoCs. Tianrui Wei, Nazerke Turtayeva, Marcelo Orenes-Vera, Omkar Lonkar, Jonathan Balkind. 105-117
- Coyote: A Compiler for Vectorizing Encrypted Arithmetic Circuits. Raghav Malik, Kabir Sheth, Milind Kulkarni 0001. 118-133
- DefT: Boosting Scalability of Deformable Convolution Operations on GPUs. Edward Hanson, Mark Horton, Hai (Helen) Li, Yiran Chen 0001. 134-146
- Disaggregated RAID Storage in Modern Datacenters. Junyi Shu, Ruidong Zhu, Yun Ma 0002, Gang Huang, Hong Mei, Xuanzhe Liu, Xin Jin. 147-163
- DrGPUM: Guiding Memory Optimization for GPU-Accelerated Applications. Mao Lin, Keren Zhou, Pengfei Su. 164-178
- Efficient Compactions between Storage Tiers with PrismDB. Ashwini Raina, Jianan Lu, Asaf Cidon, Michael J. Freedman. 179-193
- Efficient Scheduler Live Update for Linux Kernel with Modularization. Teng Ma, Shanpei Chen, Yihao Wu, Erwei Deng, Zhuo Song, Quan Chen 0002, Minyi Guo. 194-207
- eHDL: Turning eBPF/XDP Programs into Hardware Designs for the NIC. Alessandro Rivitti, Roberto Bifulco, Angelo Tulumello, Marco Bonola, Salvatore Pontarelli. 208-223
- Exit-Less, Isolated, and Shared Access for Virtual Machines. Kenichi Yasukata, Hajime Tazaki, Pierre-Louis Aublin. 224-237
- Finding Unstable Code via Compiler-Driven Differential Testing. Shaohua Li, Zhendong Su 0001. 238-251
- Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing. Francisco Muñoz-Martínez, Raveesh Garg, Michael Pellauer, José L. Abellán, Manuel E. Acacio, Tushar Krishna. 252-265
- Going beyond the Limits of SFI: Flexible and Secure Hardware-Assisted In-Process Isolation with HFI. Shravan Narayan, Tal Garfinkel, Mohammadkazem Taram, Joey Rudek, Daniel Moghimi, Evan Johnson 0001, Chris Fallin, Anjo Vahldiek-Oberwagner, Michael LeMay, Ravi Sahita, Dean M. Tullsen, Deian Stefan. 266-281
- GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference. Haojie Ye, Sanketh Vedula, Yuhan Chen, Yichen Yang 0005, Alex Bronstein, Ronald G. Dreslinski, Trevor N. Mudge, Nishil Talati. 282-301
- Graphene: An IR for Optimized Tensor Computations on GPUs. Bastian Hagedorn, Bin Fan, Hanfeng Chen, Cris Cecka, Michael Garland, Vinod Grover. 302-313
- Heron: Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators. Jun Bi, Qi Guo 0001, Xiaqing Li, Yongwei Zhao, Yuanbo Wen, Yuxuan Guo, Enshuai Zhou, Xing Hu 0001, Zidong Du, Ling Li, Huaping Chen 0001, Tianshi Chen 0002. 314-328
- Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks. Tushar Swamy, Annus Zulfiqar, Luigi Nardi, Muhammad Shahbaz 0001, Kunle Olukotun. 329-342
- Hyperscale Hardware Optimized Neural Architecture Search. Sheng Li, Garrett Andersen, Tao Chen, Liqun Cheng, Julian Grady, Da Huang, Quoc V. Le, Andrew Li, Xin Li, Yang Li, Chen Liang, Yifeng Lu, Yun Ni, Ruoming Pang, Mingxing Tan, Martin Wicke, Gang Wu, Shengqi Zhu, Parthasarathy Ranganathan, Norman P. Jouppi. 343-358
- Infinity Stream: Portable and Programmer-Friendly In-/Near-Memory Fusion. Zhengrong Wang, Christopher Liu, Aman Arora, Lizy Kurian John, Tony Nowatzki. 359-375
- In-Network Aggregation with Transport Transparency for Distributed Training. Shuo Liu, Qiaoling Wang, Junyi Zhang, Wenfei Wu, Qinliang Lin, Yao Liu, Meng Xu, Marco Canini, Ray C. C. Cheung, Jianfei He. 376-391
- Kodan: Addressing the Computational Bottleneck in Space. Bradley Denby, Krishna Chintalapudi, Ranveer Chandra, Brandon Lucia, Shadi A. Noghabi. 392-403
- LEGO: Empowering Chip-Level Functionality Plug-and-Play for Next-Generation IoT Devices. Chong Zhang, Songfan Li, Yihang Song, Qianhe Meng, Minghua Chen, Yanxu Bai, Li Lu 0001, Hongzi Zhu. 404-418
- Mapping Very Large Scale Spiking Neuron Network to Neuromorphic Hardware. Ouwen Jin, Qinghui Xing, Ying Li, ShuiGuang Deng, Shuibing He, Gang Pan 0001. 419-432
- Mosaic Pages: Big TLB Reach with Small Pages. Krishnan Gosakan, Jaehyun Han, William Kuszmaul, Ibrahim N. Mubarek, Nirjhar Mukherjee, Karthik Sriram, Guido Tagliavini, Evan West, Michael A. Bender, Abhishek Bhattacharjee, Alex Conway, Martin Farach-Colton, Jayneel Gandhi, Rob Johnson, Sudarsun Kannan, Donald E. Porter. 433-448
- MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation. Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks 0001, Carole-Jean Wu. 449-465
- NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. Shuke Wang, Mingxing Zhang, Ke Yang, Kang Chen, Shaonan Ma, Jinlei Jiang, Yongwei Wu. 466-482
- Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores. Zhongcheng Zhang, Yan Ou, Ying Liu, Chenxi Wang, Yongbin Zhou, Xiaoyu Wang, Yuyang Zhang, Yucheng Ouyang, Jiahao Shan, Ying Wang, Jingling Xue, Huimin Cui, Xiaobing Feng 0002. 483-497
- Persistent Memory Disaggregation for Cloud-Native Relational Databases. Chaoyi Ruan, Yingqiang Zhang, Chao Bi, Xiaosong Ma, Hao Chen, Feifei Li 0001, Xinjun Yang, Cheng Li, Ashraf Aboulnaga, Yinlong Xu. 498-512
- PipeSynth: Automated Synthesis of Microarchitectural Axioms for Memory Consistency. Chase Norman, Adwait Godbole, Yatin A. Manerkar. 513-527
- Protect the System Call, Protect (Most of) the World with BASTION. Christopher Jelesnianski, Mohannad Ismail, Yeongjin Jang, Dan Williams, Changwoo Min. 528-541
- Re-architecting I/O Caches for Emerging Fast Storage Devices. Mohammadamin Ajdari, Pouria Peykani Sani, Amirhossein Moradi, Masoud Khanalizadeh Imani, Amir Hossein Bazkhanei, Hossein Asadi 0001. 542-555
- Reconfigurable Virtual Memory for FPGA-Driven I/O. Joshua Landgraf, Matthew Giordano, Esther Yoon, Christopher J. Rossbach. 556-571
- RepCut: Superlinear Parallel RTL Simulation with Replication-Aided Partitioning. Haoyuan Wang, Scott Beamer. 572-585
- Rosebud: Making FPGA-Accelerated Middlebox Development More Pleasant. Moein Khazraee, Alex Forencich, George C. Papen, Alex C. Snoeren, Aaron Schulman. 586-605
- Simulator Independent Coverage for RTL Hardware Languages. Kevin Laeufer, Vighnesh Iyer, David Biancolin, Jonathan Bachrach, Borivoje Nikolic, Koushik Sen. 606-615
- Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs. Blaise Tine, Varun Saxena, Santosh Srivatsan, Joshua R. Simpson, Fadi Alzammar, Liam Cooper, Hyesoon Kim. 616-630
- Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs. Fangkai Yang, Lu Wang, Zhenyu Xu, Jue Zhang, Liqun Li, Bo Qiao 0001, Camille Couturier, Chetan Bansal, Soumya Ram, Si-qin, Zhen Ma, Íñigo Goiri, Eli Cortez, Terry Yang, Victor Rühle, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang 0001. 631-643
- Space-Efficient TREC for Enabling Deep Learning on Microcontrollers. Jiesong Liu, Feng Zhang, Jiawei Guan, Hsin-Hsuan Sung, Xiaoguang Guo, Xiaoyong Du 0001, Xipeng Shen. 644-659
- SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. Zihao Ye, Ruihang Lai, Junru Shao, TianQi Chen, Luis Ceze. 660-678
- SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development. Zujun Tan, Yebin Chon, Michael Kruse, Johannes Doerfert, Ziyang Xu, Brian Homerding, Simone Campanoni, David I. August. 679-693
- TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks. Iacovos G. Kolokasis, Giannos Evdorou, Shoaib Akram 0001, Christos Kozanitis, Anastasios Papagiannis, Foivos S. Zakkak, Polyvios Pratikakis, Angelos Bilas. 694-709
- The Sparse Abstract Machine. Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel S. Emer, Mark A. Horowitz, Fredrik Kjølstad. 710-726
- Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale. Padmapriya Duraisamy, Wei Xu, Scott Hare, Ravi Rajwar, David E. Culler, Zhiyi Xu, Jianing Fan, Christopher Kennelly, Bill McCloskey, Danijela Mijailovic, Brian Morris, Chiranjit Mukherjee, Jingliang Ren, Greg Thelen, Paul Turner, Carlos Villavieja, Parthasarathy Ranganathan, Amin Vahdat. 727-741
- TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory. Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, Pallab Bhattacharya, Chris Petersen, Mosharaf Chowdhury, Shobhit O. Kanaujia, Prakash Chauhan. 742-755
- Transparent Runtime Change Handling for Android Apps. Zizhan Chen, Zili Shao. 756-770
- Untangle: A Principled Framework to Design Low-Leakage, High-Performance Dynamic Partitioning Schemes. Zirui Neil Zhao, Adam Morrison 0001, Christopher W. Fletcher, Josep Torrellas. 771-788
- Verification of Nondeterministic Quantum Programs. Yuan Feng, Yingte Xu. 789-805
- Vidi: Record Replay for Reconfigurable Hardware. Gefei Zuo, Jiacheng Ma, Andrew Quinn 0001, Baris Kasikci. 806-820