- A Fault-Tolerant Million Qubit-Scale Distributed Quantum Computer. Junpyo Kim, Dongmoon Min, Jungmin Cho, Hyeonseong Jeong, Ilkwon Byun, Junhyuk Choi, Juwon Hong, Jangwoo Kim. 1-19
- A Journey of a 1,000 Kernels Begins with a Single Step: A Retrospective of Deep Learning on GPUs. Michael Davies, Ian McDougall, Selvaraj Anandaraj, Deep Machchhar, Rithik Jain, Karthikeyan Sankaralingam. 20-36
- A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors. Reese Kuper, Ipoom Jeong, Yifan Yuan, Ren Wang, Narayan Ranganathan, Nikhil Rao, Jiayu Hu, Sanjay Kumar, Philip Lantz, Nam Sung Kim. 37-54
- Achieving Near-Zero Read Retry for 3D NAND Flash Memory. Min Ye, Qiao Li, Yina Lv, Jie Zhang, Tianyu Ren, Daniel Wen, Tei-Wei Kuo, Chun Jason Xue. 55-70
- An Encoding Scheme to Enlarge Practical DNA Storage Capacity by Reducing Primer-Payload Collisions. Yixun Wei, Bingzhe Li, David H. C. Du. 71-84
- Atalanta: A Bit is Worth a "Thousand" Tensor Values. Alberto Delmas Lascorz, Mostafa Mahmoud, Ali Hadi Zadeh, Milos Nikolic, Kareem Ibrahim, Christina Giannoula, Ameer Abdelhadi, Andreas Moshovos. 85-102
- AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference. Jaehyun Park, Jaewan Choi, Kwanhee Kyung, Michael Jaemin Kim, Yongsuk Kwon, Nam Sung Kim, Jung Ho Ahn. 103-119
- Avoiding Instruction-Centric Microarchitectural Timing Channels Via Binary-Code Transformations. Michael Flanders, Reshabh K. Sharma, Alexandra E. Michael, Dan Grossman, David Kohlbrenner. 120-136
- BitPacker: Enabling High Arithmetic Efficiency in Fully Homomorphic Encryption Accelerators. Nikola Samardzic, Daniel Sánchez. 137-150
- BVAP: Energy and Memory Efficient Automata Processing for Regular Expressions with Bounded Repetitions. Ziyuan Wen, Lingkun Kong, Alexis Le Glaunec, Konstantinos Mamouras, Kaiyuan Yang. 151-166
- Carat: Unlocking Value-Level Parallelism for Multiplier-Free GEMMs. Zhewen Pan, Joshua San Miguel, Di Wu. 167-184
- CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators. Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang. 185-200
- CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing. Zhuoran Song, Chunyu Qi, Fangxin Liu, Naifeng Jing, Xiaoyao Liang. 201-215
- Codesign of quantum error-correcting codes and modular chiplets in the presence of defects. Sophia Fuhui Lin, Joshua Viszlai, Kaitlin N. Smith, Gokul Subramanian Ravi, Charles Yuan, Frederic T. Chong, Benjamin J. Brown. 216-231
- Compiling Loop-Based Nested Parallelism for Irregular Workloads. Yian Su, Mike Rainey, Nick Wanninger, Nadharm Dhiantravan, Jasper Liang, Umut A. Acar, Peter A. Dinda, Simone Campanoni. 232-250
- Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety. Nathaniel Wesley Filardo, Brett F. Gutstein, Jonathan Woodruff, Jessica Clarke, Peter Rugg, Brooks Davis, Mark Johnston, Robert M. Norton, David Chisnall, Simon W. Moore, Peter G. Neumann, Robert N. M. Watson. 251-268
- Design of Novel Analog Compute Paradigms with Ark. Yu-Neng Wang, Glenn Cowan, Ulrich Rührmair, Sara Achour. 269-286
- Direct Memory Translation for Virtualized Clouds. Jiyuan Zhang, Weiwei Jia, Siyuan Chai, Peizhe Liu, Jongyul Kim, Tianyin Xu. 287-304
- Efficient Microsecond-scale Blind Scheduling with Tiny Quanta. Zhihong Luo, Sam Son, Dev Bali, Emmanuel Amaro, Amy Ousterhout, Sylvia Ratnasamy, Scott Shenker. 305-319
- Eliminating Storage Management Overhead of Deduplication over SSD Arrays Through a Hardware/Software Co-Design. Yuhong Wen, Xiaogang Zhao, You Zhou, Tong Zhang, Shangjun Yang, Changsheng Xie, Fei Wu. 320-335
- Elivagar: Efficient Quantum Circuit Search for Classification. Sashwat Anagolum, Narges Alavisamani, Poulami Das, Moinuddin K. Qureshi, Yunong Shi. 336-353
- Energy Efficient Convolutions with Temporal Arithmetic. Rhys Gretsch, Peiyang Song, Advait Madhavan, Jeremy Lau, Timothy Sherwood. 354-368
- ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference. Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo. 369-384
- FaaSGraph: Enabling Scalable, Efficient, and Cost-Effective Graph Processing with Serverless Computing. Yushi Liu, Shixuan Sun, Zijun Li, Quan Chen, Sen Gao, Bingsheng He, Chao Li, Minyi Guo. 385-400
- FOCAL: A First-Order Carbon Model to Assess Processor Sustainability. Lieven Eeckhout. 401-415
- FPGA Technology Mapping Using Sketch-Guided Program Synthesis. Gus Henry Smith, Benjamin Kushigian, Vishal Canumalla, Andrew Cheung, Steven Lyubomirsky, Sorawee Porncharoenwase, René Just, Gilbert Louis Bernstein, Zachary Tatlock. 416-432
- GIANTSAN: Efficient Memory Sanitization with Segment Folding. Hao Ling, Heqing Huang, Chengpeng Wang, Yuandao Cai, Charles Zhang. 433-449
- GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching. Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, Ke Zhang. 450-466
- Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph Processing. Tsun-Yu Yang, Cale England, Yi Li, Bingzhe Li, Ming-Chang Yang. 467-481
- Greybox Fuzzing for Concurrency Testing. Dylan Wolff, Zheng Shi, Gregory J. Duck, Umang Mathur, Abhik Roychoudhury. 482-498
- Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters. Zizhao Mo, Huanle Xu, Chengzhong Xu. 499-513
- Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures. Akash Kothari, Abdul Rafae Noor, Muchen Xu, Hassam Uddin, Dhruv Baronia, Stefanos Baziotis, Vikram S. Adve, Charith Mendis, Sudipta Sengupta. 514-529
- In-Storage Domain-Specific Acceleration for Serverless Computing. Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-Ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, Hadi Esmaeilzadeh. 530-548
- JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping. Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu. 549-565
- Kimbap: A Node-Property Map System for Distributed Graph Analytics. Hochan Lee, Roshan Dathathri, Keshav Pingali. 566-581
- Last-Level Cache Side-Channel Attacks Are Feasible in the Modern Public Cloud. Zirui Neil Zhao, Adam Morrison, Christopher W. Fletcher, Josep Torrellas. 582-600
- LazyBarrier: Reconstructing Android IO Stack for Barrier-Enabled Flash Storage. Yuanyi Zhang, Heng Zhang, Wenbin Cao, Xing He, Daejun Park, Jinyoung Choi, Sungjun Park. 601-615
- LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models. Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu. 616-630
- Lifting Micro-Update Models from RTL for Formal Security Analysis. Adwait Godbole, Kevin Cheang, Yatin A. Manerkar, Sanjit A. Seshia. 631-648
- Lightweight Fault Isolation: Practical, Efficient, and Secure Software Sandboxing. Zachary Yedidia. 649-665
- Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing. Eugene Sha, Andy Liu, Kareem Ibrahim, Mostafa Mahmoud, Christina Giannoula, Ameer Abdelhadi, Andreas Moshovos. 666-682
- MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training. Hongwu Peng, Xi Xie, Kaustubh Shivdikar, Md Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David R. Kaeli, Caiwen Ding. 683-698
- MECH: Multi-Entry Communication Highway for Superconducting Quantum Chiplets. Hezi Zhang, Keyi Yin, Anbang Wu, Hassan Shapourian, Alireza Shabani, Yufei Ding. 699-714
- METAL: Caching Multi-level Indexes in Domain-Specific Architectures. Anagha Molakalmur Anil Kumar, Aditya Prasanna, Jonathan Balkind, Arrvindh Shriraman. 715-729
- MicroVSA: An Ultra-Lightweight Vector Symbolic Architecture-based Classifier Library for Always-On Inference on Tiny Microcontrollers. Nuntipat Narkthong, Shijin Duan, Shaolei Ren, Xiaolin Xu. 730-745
- MulBERRY: Enabling Bit-Error Robustness for Energy-Efficient Multi-Agent Autonomous Systems. Zishen Wan, Nandhini Chandramoorthy, Karthik Swaminathan, Pin-Yu Chen, Kshitij Bhardwaj, Vijay Janapa Reddi, Arijit Raychowdhury. 746-762
- Multi-Dimensional and Message-Guided Fuzzing for Robotic Programs in Robot Operating System. Jia-Ju Bai, Haoxuan Song, Shimin Hu. 763-778
- One Gate Scheme to Rule Them All: Introducing a Complex Yet Reduced Instruction Set for Quantum Computing. Jianxin Chen, Dawei Ding, Weiyuan Gong, Cupjin Huang, Qi Ye. 779-796
- Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization. Feng Yu, Guangli Li, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue. 797-812
- ORIANNA: An Accelerator Generation Framework for Optimization-based Robotic Applications. Yuhui Hao, Yiming Gan, Bo Yu, Qiang Liu, Yinhe Han, Zishen Wan, Shaoshan Liu. 813-829
- Palantir: Hierarchical Similarity Detection for Post-Deduplication Delta Compression. Hongming Huang, Peng Wang, Qiang Su, Hong Xu, Chun Jason Xue, André Brinkmann. 830-845
- PDIP: Priority Directed Instruction Prefetching. Bhargav Reddy Godala, Sankara Prasad Ramesh, Gilles A. Pokam, Jared Stark, André Seznec, Dean M. Tullsen, David I. August. 846-861
- Pentimento: Data Remanence in Cloud FPGAs. Colin Drewes, Olivia Weng, Andres Meza, Alric Althoff, David Kohlbrenner, Ryan Kastner, Dustin Richmond. 862-878
- PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization. Cong Li, Zhe Zhou, Yang Wang, Fan Yang, Ting Cao, Mao Yang, Yun Liang, Guangyu Sun. 879-896
- PIM-STM: Software Transactional Memory for Processing-In-Memory Systems. André Lopes, Daniel Castro, Paolo Romano. 897-911
- Plankton: Reconciling Binary Code and Debug Information. Anshunkang Zhou, Chengfeng Ye, Heqing Huang, Yuandao Cai, Charles Zhang. 912-928
- PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. Jason Ansel, Edward Z. Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, Geeta Chauhan, Anjali Chourdia, Will Constable, Alban Desmaison, Zachary Devito, Elias Ellison, Will Feng, Jiong Gong, Michael Gschwind, Brian Hirsh, Sherlock Huang, Kshiteej Kalambarkar, Laurent Kirsch, Michael Lazos, Mario Lezcano, Yanbo Liang, Jason Liang, Yinghai Lu, C. K. Luk, Bert Maher, Yunjie Pan, Christian Puhrsch, Matthias Reso, Mark Saroufim, Marcos Yukio Siraichi, Helen Suk, Shunting Zhang, Michael Suo, Phil Tillet, Xu Zhao, Eikan Wang, Keren Zhou, Richard Zou, Xiaodong Wang, Ajit Mathews, William Wen, Gregory Chanan, Peng Wu, Soumith Chintala. 929-947
- QuFEM: Fast and Accurate Quantum Readout Calibration Using the Finite Element Method. Siwei Tan, Liqiang Lu, Hanyu Zhang, Jia Yu, Congliang Lang, Yongheng Shang, Xinkui Zhao, Mingshuai Chen, Yun Liang, Jianwei Yin. 948-963
- RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing. Zheng Wang, Yuke Wang, Jiaqi Deng, Da Zheng, Ang Li, Yufei Ding. 964-979
- Red-QAOA: Efficient Variational Optimization through Circuit Reduction. Meng Wang, Bo Fang, Ang Li, Prashant J. Nair. 980-998
- 2: Robust Profile-Guided Runtime Prefetch Generation. Yuxuan Zhang, Nathan Sobotka, Soyoon Park, Saba Jamilan, Tanvir Ahmed Khan, Baris Kasikci, Gilles A. Pokam, Heiner Litz, Joseph Devietti. 999-1013
- Rubix: Reducing the Overhead of Secure Rowhammer Mitigations via Randomized Line-to-Row Mapping. Anish Saxena, Saurav Mathur, Moinuddin K. Qureshi. 1014-1028
- SEER: Super-Optimization Explorer for High-Level Synthesis using E-graph Rewriting. Jianyi Cheng, Samuel Coward, Lorenzo Chelini, Rafael Barbalho, Theo Drane. 1029-1044
- SEVeriFast: Minimizing the root of trust for fast startup of SEV microVMs. Benjamin Holmes, Jason Waterman, Dan Williams. 1045-1060
- sIOPMP: Scalable and Efficient I/O Protection for TEEs. Erhu Feng, Dahu Feng, Dong Du, Yubin Xia, Wenbin Zheng, Siqi Zhao, Haibo Chen. 1061-1076
- Skip It: Take Control of Your Cache! Shashank Anand, Michal Friedman, Michael Giardino, Gustavo Alonso. 1077-1094
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training. Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang. 1095-1111
- SpotServe: Serving Generative Large Language Models on Preemptible Instances. Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia. 1112-1127
- SUIT: Secure Undervolting with Instruction Traps. Jonas Juffinger, Stepan Kalinin, Daniel Gruss, Frank Mueller. 1128-1145
- T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives. Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair. 1146-1164
- Tandem Processor: Grappling with Emerging Operators in Neural Networks. Soroush Ghodrati, Sean Kinzer, Hanyang Xu, Rohan Mahapatra, Yoonsung Kim, Byung Hoon Ahn, Dong Kai Wang, Lavanya Karthikeyan, Amir Yazdanbakhsh, Jongse Park, Nam Sung Kim, Hadi Esmaeilzadeh. 1165-1182
- TGLite: A Lightweight Programming Framework for Continuous-Time Temporal Graph Neural Networks. Yufeng Wang, Charith Mendis. 1183-1199
- Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM. Charles Block, Gerasimos Gerogiannis, Charith Mendis, Ariful Azad, Josep Torrellas. 1200-1217
- Verifying Rust Implementation of Page Tables in a Software Enclave Hypervisor. Zhenyang Dai, Shuang Liu, Vilhelm Sjöberg, Xupeng Li, Yu Chen, Wenhao Wang, Yuekai Jia, Sean Noble Anderson, Laila Elbeheiry, Shubham Sondhi, Yu Zhang, Zhaozhong Ni, Shoumeng Yan, Ronghui Gu, Zhengyu He. 1218-1232
- WASP: Workload-Aware Self-Replicating Page-Tables for NUMA Servers. Hongliang Qu, Zhibin Yu. 1233-1249
- What You Trace is What You Get: Dynamic Stack-Layout Recovery for Binary Recompilation. Fabian Parzefall, Chinmay Deshpande, Felicitas Hetzelt, Michael Franz. 1250-1263