- Accelerating LLM Serving for Multi-turn Dialogues with Efficient Resource Management. Jinwoo Jeong, Jeongseob Ahn. 1-15
- Affinity-based Optimizations for TFHE on Processing-in-DRAM. Kevin Nam, Heon Hui Jung, Hyunyoung Oh, Yunheung Paek. 16-31
- AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures. Bo Fu, Leo Tenenbaum, David Adler, Assaf Klein, Arpit Gogia, Alaa R. Alameldeen, Marco Guarnieri, Mark Silberstein, Oleksii Oleksenko, Gururaj Saileshwar. 32-47
- Aqua: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains. Abhishek Vijaya Kumar, Gianni Antichi, Rachee Singh. 48-62
- Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators. Shixin Zhao, Yuming Li, Bing Li 0017, Yintao He, Mengdi Wang, Yinhe Han 0001, Ying Wang 0001. 63-78
- BQSim: GPU-accelerated Batch Quantum Circuit Simulation using Decision Diagram. Shui Jiang, Yi-Hua Chung, Chih-Chun Chang, Tsung-Yi Ho, Tsung-Wei Huang. 79-94
- Cascade: A Dependency-aware Efficient Training Framework for Temporal Graph Neural Network. Yue Dai 0005, Xulong Tang, Youtao Zhang. 95-110
- CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing. Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang 0004, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu. 111-130
- COMET: Towards Practical W4A4KV4 LLMs Serving. Lian Liu, Long Cheng 0003, Haimeng Ren, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li 0001, Yinhe Han 0001, Ying Wang 0001. 131-146
- Concurrency-Informed Orchestration for Serverless Functions. Qichang Liu, Yue Cheng 0001, Haiying Shen, Ao Wang, Bharathan Balaji. 147-161
- Controlled Preemption: Amplifying Side-Channel Attacks from Userspace. Yongye Zhu, Boru Chen, Zirui Neil Zhao, Christopher W. Fletcher. 162-177
- CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory. Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo. 178-191
- CTXNL: A Software-Hardware Co-designed Solution for Efficient CXL-Based Transaction Processing. Zhao Wang, Yiqi Chen, Cong Li, Yijin Guan, Dimin Niu, Tianchan Guan, Zhaoyang Du, Xingda Wei, Guangyu Sun 0003. 192-209
- CXLfork: Fast Remote Fork over CXL Fabrics. Chloe Alverti, Stratos Psomadakis, Burak Ocalan, Shashwat Jaiswal, Tianyin Xu, Josep Torrellas. 210-226
- Data Cache for Intermittent Computing Systems with Non-Volatile Main Memory. Sourav Mohapatra, Vito Kortbeek, Marco Antonio van Eerden, Jochem Broekhoff, Saad Ahmed, Przemyslaw Pawelczak. 227-243
- Dynamic Partial Deadlock Detection and Recovery via Garbage Collection. Georgian-Vlad Saioc, I-Ting Angelina Lee, Anders Møller, Milind Chabbi. 244-259
- DynaX: Sparse Attention Acceleration with Dynamic X:M Fine-Grained Structured Pruning. Xiao-Xiong, Zhaorui Chen, Yue Liang, Minghao Tian, Jiaxing Shang, Jiang Zhong, Dajiang Liu. 260-274
- Einsum Trees: An Abstraction for Optimizing the Execution of Tensor Expressions. Alexander Breuer, Mark Blacher, Max Engel, Joachim Giesen, Alexander Heinecke, Julien Klaus, Stefan Remke. 275-292
- ElasticMiter: Formally Verified Dataflow Circuit Rewrites. Ayatallah Elakhras, Jiahui Xu, Martin Erhart, Paolo Ienne, Lana Josipovic. 293-308
- Embracing Imbalance: Dynamic Load Shifting among Microservice Containers in Shared Clusters. Shutian Luo, Jianxiong Liao, Chenyu Lin, Huanle Xu, Zhi Zhou, Chengzhong Xu 0001. 309-324
- Enabling Efficient Mobile Tracing with BTrace. Jiawei Wang, Nian Liu, Arnau Casadevall-Saiz, Yutao Liu, Diogo Behrens, Ming Fu, Ning Jia, Hermann Härtig, Haibo Chen 0001. 325-338
- Energy-aware Scheduling and Input Buffer Overflow Prevention for Energy-harvesting Systems. Harsh Desai, Xinye Wang, Brandon Lucia. 339-354
- EXIST: Enabling Extremely Efficient Intra-Service Tracing Observability in Datacenters. Xinkai Wang 0003, Xiaofeng Hou, Chao Li 0009, Yuancheng Li, Du Liu, Guoyao Xu, Guodong Yang, Liping Zhang, Yuemin Wu, Xiaopeng Yuan, Quan Chen 0002, Minyi Guo. 355-372
- Extended User Interrupts (xUI): Fast and Flexible Notification without Polling. Berk Aydogmus, Linsong Guo, Danial Zuberi, Tal Garfinkel, Dean M. Tullsen, Amy Ousterhout, Kazem Taram. 373-389
- Fat-Tree QRAM: A High-Bandwidth Shared Quantum Random Access Memory for Parallel Queries. Shifan Xu, Alvin Lu, Yongshan Ding 0001. 390-406
- FLEXPROF: Flexible, Side-Channel-Free Memory Access. Jarrett Minton, Rajeev Balasubramonian. 407-420
- FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism. Yujie Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, XueFeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui 0001. 421-436
- Formalising CXL Cache Coherence. Chengsong Tan, Alastair F. Donaldson, John Wickerson. 437-450
- Generalizing Reuse Patterns for Efficient DNN on Microcontrollers. Jiesong Liu, Bin Ren, Xipeng Shen. 451-466
- Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs. Annus Zulfiqar, Ali Imran 0005, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz 0001. 467-481
- Hardware Sentinel: Protecting Software Applications from Hardware Silent Data Corruptions. Rhea Dutta, Harish Dattatraya Dixit, Rik van Riel, Gautham Vunnam, Sriram Sankar. 482-497
- Harmonia: A Unified Framework for Heterogeneous FPGA Acceleration in the Cloud. Luyang Li, Heng Pan, Xinchen Wan, Kai Lv, Zilong Wang 0007, Qian Zhao, Feng Ning, Qingsong Ning, Shideng Zhang, Zhenyu Li 0001, Layong Luo, Gaogang Xie. 498-514
- HetEC: Architectures for Heterogeneous Quantum Error Correction Codes. Samuel Alexander Stein, Shifan Xu, Andrew W. Cross, Theodore J. Yoder, Ali Javadi-Abhari, Chenxu Liu, Kun Liu, Zeyuan Zhou, Charlie Guinn, Yufei Ding, Yongshan Ding 0001, Ang Li 0006. 515-528
- Hierarchical Prefetching: A Software-Hardware Instruction Prefetcher for Server Applications. Tingji Zhang, Boris Grot, Wenjian He, Yashuai Lv, Peng Qu, Fang Su, Wenxin Wang, Guowei Zhang, Xuefeng Zhang, Youhui Zhang. 529-544
- HyperHammer: Breaking Free from KVM-Enforced Isolation. Wei Chen, Zhi Zhang 0001, Xin Zhang, Qingni Shen, Yuval Yarom, Daniel Genkin, Chen Yan, Zhe Wang 0017. 545-559
- KernelGPT: Enhanced Kernel Fuzzing via Large Language Models. Chenyuan Yang, Zijie Zhao, Lingming Zhang. 560-573
- Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline. Zhiyuan Fang, Yuegui Huang, Zicong Hong, Yufeng Lyu, Wuhui Chen, Yue Yu 0001, Fan Yu, Zibin Zheng. 574-588
- Load and MLP-Aware Thread Orchestration for Recommendation Systems Inference on CPUs. Rishabh Jain, Teyuh Chou, Onur Kayiran, John Kalamatianos, Gabriel H. Loh, Mahmut T. Kandemir, Chita R. Das. 589-603
- M5: Mastering Page Migration and Memory Management for CXL-based Tiered Memory Systems. Yan Sun, Jongyul Kim 0001, Zeduo Yu, Jiyuan Zhang 0003, Siyuan Chai, Michael Jaemin Kim, Hwayong Nam, Jaehyun Park 0006, Eojin Na, Yifan Yuan, Ren Wang 0001, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim. 604-621
- MDPeek: Breaking Balanced Branches in SGX with Memory Disambiguation Unit Side Channels. Chang Liu, Shuaihu Feng, Yuan Li, Dongsheng Wang 0002, Wenjian He, Yongqiang Lyu 0001, Trevor E. Carlson. 622-638
- Micro Blossom: Accelerated Minimum-Weight Perfect Matching Decoding for Quantum Error Correction. Yue Wu, Namitha Liyanage, Lin Zhong 0001. 639-654
- MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training. Weilin Cai, Le Qin, Jiayi Huang. 655-671
- iTex Tessellation. Jianxing Xu, Yuanbo Wen, Zikang Liu, Ruibai Xu, Tingfeng Ruan, Jun Bi, Rui Zhang 0040, Di Huang, Xinkai Song, Yifan Hao, Xing Hu 0001, Zidong Du, Chongqing Zhao, Jiang Jie, Qi Guo 0001. 672-688
- Necro-reaper: Pruning away Dead Memory Traffic in Warehouse-Scale Computers. Sotiris Apostolakis, Chris Kennelly, Xinliang David Li, Parthasarathy Ranganathan. 689-703
- OctoCache: Caching Voxels for Accelerating 3D Occupancy Mapping in Autonomous Systems. Peiqing Chen, Minghao Li, Zishen Wan, Yu-Shun Hsiao, Minlan Yu, Vijay Janapa Reddi, Zaoxing Liu. 704-718
- Optimizing Deep Learning Inference Efficiency through Block Dependency Analysis. Zhanyuan Di, Leping Wang, En Shao, Zhaojia Ma, ZiYi Ren, Feng Hua, Lixian Ma, Jie Zhao 0002, Guangming Tan, Ninghui Sun. 719-733
- Orion: A Fully Homomorphic Encryption Framework for Deep Learning. Austin Ebel, Karthik Garimella, Brandon Reagen. 734-749
- OS2G: A High-Performance DPU Offloading Architecture for GPU-based Deep Learning with Object Storage. Zhen Jin 0008, Yiquan Chen, Mingxu Liang, Yijing Wang, Guoju Fang, Ao Zhou, Keyao Zhang, Jiexiong Xu, Wenhai Lin, Yiquan Lin, Shushu Zhao, Wenkai Shi, Zhenhua He, Shishun Cai, Wenzhi Chen. 750-765
- PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System. Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li 0001, Xiaowei Li 0001, Ying Wang 0001, Onur Mutlu. 766-782
- Parendi: Thousand-Way Parallel RTL Simulation. Mahyar Emami, Thomas Bourgeat, James R. Larus. 783-797
- Past-Future Scheduler for LLM Serving under SLA Guarantees. Ruihao Gong, Shihao Bai, Siyu Wu, Yunqian Fan, Zaijun Wang, Xiuhong Li, Hailong Yang, Xianglong Liu 0001. 798-813
- Pave: Information Flow Control for Privacy-preserving Online Data Processing Services. Minkyung Park, Jaeseung Choi, Hyeonmin Lee, Ted Taekyoung Kwon. 814-830
- PhasePrint: Exposing Cloud FPGA Fingerprints by Inducing Timing Faults at Runtime. Jubayer Mahmod, Matthew Hicks. 831-844
- PICACHU: Plug-In CGRA Handling Upcoming Nonlinear Operations in LLMs. Jiajun Qin, Tianhua Xia, Cheng Tan 0002, Jeff Zhang 0001, Sai Qian Zhang. 845-861
- PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference. Yufeng Gu, Alireza Khadem, Sumanth Umesh, Ning Liang, Xavier Servot, Onur Mutlu, Ravi R. Iyer 0001, Reetuparna Das. 862-881
- Pirate: No Compromise Low-Bandwidth VR Streaming for Edge Devices. Yingtian Zhang, Yan Kang, Ziyu Ying 0001, Wanhang Lu, Sijie Lan, Huijuan Xu, Kiwan Maeng, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das. 882-896
- POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference. Aditya K. Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter 0001, Ramachandran Ramjee, Ashish Panwar. 897-912
- Practical Federated Recommendation Model Learning Using ORAM with Controlled Privacy. Jinyu Liu, Wenjie Xiong 0001, G. Edward Suh, Kiwan Maeng. 913-932
- Protecting Cryptographic Code Against Spectre-RSB: (and, in Fact, All Known Spectre Variants). Santiago Arranz Olmos, Gilles Barthe, Chitchanok Chuengsatiansup, Benjamin Grégoire, Vincent Laporte, Tiago Oliveira 0004, Peter Schwabe, Yuval Yarom, Zhiyuan Zhang 0005. 933-948
- Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning. Liang Qiao, Jun Shi 0007, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang. 949-965
- Ratte: Fuzzing for Miscompilations in Multi-Level Compilers Using Composable Semantics. Pingshi Yu, Nicolas Wu, Alastair F. Donaldson. 966-981
- ReCA: Integrated Acceleration for Real-Time and Efficient Cooperative Embodied Autonomous Agents. Zishen Wan, Yuhang Du, Mohamed Ibrahim, Jiayi Qian, Jason Jabbour, Yang (Katie) Zhao, Tushar Krishna, Arijit Raychowdhury, Vijay Janapa Reddi. 982-997
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning. Ruihang Lai, Junru Shao, Siyuan Feng, Steven Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye 0001, Hongyi Jin, Yuchen Jin, Jiawei Liu 0004, LeSheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park 0004, Prakalp Srivastava, Jared Roesch, Todd C. Mowry, Tianqi Chen 0001. 998-1013
- Reload+Reload: Exploiting Cache and Memory Contention Side Channel on AMD SEV. Li-Chung Chiang, Shih-wei Li. 1014-1027
- RESCQ: Realtime Scheduling for Continuous Angle Quantum Error Correction Architectures. Sayam Sethi, Jonathan Mark Baker. 1028-1043
- Saving Energy with Per-Variable Bitwidth Speculation. Tommy McMichen, David Dlott, Panitan Wongse-ammat, Nathan Greiner, Hussain Khajanchi, Russ Joseph, Simone Campanoni. 1044-1059
- ShadowLoad: Injecting State into Hardware Prefetchers. Lorenz Hetterich, Fabian Thomas, Lukas Gerlach 0001, Ruiyi Zhang 0001, Nils Bernsdorf, Eduard Ebert, Michael Schwarz 0001. 1060-1075
- Simplifying and Accelerating NOR Flash I/O Stack for RAM-Restricted Microcontrollers. Hao Huang, Yanqi Pan, Wen Xia, Xiangyu Zou, Darong Yang, Liang Shi, Hongwei Du. 1076-1090
- Skia: Exposing Shadow Branches. Chrysanthos Pepi, Bhargav Reddy Godala, Krishnam Tibrewala, Gino A. Chacon, Paul V. Gratz, Daniel A. Jiménez, Gilles A. Pokam, David I. August. 1091-1106
- SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts. Seonghun Son, Daniel Moghimi, Berk Gülmezoglu. 1107-1123
- Snowplow: Effective Kernel Fuzzing with a Learned White-box Test Mutator. Sishuai Gong, Wang Rui, Deniz Altinbüken, Pedro Fonseca 0001, Petros Maniatis. 1124-1138
- Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling. Yujie Wang, Shenhan Zhu, Fangcheng Fu, Xupeng Miao, Jie Zhang, Juan Zhu, Fan Hong, Yong Li, Bin Cui 0001. 1139-1155
- Squeezing Operator Performance Potential for the Ascend Architecture. Yuhang Zhou, Zhibin Wang, Guyue Liu, Shipeng Li, Xi Lin, Zibo Wang, Yongzhong Wang, Fuchun Wei, Jingyi Zhang, Zhiheng Hu, Yanlin Liu, Chunsheng Li, Ziyang Zhang, Yaoyuan Wang, Bin Zhou, Wanchun Dou, Guihai Chen, Chen Tian 0001. 1156-1171
- Stramash: A Fused-Kernel Operating System For Cache-Coherent, Heterogeneous-ISA Platforms. Tong Xing, Cong Xiong, Tianrui Wei, April Sanchez, Binoy Ravindran, Jonathan Balkind, Antonio Barbalace. 1172-1188
- StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination. Yu Feng 0007, Zheng Liu, Weikai Lin, Zihan Liu 0002, Jingwen Leng, Minyi Guo, Zhezhi He, Jieru Zhao, Yuhao Zhu 0001. 1189-1202
- Systematic CXL Memory Characterization and Performance Analysis at Scale. Jinshu Liu, Hamid Hadian, Yuyue Wang, Daniel S. Berger, Marie Nguyen, Xun Jian 0002, Sam H. Noh, Huaicheng Li. 1203-1217
- Tackling ML-based Dynamic Mispredictions using Statically Computed Invariants for Attack Surface Reduction. Chris Porter, Sharjeel Khan, Kangqi Ni, Santosh Pande. 1218-1234
- TaintEMU: Decoupling Tracking from Functional Domains for Architecture-Agnostic and Efficient Whole-System Taint Tracking. Lei Cui 0003, Youquan Xian, Peng Liu 0044, Longjin Lu. 1235-1250
- TaOPT: Tool-Agnostic Optimization of Parallelized Automated Mobile UI Testing. Dezhi Ran, Zihe Song, Wenyu Wang, Wei Yang 0013, Tao Xie 0001. 1251-1265
- TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms. Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, Ricardo Bianchini. 1266-1281
- TNIC: A Trusted NIC Architecture: A hardware-network substrate for building high-performance trustworthy distributed systems. Dimitra Giantsidi, Julian Pritzi, Felix Gust, Antonios Katsarakis, Atsushi Koshiba, Pramod Bhatotia. 1282-1301
- Towards End-to-End Optimization of LLM-based Applications with Ayo. Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu. 1302-1316
- Towards Sound Reassembly of Modern x86-64 Binaries. Hyungseok Kim 0002, Soomin Kim 0002, Sang Kil Cha. 1317-1333
- Treelet Accelerated Ray Tracing on GPUs. Yuan-Hsi Chou, Tor M. Aamodt. 1334-1347
- Vela: A Virtualized LLM Training System with GPU Direct RoCE. Apoorve Mohan, Robert Walkup, Bengi Karacali, Ming-Hung Chen, Abdullah Kayi, Liran Schour, Shweta Salaria, Sophia Wen, I-Hsin Chung, Abdul Alim, Constantinos Evangelinos, Lixiang Luo, Marc Dombrowa, Laurent Schares, Ali Sydney, Pavlos Maniotis, Sandhya Koteshwara, Brent Tang, Joel Belog, Rei Odaira, Vasily Tarasov, Eran Gampel, Drew Thorstensen, Talia Gershon, Seetharami Seelam. 1348-1364
- Velosiraptor: Code Synthesis for Memory Translation. Reto Achermann, Em Chu, Ryan Mehri, Ilias Karimalis, Margo I. Seltzer. 1365-1381
- Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency. Hansung Kim, Ruohan Richard Yan, Joshua You, Tieliang Vamber Yang, Yakun Sophia Shao. 1382-1399
- Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology. Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar 0003, Nandita Vijaykumar, Onur Mutlu. 1400-1421