- Mission-Critical Enterprise Systems... What's a Mission? And What's Critical?: Processor and technology requirements for enterprise computing. Hillery Hunter. 1
- Unicorns, Centaurs, and Cyborgs: Co-design Powering the Intelligence Era. Parthasarathy Ranganathan. 2
- An AI Stack: From Scaling AI Workloads to Evaluating LLMs. Ion Stoica. 3-4
- A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs. Hongsun Jang, Jaeyong Song 0002, Changmin Shin 0002, Si Ung Noh, Jaewon Jung 0001, Jisung Park 0001, Jinho Lee 0001. 5-24
- A Framework for Developing and Optimizing Fully Homomorphic Encryption Programs on GPUs. Jianyu Zhao 0004, Xueyu Wu 0001, Guang Fan, Mingzhe Zhang 0005, Shoumeng Yan, Lei Ju 0001, Zhuoran Ji. 25-40
- A Programming Model for Disaggregated Memory over CXL. Gal Assa, Moritz Lumme, Lucas Bürgi, Michal Friedman 0001, Ori Lahav 0001. 41-58
- Accelerating Computation in Quantum LDPC Code. Jungmin Cho, Hyeonseong Jeong, Junpyo Kim, Junhyuk Choi, Juwon Hong, Jangwoo Kim. 59-76
- AlphaSyndrome: Tackling the Syndrome Measurement Circuit Scheduling Problem for QEC Codes. Yuhao Liu 0017, Shuohao Ping, Junyu Zhou 0005, Ethan Decker, Justin Kalloor, Mathias Weiden, Kean Chen, Yunong Shi, Ali Javadi-Abhari, Costin Iancu, Gushu Li. 77-93
- An MLIR Lowering Pipeline for Stencils at Wafer-Scale. Nicolai Stawinoga, David Katz, Anton Lydike, Justs Zarins, Nick Brown 0002, George Bisbas, Tobias Grosser. 94-109
- Anvil: A General-Purpose Timing-Safe Hardware Description Language. Jason Zhijingcheng Yu, Aditya Ranjan Jha, Umang Mathur 0001, Trevor E. Carlson, Prateek Saxena. 110-136
- APT: Securing Against DRAM Read Disturbance via Adaptive Probabilistic In-DRAM Trackers. Runjin Wu, Meng Zhang 0014, You Zhou 0009, Changsheng Xie 0001, Fei Wu 0005. 137-156
- Arancini: A Hybrid Binary Translator for Weak Memory Model Architectures. Sebastian Reimers, Dennis Sprokholt, Martin Fink 0004, Theofilos Augoustis, Simon Kammermeier, Rodrigo C. O. Rocha, Tom Spink, Redha Gouicem, Soham Chakraborty 0001, Pramod Bhatotia. 157-174
- Architecting Scalable Trapped Ion Quantum Computers using Surface Codes. Scott Jones, Prakash Murali. 175-190
- Arm Weak Memory Consistency on Apple Silicon: What Is It Good For? Yossi Khayet, Adam Morrison 0001. 191-207
- Asynchrony and GPUs: Bridging this Dichotomy for I/O with AGIO. Jihoon Han 0001, Anand Sivasubramaniam, Chia-Hao Chang, Vikram Sharma Mailthody, Zaid Qureshi, Wen-mei Hwu. 208-222
- Bat: Efficient Generative Recommender Serving with Bipartite Attention. Jie Sun 0017, Shaohang Wang, Zimo Zhang, Zhengyu Liu, Yunlong Xu, Peng Sun 0006, Bo Zhao, Bingsheng He, Fei Wu 0001, Zeke Wang. 223-238
- BitRed: Taming Non-Uniform Bit-Level Sparsity with a Programmable RISC-V ISA for DNN Acceleration. Yanhuan Liu, Wenming Li, Kunming Zhang, Yuqun Liu, Siao Wen, Lexin Wang, Tianyu Liu 0007, Haibin Wu, Zhihua Fan, Xiaochun Ye, Dongrui Fan, Xuejun An. 239-254
- BlendServe: Optimizing Offline Inference with Resource-Aware Batching. Yilong Zhao 0002, Shuo Yang 0011, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yifan Qiao 0002, Yang Zhou 0008, Jiarong Xing, Ion Stoica. 255-273
- Borrowing Dirty Qubits in Quantum Programs. Bonan Su, Li Zhou 0013, Yuan Feng 0001, Mingsheng Ying. 274-289
- Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration. Zejia Lin 0001, Hongxin Xu, Guanyi Chen, Zhiguang Chen 0001, Yutong Lu, Xianwei Zhang 0001. 290-306
- CacheMind: From Miss Rates to Why - Natural-Language, Trace-Grounded Reasoning for Cache Replacement. Kaushal Mhapsekar, Azam Ghanbari, Bita Aslrousta, Samira Mirbagher Ajorpaz. 307-322
- CEMU: Enabling Full-System Emulation of Computational Storage Beyond Hardware Limits. Qiuyang Zhang, Jiapin Wang, You Zhou 0009, Peng Xu, Kai Lu 0002, Jiguang Wan 0001, Fei Wu 0005, Tao Lu 0014. 323-341
- CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations. Bilel Sefsaf, Abderraouf Dandani, Abdessamed Seddiki, Arab Mohammed, Eduardo Chielle, Michail Maniatakos, Riyadh Baghdadi. 342-360
- Chips Need DIP: Time-Proportional Per-Instruction Cycle Stacks at Dispatch. Silvio Heverton Campelo de Santana, Joseph Rogers, Lieven Eeckhout, Magnus Jahre. 361-376
- CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting. Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li 0003, Ang Li 0006, Saining Xie, Jinyang Li 0001, Aurojit Panda. 377-393
- Co-Exploration of RISC-V Processor Microarchitectures and FreeRTOS Extensions for Lower Context-Switch Latency. Markus Scheck, Tammo Mürmann, Andreas Koch 0001. 394-408
- CoGraf: Fully Accelerating Graph Applications with Fine-Grained PIM. Ali Semi Yenimol, Anirban Nag, Chang Hyun Park 0001, David Black-Schaffer. 409-425
- COMPAS: A Distributed Multi-Party SWAP Test for Parallel Quantum Algorithms. Brayden Goldstein-Gelb, Kun Liu, John M. Martyn, Hengyun (Harry) Zhou, Yongshan Ding 0001, Yuan Liu. 426-441
- Compass: Navigating the Design Space of Taint Schemes for RTL Security Verification. Yuheng Yang, Qinhan Tan, Thomas Bourgeat, Sharad Malik, Mengjia Yan 0001. 442-458
- CounterPoint: Using Hardware Event Counters to Refute and Refine Microarchitectural Assumptions. Nick Lindsay, Caroline Trippel, Anurag Khandelwal, Abhishek Bhattacharjee. 459-475
- CPU-Oblivious Offloading of Failure-Atomic Transactions for Disaggregated Memory. Cheng Chen, Chencheng Ye 0001, Yuanchao Xu 0001, Xipeng Shen, Xiaofei Liao, Hai Jin 0001, Wenbin Jiang 0001, Yan Solihin. 476-492
- CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems. Tong Xie, Yijiahao Qi, Jinqi Wen, Zishen Wan, Yanchi Dong, Zihao Wang, Shaofei Cai, Yitao Liang, Tianyu Jia, Yuan Wang 0001, Runsheng Wang, Meng Li 0004. 493-510
- CREST: High-Performance Contention Resolution for Disaggregated Transactions. Qihan Kang, Mi Zhang 0007, Patrick P. C. Lee, Yongkang Hu. 511-527
- Cxlalloc: Safe and Efficient Memory Allocation for a CXL Pod. Newton Ni, Yan Sun, Zhiting Zhu, Emmett Witchel. 528-545
- CXLMC: Model Checking CXL Shared Memory Programs. Simon Guo 0005, Conan Truong, Brian Demsky. 546-562
- DARTH-PUM: A Hybrid Processing-Using-Memory Architecture. Ryan Wong 0001, Ben Feinberg, Saugata Ghose. 563-582
- Detecting Inconsistencies in Arm CCA's Formally Verified Specification. Changho Choi, Xiang Cheng, Bokdeuk Jeong, Taesoo Kim. 583-601
- DFVG: A Heterogeneous Architecture for Speculative Decoding with Draft-on-FPGA and Verify-on-GPU. Shaoqiang Lu, Yangbo Wei, Junhong Qian, Dongge Qin, Shiji Gao, Yizhi Ding, Qifan Wang, Chen Wu, Xiao Shi 0001, Lei He 0001. 602-617
- DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline. Zhenliang Xue, Hanpeng Hu, Xing Chen 0009, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu 0001, Daxin Jiang, Yubin Xia, Haibo Chen 0001. 618-632
- EARTH: An Efficient MoE Accelerator with Entropy-Aware Speculative Prefetch and Result Reuse. Fangxin Liu, Ning Yang, Jingkui Yang, Zongwu Wang, Chenyang Guan, Yu Feng 0007, Li Jiang 0002, Haibing Guan. 633-646
- Efficient Remote Memory Ordering for Non-Coherent Systems. Wei Siew Liew, Md Ashfaqur Rahaman, Adarsh Patil 0002, Ryan Stutsman, Vijay Nagarajan. 647-661
- Efficient Temporal Graph Network Training via Unified Redundancy Elimination. Yiqing Wang, Hailong Yang 0002, Kejie Ma, Enze Yu, Pengbo Wang, Xin You 0001, Qingxiao Sun, Chenhao Xie 0001, Zhongzhi Luan, Yi Liu 0013, Depei Qian 0002. 662-678
- Enabling Fast Networking in the Public Cloud. Alireza Sanaee, Vahab Jabrayilov, Ilias Marinos, Farbod Shahinfar, Divyanshu Saxena, Gianni Antichi, Kostis Kaffes. 679-696
- Evaluating Compiler Optimization Impacts on zkVM Performance. Thomas Gassmann, Stefanos Chaliasos, Thodoris Sotiropoulos, Zhendong Su 0001. 697-714
- Falcon: Algorithm-Hardware Co-Design for Efficient Fully Homomorphic Encryption Accelerator. Liang Kong 0005, Xianglong Deng, Guang Fan, Shengyu Fan, Lei Chen, Yilan Zhu, Geng Yang 0001, Yisong Chang, Shoumeng Yan, Mingzhe Zhang 0005. 715-731
- FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning. Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang 0012, Lingxiao Ma, Wayne Luk, Hongxiang Fan. 732-748
- Finding Reusable Instructions via E-Graph Anti-Unification. Youwei Xiao, Chenyun Yin, Yitian Sun, Yuyang Zou, Yun Liang 0001. 749-763
- Fine-grained and Non-intrusive LLM Training Monitoring via Microsecond-level Traffic Measurement. Yibo Xiao, Hao Zheng, Haifeng Sun 0004, Qingkai Meng 0001, Jiong Duan, Xiaohe Hu, Rong Gu 0001, Guihai Chen, Chen Tian 0001. 764-782
- FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations. Zhihao Shu, Md. Musfiqur Rahman Sanim, Hangyu Zheng, Kunxiong Zhu, Miao Yin, Gagan Agrawal, Wei Niu 0002. 783-797
- FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow. Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu. 798-820
- Graphiti: Formally Verified Out-of-Order Execution in Dataflow Circuits. Yann Herklotz, Ayatallah Elakhras, Martina Camaioni, Paolo Ienne, Lana Josipovic, Thomas Bourgeat. 821-837
- gShare: Efficient GPU Sharing with Aggressive Scheduling in Multi-tenant FaaS platform. Yanan Yang, Zhengxiong Jiang, Meiqi Zhu, Hongqiang Xu, Yujun Wang, Liang Li 0016, Jiansong Zhang, Jie Wu 0001. 838-859
- GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading. Donghyun Lee 0005, Dawoon Jeong, Jae W. Lee, Hongil Yoon. 860-875
- Hardwired-Neuron Language Processing Units as General-Purpose Cognitive Substrates. Yang Liu, Yi Chen, Yongwei Zhao 0001, Yifan Hao 0001, Zifu Zheng, Weihao Kong, Zhangmai Li, Dongchen Jiang, Ruiyang Xia, Zhihong Ma, Zisheng Liu, Zhaoyong Wan, Yunqi Lu, Ximing Liu, Hongrui Guo, Zhihao Yang, Zhe Wang 0017, Tianrui Ma, Mo Zou, Rui Zhang 0040, Ling Li 0001, Xing Hu 0001, Zidong Du, Zhiwei Xu 0002, Qi Guo 0001, Tianshi Chen 0002, Yunji Chen. 876-895
- HEPIC: Private Inference over Homomorphic Encryption with Client Intervention. Kevin Nam, Youyeon Joo, Seungjin Ha, Hyungon Moon, Yunheung Paek. 896-911
- Highly Automated Verification of Security Properties for Unmodified System Software. Ganxiang Yang, Wei Qiang, Yi Rong, Xuheng Li, Fanqi Yu, Jason Nieh, Ronghui Gu. 912-928
- History Doesn't Repeat Itself but Rollouts Rhyme: Accelerating Reinforcement Learning with RhymeRL. Jingkai He, Tianjian Li, Erhu Feng, Dong Du 0003, Qian Liu 0033, Tao Liu, Yubin Xia, Haibo Chen 0001. 929-945
- Hitchhike: Efficient Request Submission via Deferred Enforcement of Address Contiguity. Xuda Zheng, Jian Zhou 0004, Shuhan Bai, Runjin Wu, Xianlin Tang, Zhiyuan Li, Hong Jiang 0001, Fei Wu 0005. 946-961
- I/O Analysis is All You Need: An I/O Analysis for Long-Sequence Attention. Xiaoyang Lu, Boyu Long, Xiaoming Chen 0003, Yinhe Han 0001, Xian-He Sun. 962-977
- ICARUS: Criticality and Reuse based Instruction Caching for Datacenter Applications. Vedant Kalbande, Hrishikesh Jedhe Deshmukh, Alberto Ros 0001, Biswabandan Panda. 978-992
- Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums. Jaeyeon Won, Willow Ahrens, Saman P. Amarasinghe, Joel S. Emer. 993-1006
- iSwitch: QEC on Demand via In-Situ Encoding of Bare Qubits for Ion Trap Architectures. Keyi Yin, Xiang Fang, Zhuo Chen, David Hayes, Eneet Kaur, Reza Nejabati, Hartmut Haeffner, Wes Campbell, Eric R. Hudson, Jens Palsberg, Travis S. Humble, Yufei Ding 0001. 1007-1021
- It Takes Two to Entangle. Zhanghan Wang, Ding Ding, Hang Zhu, Haibin Lin, Aurojit Panda. 1022-1039
- JOSer: Just-In-Time Object Serialization for Heavy Java Serialization Workloads. Chaokun Yang, Pengbo Nie, Ziyi Lin, Weipeng Wang, Qianwei Yu, Chengcheng Wan 0001, He Jiang 0001, Yuting Chen 0001. 1040-1054
- LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training. Xinyi Liu, Yujie Wang, Fangcheng Fu, Xuefeng Xiao 0001, Huixia Li, Jiashi Li, Bin Cui 0001. 1055-1072
- LAIKA: Machine Learning-Assisted In-Kernel APU Acceleration. Haoming Zhuo, Dingding Li, Ronghua Lin, Yong Tang 0001. 1073-1088
- Lifetime-Aware Design for Item-Level Intelligence at the Extreme Edge. Shvetank Prakash, Andrew Cheng, Olof Kindgren, Ashiq Ahamed, Graham Knight, Jedrzej Kufel, Francisco Rodriguez, Arya Tschand, David Kong 0001, Mariam Elgamal, Jerry Huang, Emma Chen, Gage Hills, Richard Price, Emre Ozer 0001, Vijay Janapa Reddi. 1089-1112
- LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models. Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qinran Wu, Sheng Cheng, Ming Cai. 1113-1135
- LPO: Discovering Missed Peephole Optimizations with Large Language Models. ZhenYang Xu, Hongxu Xu, Yongqiang Tian 0001, Xintong Zhou, Chengnian Sun. 1136-1150
- M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization. Weiming Hu 0005, Zihan Zhang, Haoyan Zhang, Chen Zhang 0001, Cong Guo 0003, Yu Feng 0007, Tianchi Hu, Guanglin Li 0005, Guipeng Hu, Junsong Wang, Jingwen Leng. 1151-1167
- Maverick: Rethinking TFHE Bootstrapping on GPUs via Algorithm-Hardware Co-Design. Zhiwei Wang, Haoqi He, Lutan Zhao, Qingyun Niu, Dan Meng 0002, Rui Hou 0001. 1168-1184
- MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading. Peng Tang, Jiacheng Liu 0001, Xiaofeng Hou, Yifei Pu, Jing Wang 0055, Pheng-Ann Heng, Chao Li 0009, Minyi Guo. 1185-1200
- MSCCL++: Rethinking GPU Communication Abstractions for AI Inference. Changho Hwang, Peng Cheng 0005, Roshan Dathathri, Abhinav Jangda, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Aashaka Shah, Ziyue Yang, Binyang Li, Caio Rocha, Qinghua Zhou, Mahdieh Ghazimirsaeed, Sreevatsa Anantharamu, Jithin Jose. 1201-1215
- Mugi: Value Level Parallelism For Efficient LLMs. Daniel Price, Prabhu Vellaisamy, John Paul Shen, Di Wu 0016. 1216-1234
- Nebula: Infinite-Scale 3D Gaussian Splatting in VR via Collaborative Rendering and Accelerated Stereo Rasterization. He Zhu, Zheng Liu, Xingyang Li, Anbang Wu, Jieru Zhao, Fangxin Liu, Yiming Gan, Jingwen Leng, Yu Feng 0007. 1235-1250
- Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices. Xufeng Yang, Tingting Tan, Jingxin Hu, Congming Gao, Mingyang Liu, Tianyang Jiang, Jian Chen, Linbo Long, Yina Lv, Jiwu Shu. 1251-1267
- Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration. Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, Jongse Park. 1268-1284
- Neura: A Unified Framework for Hierarchical and Adaptive CGRAs. Cheng Tan 0002, Miaomiao Jiang, Yuqi Sun, Ruihong Yin, Yanghui Ou, Qing Zhong, Lei Ju 0001, Jeff Zhang 0001. 1285-1300
- oFFN: Outlier and Neuron-aware Structured FFN for Fast yet Accurate LLM Inference. Geunsoo Song, Hoeseok Yang, Youngmin Yi. 1301-1315
- Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators. Maolin Sun, Yibiao Yang, Yuming Zhou. 1316-1332
- Optimizer-Friendly Instrumentation for Event Quantification with PRUE Algorithm. Hao Ling, Yiyuan Guo, Charles Zhang 0001. 1333-1348
- Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference. Yiqi Liu, Yudong Pan, Mengdi Wang 0004, Shixin Zhao, Haonan Zhu, Yinhe Han 0001, Lei Zhang 0008, Ying Wang 0001. 1349-1365
- PACT: A Criticality-First Design for Tiered Memory. Hamid Hadian, Jinshu Liu, Hanchen Xu, Hansen Idden, Huaicheng Li. 1366-1381
- Parameterized Hardware Design with Latency-Abstract Interfaces. Rachit Nigam, Ethan Gabizon, Edmund Lam, Carolyn Zech, Jonathan Balkind, Adrian Sampson. 1382-1395
- PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel. Jinjun Yi, Zhixin Zhao, Yitao Hu, Ke Yan, Weiwei Sun, Hao Wang, Laiping Zhao, Yuhao Zhang, Wenxin Li 0001, Keqiu Li. 1396-1412
- Performance Predictability in Heterogeneous Memory. Jinshu Liu, Hanchen Xu, Daniel S. Berger, Marcos K. Aguilera, Huaicheng Li. 1413-1429
- PF-LLM: Large Language Model Hinted Hardware Prefetching. Ceyu Xu, Xiangfeng Sun, Weihang Li, Chen Bai, Bangyan Wang, Mengming Li, Zhiyao Xie, Yuan Xie 0001. 1430-1444
- PIPM: Partial and Incremental Page Migration for Multi-host CXL Disaggregated Shared Memory. Gangqi Huang, Heiner Litz, Yuanchao Xu 0001. 1445-1460
- PrioriFI: More Informed Fault Injection for Edge Neural Networks. Olivia Weng, Andres Meza 0001, Nhan Tran, Ryan Kastner. 1461-1475
- PropHunt: Automated Optimization of Quantum Syndrome Measurement Circuits. Joshua Viszlai, Satvik Maurya, Swamit Tannu, Margaret Martonosi, Frederic T. Chong. 1476-1491
- QoServe: Breaking the Silos of LLM Inference Serving. Kanishk Goel, Jayashree Mohan, Nipun Kwatra, Ravi Shreyas Anupindi, Ramachandran Ramjee. 1492-1507
- Rage Against the State Machine: Type-Stated Hardware Peripherals for Increased Driver Correctness. Tyler Potyondy, Anthony Tarbinian, Leon Schuermann, Eric Mugnier, Adin Ackerman, Amit Levy 0001, Pat Pannuto. 1508-1522
- Reconfigurable Quantum Instruction Set Computers for High Performance Attainable on Hardware. Zhaohui Yang, Dawei Ding 0002, Qi Ye 0005, Cupjin Huang, Jianxin Chen, Yuan Xie. 1523-1546
- Reconfigurable Torus Fabrics for Multi-tenant ML. Abhishek Vijaya Kumar, Eric Ding 0002, Arjun Devraj, Darius Bunandar, Rachee Singh. 1547-1565
- RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators. Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu. 1566-1588
- Reducing T Gates with Unitary Synthesis. Tianyi Hao 0003, Amanda Xu, Swamit Tannu. 1589-1604
- ReliaFHE: Resilient Design for Fully Homomorphic Encryption Accelerators. Fan Li, Mayank Kumar, Ruizhi Zhu, Mengxin Zheng, Qian Lou, Xin Xin 0008. 1605-1621
- REPA: Reconfigurable PIM for the Joint Acceleration of KV Cache Offloading and Processing. Yang Hong, Junlong Yang, Bo Peng 0043, Jianguo Yao 0002. 1622-1639
- RowArmor: Efficient and Comprehensive Protection Against DRAM Disturbance Attacks. Minbok Wi, Yoonyul Yoo, YooJin Kim, Jaeho Shin, Jumin Kim, Yesin Ryu, Saeid Gorgin 0001, Jung Ho Ahn, Jungrae Kim. 1640-1659
- RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation. Yan Zhu, Boru Chen, Christopher W. Fletcher, Nandeeka Nayak. 1660-1676
- Scaling Automated Database System Testing. Suyang Zhong, Manuel Rigger. 1677-1692
- Segment Only Where You Look: Leveraging Human Gaze Behavior for Efficient Computer Vision Applications in Augmented Reality. Tianhua Xia, Haiyu Wang, Sai Qian Zhang. 1693-1710
- SEVI: Silent Data Corruption of Vector Instructions in Hyper-Scale Datacenters. Yixuan Mei, Shreya Varshini, Harish Dattatraya Dixit, Sriram Sankar, K. V. Rashmi. 1711-1726
- SG-IOV: Socket-Granular I/O Virtualization for SmartNIC-Based Container Networks. Chenxingyu Zhao, Hongtao Zhang, Jaehong Min, Shengkai Lin, Wei Zhang 0052, Kaiyuan Zhang 0001, Ming Liu 0027, Arvind Krishnamurthy. 1727-1748
- Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads. Mert Hidayetoglu, Aurick Qiao, Michael Wyatt, Jeff Rasley, Yuxiong He, Samyam Rajbhandari. 1749-1763
- Signal Breaker: Fuzzing Digital Signal Processors. Cameron Santiago Garcia, Matthew Hicks. 1764-1779
- Skyler: Static Analysis for Predicting API-Driven Costs in Serverless Applications. Bernardo Ribeiro, Mafalda Ferreira, José Fragoso Santos, Rodrigo Bruno, Nuno Santos 0001. 1780-1799
- SLAWS: Spatial Locality Analysis and Workload Orchestration for Sparse Matrix Multiplication. Guoyu Li, Zheng Guan, Beichen Zhang, Jun Yu, Kun Wang. 1800-1814
- SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training. Yunjie Pan, Yongyi Yang, Hanmei Yang, Scott Mahlke. 1815-1831
- SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs. Jiaming Xu, Jiayi Pan, Hanzhen Wang, Yongkang Zhou, Jiancai Ye, Yu Wang 0002, Guohao Dai 0001. 1832-1847
- SpecProto: A Parallelizing Compiler for Speculative Decoding of Large Protocol Buffers Data. Zhijie Wang, Chales Hong, Dhruv Parmar, Shengbo Ma, Zhijia Zhao 0001, Qidong Zhao, Xu Liu 0001. 1848-1862
- STARC: Selective Token Access with Remapping and Clustering for Efficient LLM Decoding on PIM Systems. Zehao Fan, Yunzhen Liu, Garrett Gagnon, Zhenyu Liu, Yayue Hou, Hadjer Benmeziane, Kaoutar El Maghraoui, Liu Liu 0017. 1863-1879
- Static Analysis for Efficient Streaming Tokenization. Angela W. Li, Yudi Yang, Konstantinos Mamouras. 1880-1896
- STRAW: Stress-Aware WL-Based Read Disturbance Management for High-Density NAND Flash Memory. Myoungjun Chun, Jaeyong Lee, Inhyuk Choi, Jisung Park 0001, Myungsuk Kim, Jihong Kim 0001. 1897-1911
- Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism. Gina Sohn, Genghan Zhang, Konstantin Hoßfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun. 1912-1932
- Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter. Qinghao Hu 0004, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin 0001, Yuxian Gu, Han Cai, Chuang Gan 0001, Ana Klimovic, Song Han 0001. 1933-1948
- T-Control: An Efficient Dynamic Tensor Rematerialization System for DNN Training. Zehua Wang, Junmin Xiao, Xiaochuan Deng, Huibing Wang, Hui Ma, Mingyi Li, Yunfei Pang, Guangming Tan. 1949-1965
- TEEM³: Core-Independent and Cooperating Trusted Execution Environments. Nils Asmussen, Sebastian Haas, Carsten Weinhold, Nicholas Gordon, Stephan Gerhold, Friedrich Pauls, Nilanjana Das, Michael Roitzsch. 1966-1981
- TetriServe: Efficiently Serving Mixed DiT Workloads. Runyu Lu, Shiqi He, Wenxuan Tan, Shenggui Li, Ruofan Wu, Jeff J. Ma, Ang Chen 0001, Mosharaf Chowdhury. 1982-1997
- TierX: A Simulation Framework for Multi-tier BCI System Design Evaluation and Exploration. Seunghyun Song, Yeongwoo Jang, Daye Jung, Kyungsoo Park, Donghan Kim, Gwangjin Kim, Hunjun Lee, Jerald Yoo, Jangwoo Kim. 1998-2014
- Toasty: Speeding Up Network I/O with Cache-Warm Buffers. Preeti, Nitish Bhat, Ashwin Kumar, Mythili Vutukuru. 2015-2029
- Towards High-Goodput LLM Serving with Prefill-decode Multiplexing. Yukang Chen, Weihao Cui, Han Zhao 0005, Ziyi Xu, Xiaoze Fan, Xusheng Chen, Yangjie Zhou 0001, Shixuan Sun, Bingsheng He, Quan Chen 0002. 2030-2047
- TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference. Xiaojuan Tang, Fanxu Meng 0003, Pingzhi Tang, Yuxuan Wang 0012, Di Yin, Xing Sun 0001, Muhan Zhang. 2048-2062
- TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms. Yuewen Hou, Dhanvi Bharadwaj, Gokul Subramanian Ravi. 2063-2078
- Trinity: Three-Dimensional Tensor Program Optimization via Tile-level Equality Saturation. Jaehyeong Park, Youngchan Kim, Haechan An, Gieun Jeong, Jeehoon Kang, Dongsu Han. 2079-2107
- Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context. Hao Wu 0077, Qidong Zhao, Songqing Chen, Yang Chen, Yueming Hao, Tony CW Liu, Sijia Chen, Adnan Aziz, Keren Zhou 0001. 2108-2124
- Trust-V: Toward Secure and Reliable Storage for Trusted Execution Environments. Seung-Kyun Han, Jiyeon Yang, Jinsoo Jang. 2125-2140
- Understanding and Optimizing Database Pushdown on Disaggregated Storage. Hua Zhang, Xiao Li, Yuebin Bai, Ming Liu. 2141-2158
- Understanding Query Optimization Bugs in Graph Database Systems. Yuyu Chen, Zhongxing Yu. 2159-2176
- vCXLGen: Automated Synthesis and Verification of CXL Bridges for Heterogeneous Architectures. Anatole Lefort, Julian Pritzi, Nicolò Carpentieri, David Schall, Simon Dittrich, Soham Chakraborty 0001, Nicolai Oswald, Pramod Bhatotia. 2177-2196
- SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference. Ziyi Zhang, Ziheng Jiang, Chengquan Jiang, Menghan Yu, Size Zheng 0001, Haibin Lin, Xin Liu 0086, Henry Hoffmann. 2197-2211
- Wave: Leveraging Architecture Observation for Privacy-Preserving Model Oversight. Haoxuan Xu, Chen Gong, Beijie Liu, Haizhong Zheng, Beidi Chen, Mengyuan Li 0004. 2212-2231
- Wax: Optimizing Data Center Applications With Stale Profile. Tawhid Bhuiyan, Sumya Hoque, Angelica Aparecida Moreira, Tanvir Ahmed Khan. 2232-2248
- WorksetEnclave: Towards Optimizing Cold Starts in Confidential Serverless with Workset-Based Enclave Restore. Xiaolong Yan, Qihang Zhou, Zisen Wan, Feifan Qian, Wentao Yao, Weijuan Zhang, Xiaoqi Jia. 2249-2263
- ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression. Ruibo Fan, Xiangrui Yu, Xinglin Pan, ZeYu Li, Weile Luo, Qiang Wang 0022, Wei Wang 0030, Xiaowen Chu 0001. 2264-2280