- Accelerating Number Theoretic Transform with Multi-GPU Systems for Efficient Zero Knowledge Proof. Zhuoran Ji, Jianyu Zhao, Peimin Gao, Xiangkai Yin, Lei Ju 0001. 1-14
- Accelerating Retrieval-Augmented Generation. Derrick Quinn, Mohammad Nouri, Neel Patel, John Salihu, Alireza Salemi, Sukhan Lee 0002, Hamed Zamani, Mohammad Alian. 15-32
- AnA: An Attentive Autonomous Driving System. Wonkyo Choe, Rongxiang Wang, Felix Xiaozhu Lin. 33-46
- AnyKey: A Key-Value SSD for All Workload Types. Chanyoung Park 0004, Jungho Lee, Chun-Yi Liu 0002, Kyungtae Kang, Mahmut Taylan Kandemir, Wonil Choi. 47-63
- ARC: Warp-level Adaptive Atomic Reduction in GPUs to Accelerate Differentiable Rendering. Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Yushi Guan, Christina Giannoula, Nandita Vijaykumar. 64-83
- Automatic Tracing in Task-Based Runtime Systems. Rohan Yadav, Michael Bauer 0001, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad. 84-99
- BatchZK: A Fully Pipelined GPU-Accelerated System for Batch Generation of Zero-Knowledge Proofs. Tao Lu, Yuxun Chen, Zonghui Wang, Xiaohang Wang 0014, Wenzhi Chen, Jiaheng Zhang. 100-115
- ByteFS: System Support for (CXL-based) Memory-Semantic Solid-State Drives. Shaobo Li, Yirui Eric Zhou, Hao Ren, Jian Huang. 116-132
- Cinnamon: A Framework for Scale-Out Encrypted AI. Siddharth Jayashankar, Edward Chen, Tom Tang, Wenting Zheng, Dimitrios Skarlatos 0002. 133-150
- ClosureX: Compiler Support for Correct Persistent Fuzzing. Rishi Ranjan, Ian Paterson, Matthew Hicks. 151-163
- Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms. Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, Ricardo Bianchini. 164-181
- Composing Distributed Computations Through Task and Kernel Fusion. Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer 0001, Alex Aiken, Fredrik Kjolstad. 182-197
- Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning. Shenggan Cheng, Shengjie Lin, Lansong Diao, Hao Wu, Siyu Wang, Chang Si, Ziming Liu, Xuanlei Zhao, Jiangsu Du, Wei Lin 0016, Yang You 0001. 198-213
- Cooperative Graceful Degradation in Containerized Clouds. Kapil Agrawal, Sangeetha Abdu Jyothi. 214-232
- Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies. Divyanshu Saxena, William Zhang 0002, Shankara Pailoor, Isil Dillig, Aditya Akella. 233-248
- CRUSH: A Credit-Based Approach for Functional Unit Sharing in Dynamically Scheduled HLS. Jiahui Xu, Lana Josipovic. 249-263
- DarwinGame: Playing Tournaments for Tuning Applications in Noisy Cloud Environments. Rohan Basu Roy, Vijay Gadepally, Devesh Tiwari. 264-279
- Debugger Toolchain Validation via Cross-Level Debugging. Yibiao Yang, Maolin Sun, Jiangchang Wu, Qingyang Li, Yuming Zhou. 280-294
- Design and Operation of Shared Machine Learning Clusters on Campus. Kaiqiang Xu, Decang Sun, Hao Wang 0116, Zhenghang Ren, Xinchen Wan, Xudong Liao, Zilong Wang 0007, Junxue Zhang 0001, Kai Chen 0005. 295-310
- Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity. Cunchi Lv, Xiao Shi 0003, Zhengyu Lei, Jinyue Huang, Wenting Tan, Xiaohui Zheng, Xiaofang Zhao. 311-325
- D-VSync: Decoupled Rendering and Displaying for Smartphone Graphics. Yuanpei Wu, Dong Du 0003, Chao Xu, Yubin Xia, Ming Fu, Binyu Zang, Haibo Chen 0001. 326-341
- Early Termination for Hyperdimensional Computing Using Inferential Statistics. Pu (Luke) Yi 0001, Yifan Yang, Chae-Young Lee, Sara Achour. 342-360
- Earth+: On-Board Satellite Imagery Compression Leveraging Historical Earth Observations. Kuntai Du, Yihua Cheng, Peder A. Olsen, Shadi A. Noghabi, Junchen Jiang. 361-376
- EDM: An Ultra-Low Latency Ethernet Fabric for Memory Disaggregation. Weigao Su, Vishal Shrivastav. 377-394
- Efficient Lossless Compression of Scientific Floating-Point Data on CPUs and GPUs. Noushin Azami, Alex Fallin, Martin Burtscher. 395-409
- Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning. Zhaoying Li, Pranav Dangi, Chenyang Yin, Thilini Kaushalya Bandara, Rohan Juneja, Cheng Tan 0002, Zhenyu Bai, Tulika Mitra. 410-425
- Exo 2: Growing a Scheduling Language. Yuka Ikarashi, Kevin Qian, Samir Droubi, Alex Reinking, Gilbert Louis Bernstein, Jonathan Ragan-Kelley. 426-444
- Fast On-device LLM Inference with NPUs. Daliang Xu, Hao Zhang 0108, Liming Yang, Ruiqi Liu, Gang Huang 0001, Mengwei Xu, Xuanzhe Liu. 445-462
- Faster Chaitin-like Register Allocation via Grammatical Decompositions of Control-Flow Graphs. Xuran Cai, Amir Kafshdar Goharshady, S. Hitarth, Chun Kit Lam. 463-477
- FleetIO: Managing Multi-Tenant Cloud Storage with Multi-Agent Reinforcement Learning. Jinghan Sun, Benjamin Reidys, Daixuan Li, Jichuan Chang, Marc Snir, Jian Huang 0006. 478-492
- Forecasting GPU Performance for Deep Learning Training and Inference. Seonho Lee, Amar Phanishayee, Divya Mahajan 0001. 493-508
- Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs. Minhui Xie, Shaoxun Zeng, Hao Guo, Shiwei Gao, Youyou Lu. 509-523
- FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models. Xinglin Pan, Wenxiang Lin, Lin Zhang, Shaohuai Shi, Zhenheng Tang, Rui Wang, Bo Li 0001, Xiaowen Chu 0001. 524-539
- Fusion: An Analytics Object Store Optimized for Query Pushdown. Jianan Lu, Ashwini Raina, Asaf Cidon, Michael J. Freedman. 540-556
- GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism. ByungSoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park 0004, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen 0001, Zhihao Jia. 557-571
- HALO: Loop-aware Bootstrapping Management for Fully Homomorphic Encryption. Seonyoung Cheon, Yongwoo Lee 0001, Hoyun Youm, Dongkwan Kim 0002, Sungwoo Yun, Kunmo Jeong, Dongyoon Lee, Hanjun Kim 0001. 572-585
- Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow. Yixuan Mei, Yonghao Zhuang 0001, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak. 586-602
- H-Houdini: Scalable Invariant Learning. Sushant Dinesh, Yongye Zhu, Christopher W. Fletcher. 603-618
- Instruction-Aware Cooperative TLB and Cache Replacement Policies. Dimitrios Chasapis, Georgios Vavouliotis, Daniel A. Jiménez, Marc Casas. 619-636
- Marionette: A RowHammer Attack via Row Coupling. Seungmin Baek, Minbok Wi, Seonyong Park, Hwayong Nam, Michael Jaemin Kim, Nam Sung Kim, Jung Ho Ahn. 637-652
- Medusa: Accelerating Serverless LLM Inference with Materialization. Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu. 653-668
- MetaSapiens: Real-Time Neural Rendering with Efficiency-Aware Pruning and Accelerated Foveated Rendering. Weikai Lin, Yu Feng 0007, Yuhao Zhu 0001. 669-682
- Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis. Haiyu Huang 0002, Cheng Chen, Kunyi Chen, Pengfei Chen 0002, Guangba Yu, Zilong He, Yilun Wang, Huxing Zhang, Qi Zhou. 683-697
- MOAT: Securely Mitigating Rowhammer with Per-Row Activation Counters. Moinuddin Qureshi, Salman Qazi. 698-714
- MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs. Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng 0007, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica. 715-730
- MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization. Shuaiting Li, Chengxuan Wang, Juncan Deng, Zeyu Wang, Zewen Ye, Zongsheng Wang, Haibin Shen, Kejie Huang. 731-745
- Nazar: Monitoring and Adapting ML Models on Mobile Devices. Wei Hao, Zixi Wang, Lauren Hong, Lingxiao Li, Nader Karayanni, AnMei Dasbach-Prisk, Chengzhi Mao, Junfeng Yang, Asaf Cidon. 746-761
- Optimizing Datalog for the GPU. Yihao Sun, Ahmedur Rahman Shovon, Thomas Gilray, Sidharth Kumar, Kristopher K. Micinski. 762-776
- Optimizing Quantum Circuits, Fast and Slow. Amanda Xu, Abtin Molavi, Swamit Tannu, Aws Albarghouthi. 777-793
- PartIR: Composing SPMD Partitioning Strategies for Machine Learning. Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel Wee. 794-810
- PCcheck: Persistent Concurrent Checkpointing for ML. Foteini Strati, Michal Friedman 0001, Ana Klimovic. 811-827
- Performance Prediction of On-NIC Network Functions with Multi-Resource Contention and Traffic Awareness. Shaofeng Wu, Qiang Su, Zhixiong Niu, Hong Xu 0001. 828-842
- PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption. Yifan Tan 0005, Cheng Tan 0005, Zeyu Mi, Haibo Chen 0001. 843-857
- pulse: Accelerating Distributed Pointer-Traversals on Disaggregated Memory. Yupeng Tang, Seung-Seob Lee, Abhishek Bhattacharjee, Anurag Khandelwal. 858-875
- QECC-Synth: A Layout Synthesizer for Quantum Error Correction Codes on Sparse Architectures. Keyi Yin, Hezi Zhang, Xiang Fang, Yunong Shi, Travis S. Humble, Ang Li 0006, Yufei Ding. 876-890
- RANGE-BLOCKS: A Synchronization Facility for Domain-Specific Architectures. Anagha Molakalmur Anil Kumar, Aditya Prasanna, Arrvindh Shriraman. 891-906
- RASSM: Residue-based Acceleration of Single Sparse Matrix Computation via Adaptive Tiling. Anirudh Jain, Pulkit Gupta, Thomas M. Conte. 907-923
- ReSBM: Region-based Scale and Minimal-Level Bootstrapping Management for FHE via Min-Cut. Yan Liu, Jianxin Lai, Long Li, Tianxiang Sui, Linjie Xiao, Peng Yuan, Xiaojing Zhang, Qing Zhu, Wenguang Chen, Jingling Xue. 924-939
- Rethinking Java Performance Analysis. Stephen M. Blackburn, Zixian Cai, Rui Chen, Xi Yang 0021, John Zhang, John N. Zigman. 940-954
- Robustness Verification for Checking Crash Consistency of Non-volatile Memory. Zhilei Han, Fei He 0001. 955-969
- RTL Verification for Secure Speculation Using Contract Shadow Logic. Qinhan Tan, Yuheng Yang, Thomas Bourgeat, Sharad Malik, Mengjia Yan 0001. 970-986
- Segue & ColorGuard: Optimizing SFI Performance and Scalability on Modern Architectures. Shravan Narayan, Tal Garfinkel, Evan Johnson 0001, Zachary Yedidia, Yingchen Wang, Andrew Brown, Anjo Vahldiek-Oberwagner, Michael LeMay, Wenyong Huang, Xin Wang, Mingqiu Sun, Dean M. Tullsen, Deian Stefan. 987-1002
- Selectively Uniform Concurrency Testing. Huan Zhao, Dylan Wolff, Umang Mathur 0001, Abhik Roychoudhury. 1003-1019
- SmoothE: Differentiable E-Graph Extraction. Yaohui Cai, Kaixin Yang, Chenhui Deng, Cunxi Yu, Zhiru Zhang. 1020-1034
- SuperNoVA: Algorithm-Hardware Co-Design for Resource-Aware SLAM. Seah Kim, Roger Hsiao, Borivoje Nikolic, James Demmel, Yakun Sophia Shao. 1035-1051
- Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads. Wei Zhao, Anand Jayarajan, Gennady Pekhimenko. 1052-1068
- Target-Aware Implementation of Real Expressions. Brett Saiki, Jackson Brough, Jonas Regehr, Jesús Ponce, Varun Pradeep, Aditya Akhileshwaran, Zachary Tatlock, Pavel Panchekha. 1069-1083
- Tela: A Temporal Load-Aware Cloud Virtual Disk Placement Scheme. Difan Tan, Jiawei Li, Hua Wang 0008, Xiaoxiao Li, Wenbo Liu, Zijin Qin, Ke Zhou 0001, Ming Xie, Mengling Tao. 1084-1100
- UniZK: Accelerating Zero-Knowledge Proof with Unified Hardware and Flexible Kernel Mapping. Cheng Wang, Mingyu Gao. 1101-1117
- Using Analytical Performance/Power Model and Fine-Grained DVFS to Enhance AI Accelerator Energy Efficiency. Zibo Wang, Yijia Zhang, Fuchun Wei, Bingqiang Wang, Yanlin Liu, Zhiheng Hu, Jingyi Zhang, Xiaoxin Xu, Jian He, Xiaoliang Wang 0001, Wanchun Dou, Guihai Chen, Chen Tian 0001. 1118-1132
- vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention. Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar. 1133-1150
- ZRAID: Leveraging Zone Random Write Area (ZRWA) for Alleviating Partial Parity Tax in ZNS RAID. Minwook Kim, Seongyeop Jeong, Jin-Soo Kim 0001. 1151-1165