- WSC-LLM: Efficient LLM Service and Architecture Co-exploration for Wafer-scale Chips. Zheng Xu, Dehao Kong, Jiaxin Liu, Jinxi Li, Jingxiang Hou, Xu Dai, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin. 1-17 [doi]
- LightML: A Photonic Accelerator for Efficient General Purpose Machine Learning. Liang Liu, Sadra Rahimi Kari, Xin Xin, Nathan Youngblood, Youtao Zhang, Jun Yang 0002. 18-33 [doi]
- FRED: A Wafer-scale Fabric for 3D Parallel DNN Training. Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna. 34-48 [doi]
- PD Constraint-aware Physical/Logical Topology Co-Design for Network on Wafer. Qize Yang, Taiquan Wei, Sihan Guan, Chengran Li, Haoran Shang, Jinyi Deng, Huizheng Wang, Chao Li, Lei Wang, Yan Zhang, Shouyi Yin, Yang Hu. 49-64 [doi]
- Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. Tianwei Pan, Tianao Dai, Jianlei Yang 0001, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao. 65-77 [doi]
- Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs. Ali Hajiabadi, Trevor E. Carlson. 78-91 [doi]
- FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit. Shengyu Fan, Xianglong Deng, Liang Kong, Guiming Shi, Guang Fan, Dan Meng, Rui Hou 0001, Mingzhe Zhang. 92-106 [doi]
- Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core. Dian Jiao, Xianglong Deng, Zhiwei Wang, Shengyu Fan, Yi Chen, Dan Meng, Rui Hou 0001, Mingzhe Zhang. 107-121 [doi]
- Heliostat: Harnessing Ray Tracing Accelerators for Page Table Walks. Yuan Feng, Yuke Li, Jiwon Lee, Won Woo Ro, Hyeran Jeon. 122-136 [doi]
- Forest: Access-aware GPU UVM Management. Mao Lin, Yuan Feng, Guilherme Cox, Hyeran Jeon. 137-152 [doi]
- Avant-Garde: Empowering GPUs with Scaled Numeric Formats. Minseong Gil, Dongho Ha, Simla Burcu Harma, Myung Kuk Yoon, Babak Falsafi, Won Woo Ro, Yunho Oh. 153-165 [doi]
- CoopRT: Accelerating BVH Traversal for Ray Tracing via Cooperative Threads. Yavuz Selim Tozlu, Huiyang Zhou. 166-179 [doi]
- The XOR Cache: A Catalyst for Compression. Zhewen Pan 0001, Joshua San Miguel. 180-193 [doi]
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference. Cong Li, Yihan Yin, Xintong Wu, Jingchen Zhu, Zhutianya Gao, Dimin Niu, Qiang Wu, Xin Si, Yuan Xie 0001, Chen Zhang, Guangyu Sun 0003. 194-210 [doi]
- Precise exceptions in relaxed architectures. Ben Simner, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell 0001, Ohad Kammar, Jean Pichon-Pharabod, Peter Sewell. 211-224 [doi]
- Rethinking Prefetching for Intermittent Computing. Gan Fang, Jianping Zeng 0001, Aditya Gupta, Changhee Jung. 225-240 [doi]
- Hardware-aware Calibration Protocol for Quantum Computers. Yuchen Zhu, Jinglei Cheng, Boxi Li, Kecheng Liu, Yidong Zhou, Hanrui Wang 0002, Yufei Ding, Zhiding Liang. 241-256 [doi]
- Constant-Rate Entanglement Distillation for Fast Quantum Interconnects. Christopher Pattison, Gefen Baranes, Juan Pablo Bonilla Ataides, Mikhail D. Lukin, Hengyun Zhou. 257-270 [doi]
- S-SYNC: Shuttle and Swap Co-Optimization in Quantum Charge-Coupled Devices. Chenghong Zhu, Xian Wu, Jingbo Wang, Xin Wang 0022. 271-284 [doi]
- ARTERY: Fast Quantum Feedback using Branch Prediction. Wuwei Tian, Liqiang Lu, Siwei Tan, Yun Liang 0001, Tingting Li 0004, Kaiwen Zhou, Xinghui Jia, Jianwei Yin. 285-298 [doi]
- Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing. Chenning Tao, Liqiang Lu, Size Zheng 0001, Li-Wen Chang, Minghua Shen, Hanyu Zhang, Fangxin Liu, Kaiwen Zhou, Jianwei Yin. 299-312 [doi]
- HiPER: Hierarchically-Composed Processing for Efficient Robot Learning-Based Control. Justin Ting, Minsik Kim 0006, Junkang Zhu, Haotian Sheng, Zhengya Zhang. 313-326 [doi]
- Dadu-Corki: Algorithm-Architecture Co-Design for Embodied AI-powered Robotic Manipulation. Yiyang Huang, Yuhui Hao, Bo Yu 0014, Feng Yan, Yuxin Yang 0002, Feng Min, Yinhe Han 0001, Lin Ma 0002, Shaoshan Liu, Qiang Liu 0011, Yiming Gan. 327-343 [doi]
- Process Only Where You Look: Hardware and Algorithm Co-optimization for Efficient Gaze-Tracked Foveated Rendering in Virtual Reality. Haiyu Wang, Wenxuan Liu, Kenneth Chen, Qi Sun, Sai Qian Zhang. 344-358 [doi]
- RTSpMSpM: Harnessing Ray Tracing for Efficient Sparse Matrix Computations. Hongrui Zhang, Yunan Zhang, Hung-Wei Tseng 0001. 359-373 [doi]
- AQB8: Energy-Efficient Ray Tracing Accelerator through Multi-Level Quantization. Yen-Chieh Huang, Chen-Pin Yang, Tsung Tai Yeh. 374-387 [doi]
- ANVIL: An In-Storage Accelerator for Name-Value Data Stores. Ryan Wong 0001, Nikita Kim, Aniket Das, Kevin Higgs, Engin Ipek, Sapan Agarwal, Saugata Ghose, Ben Feinberg. 388-404 [doi]
- ArtMem: Adaptive Migration in Reinforcement Learning-Enabled Tiered Memory. Xinyue Yi, Hongchao Du, Yu Wang, Jie Zhang, Qiao Li, Chun Jason Xue. 405-418 [doi]
- UPP: Universal Predicate Pushdown to Smart Storage. Ipoom Jeong, Jinghan Huang 0001, Chuxuan Hu, Dohyun Park, Jaeyoung Kang 0004, Nam Sung Kim, Yongjoo Park. 419-433 [doi]
- XHarvest: Rethinking High-Performance and Cost-Efficient SSD Architecture with CXL-Driven Harvesting. Li Peng, Wenbo Wu, Shushu Yi, Xianzhang Chen, Chenxi Wang, Shengwen Liang, Zhe Wang, Nong Xiao, Qiao Li, Mingzhe Zhang, Jie Zhang. 434-449 [doi]
- In-Storage Acceleration of Retrieval Augmented Generation as a Service. Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu 0002, Hadi Esmaeilzadeh. 450-466 [doi]
- SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting. Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai. 467-481 [doi]
- Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization. Minsu Kim 0004, Seongmin Hong, Ryeowook Ko, Soongyu Choi, Hunjong Lee, Junsoo Kim 0002, Joo-Young Kim 0001, Jongse Park. 482-497 [doi]
- Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models. Le Qin, Junwei Cui, Weilin Cai, Jiayi Huang. 498-513 [doi]
- LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference. Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng 0002, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang 0024, Mao Yang 0004. 514-528 [doi]
- AiF: Accelerating On-Device LLM Inference Using In-Flash Processing. Jaeyong Lee, Hyeunjoo Kim, Sanghun Oh, Myoungjun Chun, Myungsuk Kim, Jihong Kim 0001. 529-543 [doi]
- LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading. Hyungyo Kim, Nachuan Wang, Qirong Xia, Jinghan Huang 0001, Amir Yazdanbakhsh, Nam Sung Kim. 544-558 [doi]
- Enabling Ahead Prediction with Practical Energy Constraints. Lingzhe Chester Cai, Aniket Deshmukh, Yale N. Patt. 559-571 [doi]
- Profile-Guided Temporal Prefetching. Mengming Li, Qijun Zhang, Yichuan Gao, Wenji Fang, Yao Lu, Yongqing Ren, Zhiyao Xie. 572-585 [doi]
- WarmCache: Exploiting STT-RAM Cache for Low-Power Intermittent Systems. Noureldin Hassan, Byounguk Min, Changhee Jung, Yan Solihin, Jongouk Choi. 586-600 [doi]
- Magellan: A High-Performance Loop-Guided Prefetcher for Indirect Memory Access. Gelin Fu, Tian Xia 0008, Mingzhuo Yin, Prashant J. Nair, Mieszko Lis, Pengju Ren. 601-615 [doi]
- Leveraging control-flow similarity to reduce branch predictor cold effects in microservices. Haris Volos 0001, Stylianos Vassiliou, Georgia Antoniou, Davide Basilio Bartolini, Yiannakis Sazeides. 616-630 [doi]
- Cramming a Data Center into One Cabinet, a Co-Exploration of Computing and Hardware Architecture of Waferscale Chip. Xingmao Yu, Dingcheng Jiang, Jinyi Deng, Jingyao Liu, Chao Li, Shouyi Yin, Yang Hu. 631-645 [doi]
- Fair-CO2: Fair Attribution for Cloud Carbon Emissions. Leo Han, Jash Kakadia, Benjamin C. Lee, Udit Gupta. 646-663 [doi]
- Dynamic Load Balancer in Intel Xeon Scalable Processor: Performance Analyses, Enhancements, and Guidelines. Jiaqi Lou, Srikar Vanavasam, Yifan Yuan, Ren Wang 0001, Nam Sung Kim. 664-678 [doi]
- A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices. Haneul Park, Jiaqi Lou, Sangjin Lee 0003, Yifan Yuan, KyoungSoo Park, Yongseok Son, Ipoom Jeong, Nam Sung Kim. 679-693 [doi]
- Single-Address-Space FaaS with Jord. Yuanlong Li, Atri Bhattacharyya, Madhur Kumar, Abhishek Bhattacharjee, Yoav Etsion, Babak Falsafi, Sanidhya Kashyap, Mathias Payer. 694-707 [doi]
- HardHarvest: Hardware-Supported Core Harvesting for Microservices. Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz 0001, Josep Torrellas. 708-722 [doi]
- MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Counting. Suhas Vittal, Salman Qazi, Poulami Das 0005, Moinuddin Qureshi. 723-738 [doi]
- When Mitigations Backfire: Timing Channel Attacks and Defense for PRAC-Based RowHammer Mitigations. Jeonghyun Woo, Joyce Qu, Gururaj Saileshwar, Prashant Jayaprakash Nair. 739-756 [doi]
- PuDHammer: Experimental Analysis of Read Disturbance Effects of Processing-using-DRAM in Real DRAM Chips. Ismail Emir Yuksel, Akash Sood, Ataberk Olgun, Oguzhan Canpolat, Haocong Luo, Nisa Bostanci, Mohammad Sadrosadati, A. Giray Yaglikçi, Onur Mutlu. 757-775 [doi]
- DREAM: Enabling Low-Overhead Rowhammer Mitigation via Directed Refresh Management. Hritvik Taneja, Moin K. Qureshi. 776-792 [doi]
- Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression. Feng Cheng, Cong Guo 0003, Chiyue Wei, Junyao Zhang, Changchun Zhou 0001, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu 0001, Hai Li 0001, Yiran Chen 0001. 793-807 [doi]
- Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window. Seungjae Moon, Junseo Cha, Hyunjun Park, Joo-Young Kim 0001. 808-820 [doi]
- MeshSlice: Efficient 2D Tensor Parallelism for Distributed DNN Training. Hyoungwook Nam, Gerasimos Gerogiannis, Josep Torrellas. 821-834 [doi]
- Zettafly: A Network Topology with Flexible Non-blocking Regions for Large-scale AI and HPC Systems. Dezun Dong, Ziyu Wang, Fei Lei. 835-848 [doi]
- AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM. Yuanpeng Zhang, Xing Hu, Xi Chen, Zhihang Yuan, Cong Li, Jingchen Zhu, Zhao Wang, Chenguang Zhang, Xin Si, Wei Gao, Qiang Wu, Runsheng Wang, Guangyu Sun. 849-866 [doi]
- OptiPIM: Optimizing Processing-in-Memory Acceleration Using Integer Linear Programming. Jiantao Liu, Minxuan Zhou, Yue Pan, Chien-Yi Yang, Lana Josipovic, Tajana Rosing. 867-883 [doi]
- HeterRAG: Heterogeneous Processing-in-Memory Acceleration for Retrieval-augmented Generation. Chaoqiang Liu, Haifeng Liu, Dan Chen, Yu Huang 0013, Yi Zhang, Wenjing Xiao, Xiaofei Liao, Hai Jin 0001. 884-898 [doi]
- ATiM: Autotuning Tensor Programs for Processing-in-DRAM. Yongwon Shin, Dookyung Kang, Hyojin Sung. 899-915 [doi]
- Single Spike Artificial Neural Networks. Rhys Gretsch, Michael Beyeler, Jeremy Lau, Timothy Sherwood. 916-929 [doi]
- Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. Chiyue Wei, Bowen Duan, Cong Guo 0003, Jingyang Zhang, Qingyue Song, Hai Li 0001, Yiran Chen 0001. 930-943 [doi]
- Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-constrained Pruning. Boxun Xu, Yuxuan Yin, Vikram Iyer, Peng Li 0001. 944-957 [doi]
- Hermes: Algorithm-System Co-design for Efficient Retrieval-Augmented Generation At-Scale. Michael Shen, Muhammad Umar 0002, Kiwan Maeng, G. Edward Suh, Udit Gupta. 958-973 [doi]
- RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving. Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gustavo Alonso, Amir Yazdanbakhsh, Vidushi Dadu. 974-989 [doi]
- Transitive Array: An Efficient GEMM Accelerator with Result Reuse. Cong Guo 0003, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li 0001, Yiran Chen 0001. 990-1004 [doi]
- Light-weight Cache Replacement for Instruction Heavy Workloads. Saba Mostofi, Setu Gupta, Ahmad Hassani, Krishnam Tibrewala, Elvira Teran, Paul V. Gratz, Daniel A. Jiménez. 1005-1019 [doi]
- The Sparsity-Aware LazyGPU Architecture. Changxi Liu, Miao Yu, Yifan Sun 0002, Trevor E. Carlson. 1020-1034 [doi]
- Evaluating Ruche Networks: Physically Scalable, Cost-Effective, Bandwidth-Flexible NoCs. Dai Cheol Jung, Michael B. Taylor. 1035-1048 [doi]
- Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads. Jaewon Kwon, Yongju Lee 0003, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro. 1049-1063 [doi]
- NetCrafter: Tailoring Network Traffic for Non-Uniform Bandwidth Multi-GPU Systems. Amel Fatima, Yang Yang, Yifan Sun, Rachata Ausavarungnirun, Adwait Jog. 1064-1078 [doi]
- Caravan: A Hardware/Software Co-Design for Efficient SIMD Neighbor Search on Point Clouds. Pedro Henrique Exenberger Becker, Franyell Silfa, José María Arnau, Antonio González 0001. 1079-1092 [doi]
- ANSMET: Approximate Nearest Neighbor Search with Near-Memory Processing and Hybrid Early Termination. Yiwei Li, Yuxin Jin, Boyu Tian, Huanchen Zhang, Mingyu Gao 0001. 1093-1107 [doi]
- DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign. Derrick Quinn, E. Ezgi Yücel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh M. Patel, José F. Martínez, Mohammad Alian. 1108-1124 [doi]
- EOD: Enabling Low Latency GNN Inference via Near-Memory Concatenate Aggregation. Taehwan Kim, Yunki Han, Seohye Ha, Jiwan Kim, Lee-Sup Kim. 1125-1139 [doi]
- RAP: Reconfigurable Automata Processor. Ziyuan Wen, Alexis Le Glaunec, Konstantinos Mamouras, Kaiyuan Yang 0001. 1140-1154 [doi]
- Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution. Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang. 1155-1170 [doi]
- REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing. Kangqi Chen, Rakesh Nadig, Manos Frouzakis, Nika Mansouri-Ghiasi, Yu Liang 0004, Haiyu Mao, Jisung Park 0001, Mohammad Sadrosadati, Onur Mutlu. 1171-1192 [doi]
- MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. Akshat Ramachandran, Souvik Kundu 0009, Tushar Krishna. 1193-1209 [doi]
- Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units. Dahu Feng, Erhu Feng, Dong Du 0003, Pinjie Xu, Yubin Xia, Haibo Chen 0001, Rong Zhao. 1210-1224 [doi]
- Chip Architectures Under Advanced Computing Sanctions. August Ning, David Wentzlaff. 1225-1239 [doi]
- DiTile-DGNN: An Efficient Accelerator for Distributed Dynamic Graph Neural Network Inference. Jiaqi Yang, Hao Zheng 0005, Ahmed Louri. 1240-1253 [doi]
- Cambricon-SR: An Accelerator for Neural Scene Representation with Sparse Encoding Table. Tianbo Liu 0006, Xinkai Song, Zhifei Yue, Rui Wen, Xing Hu, Zhuoran Song, Yuanbo Wen, Yifan Hao 0001, Wei Li, Zidong Du, Rui Zhang, Jiaming Guo, Di Huang, Shaohui Peng, Guangzhong Sun, Qi Guo, Tianshi Chen 0002. 1254-1268 [doi]
- FATE: Boosting the Performance of Hyper-Dimensional Computing Intelligence with Flexible Numerical DAta TypE. Haomin Li 0002, Fangxin Liu, Yichi Chen, Zongwu Wang, Shiyuan Huang, Ning Yang, Dongxu Lyu, Li Jiang 0002. 1269-1282 [doi]
- WindServe: Efficient Phase-Disaggregated LLM Serving with Stream-based Dynamic Scheduling. Jingqi Feng, Yukai Huang, Rui Zhang, Sicheng Liang, Ming Yan, Jie Wu 0003. 1283-1295 [doi]
- Neoscope: How Resilient Is My SoC to Workload Churn? Joseph Rogers, Lieven Eeckhout, Taha Soliman, Magnus Jahre. 1296-1310 [doi]
- CORD: Low-Latency, Bandwidth-Efficient and Scalable Release Consistency via Directory Ordering. Yanpeng Yu, Nicolai Oswald, Anurag Khandelwal. 1311-1326 [doi]
- Nyx: Virtualizing dataflow execution on shared FPGA platforms. Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios N. Pnevmatikatos. 1327-1341 [doi]
- HPVM-HDC: A Heterogeneous Programming System for Accelerating Hyperdimensional Computing. Russel Arbore, Xavier Routh, Abdul Rafae Noor, Akash Kothari, Haichao Yang, Weihong Xu, Sumukh Pinge, Minxuan Zhou, Tajana Rosing, Vikram S. Adve. 1342-1355 [doi]
- UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency. Xia Zhao, Guangda Zhang, Lu Wang, Huadong Dai. 1356-1369 [doi]
- Synchronization for Fault-Tolerant Quantum Computers. Satvik Maurya, Swamit Tannu. 1370-1385 [doi]
- SWIPER: Minimizing Fault-Tolerant Quantum Program Latency via Speculative Window Decoding. Joshua Viszlai, Jason D. Chadwick, Sarang Joshi, Gokul Subramanian Ravi, Yanjing Li, Frederic T. Chong. 1386-1401 [doi]
- CaliQEC: In-situ Qubit Calibration for Surface Code Quantum Error Correction. Xiang Fang, Keyi Yin, Yuchen Zhu, Jixuan Ruan, Dean Tullsen, Zhiding Liang, Andrew Sornborger, Ang Li 0006, Travis S. Humble, Yufei Ding 0001, Yunong Shi. 1402-1416 [doi]
- Variational Quantum Algorithms in the era of Early Fault Tolerance. Siddharth Dangwal, Suhas Vittal, Lennart Maximilian Seifert, Frederic T. Chong, Gokul Subramanian Ravi. 1417-1431 [doi]
- Resource Analysis of Low-Overhead Transversal Architectures for Reconfigurable Atom Arrays. Hengyun Zhou, Casey Duckering, Chen Zhao, Dolev Bluvstein, Madelyn Cain, Aleksander Kubica, Sheng-Tao Wang, Mikhail D. Lukin. 1432-1448 [doi]
- SwitchQNet: Optimizing Distributed Quantum Computing for Quantum Data Centers with Switch Networks. Hezi Zhang, Yiran Xu, Haotian Hu, Keyi Yin, Hassan Shapourian, Jiapeng Zhao, Ramana Rao Kompella, Reza Nejabati, Yufei Ding 0001. 1449-1463 [doi]
- Assassyn: A Unified Abstraction for Architectural Simulation and Implementation. Jian Weng, Boyang Han, Derui Gao, Ruijie Gao, Wanning Zhang, an Zhong, Ceyu Xu, Jihao Xin, Yangzhixin Luo, Lisa Wu Wills, Marco Canini. 1464-1479 [doi]
- Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion. Arash Nasr-Esfahany, Mohammad Alizadeh, Victor Lee, Hanna Alam, Brett W. Coon, David E. Culler, Vidushi Dadu, Martin Dixon, Henry M. Levy, Santosh Pandey, Parthasarathy Ranganathan, Amir Yazdanbakhsh. 1480-1494 [doi]
- AMALI: An Analytical Model for Accurately Modeling LLM Inference on Modern GPUs. Shiheng Cao, Junmin Wu, Junshi Chen, Hong An, Zhibin Yu 0001. 1495-1508 [doi]
- GCStack+GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis. Hanna Cha, Sungchul Lee, Jounghoo Lee, Yeonan Ha, Joonsung Kim, Youngsok Kim. 1509-1523 [doi]
- TrioSim: A Lightweight Simulator for Large-Scale DNN Workloads on Multi-GPU Systems. Ying Li, Yuhui Bao, Gongyu Wang, Xinxin Mei, Pranav Vaid, Anandaroop Ghosh, Adwait Jog, Darius Bunandar, Ajay Joshi, Yifan Sun 0002. 1524-1538 [doi]
- Accelerating Simulation of Quantum Circuits under Noise via Computational Reuse. Meng Wang 0033, Swamit Tannu, Prashant J. Nair. 1539-1553 [doi]
- QPlacer: Frequency-Aware Component Placement for Superconducting Quantum Computers. Junyao Zhang, Hanrui Wang 0002, Qi Ding, Jiaqi Gu 0002, Reouven Assouly, William D. Oliver, Song Han 0003, Kenneth R. Brown, Hai Li 0001, Yiran Chen 0001. 1554-1567 [doi]
- QR-Map: A Map-Based Approach to Quantum Circuit Abstraction for Qubit Reuse Optimization. HyungSeok Kim, Enhyeok Jang, Seungwoo Choi, Youngmin Kim, Won Woo Ro. 1568-1582 [doi]
- Genesis: A Compiler for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers. Zihan Chen 0005, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li, Joel Bierman, Yipeng Huang 0001, Huiyang Zhou, Yuan Liu, Eddy Z. Zhang. 1583-1597 [doi]
- Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers. Yingheng Li, Yue Dai 0005, Aditya Pawar, Rongchao Dong, Jun Yang, Youtao Zhang, Xulong Tang. 1598-1612 [doi]
- HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches. Xintong Li, Zhiyao Li, Mingyu Gao 0001. 1613-1626 [doi]
- NUPEA: Optimizing Critical Loads on Spatial Dataflow Architectures via Non-Uniform Processing-Element Access. Souradip Ghosh, Graham Gobieski, Keyi Zhang, Brandon Lucia, Nathan Beckmann, Tony Nowatzki. 1627-1640 [doi]
- DX100: Programmable Data Access Accelerator for Indirection. Alireza Khadem, Kamalavasan Kamalakkannan, Zhenyan Zhu, Akash Poptani, Yufeng Gu, Jered Benjamin Dominguez-Trujillo, Nishil Talati, Daichi Fujiki, Scott A. Mahlke, Galen M. Shipman, Reetuparna Das. 1641-1658 [doi]
- SEAL: A Single-Event Architecture for In-Sensor Visual Localization. Ryan Hou, Thomas Twomey, Vasileios Milionis, Evangelos Dikopoulos, Tianrui Ma, Yuhao Zhu 0001, Georgios Tzimpragos. 1659-1674 [doi]
- IDEA-GP: Instruction-Driven Architecture with Efficient Online Workload Allocation for Geometric Perception. Suquan Zhang, Yu Hu, Yunfei Xiang, Dawei Zhao, Yuanfan Xu, Qingmin Liao, Jincheng Yu, Yu Wang 0002. 1675-1688 [doi]
- Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences. Joel Coburn, Chunqiang Tang, Sameer Abu Asal, Neeraj Agrawal, Raviteja Chinta, Harish Dattatraya Dixit, Brian Dodds, Saritha Dwarakapuram, Amin Firoozshahian, Cao Gao, Kaustubh Gondkar, Tyler Graf, Junhan Hu, Jian Huang, Sterling Hughes, Adam Hutchin, Bhasker Jakka, Guoqiang Jerry Chen, Indu Kalyanaraman, Ashwin Kamath, Pankaj Kansal, Erum Kazi, Roman Levenstein, Mahesh Maddury, Alex Mastro, Siji Medaiyese, Pritesh Modi, Jack Montgomery, Nadathur Satish, Amit Nagpal, Ashwin Narasimha, Maxim Naumov, Eleanor Ozer, JongSoo Park, Poorvaja Ramani, Harikrishna Reddy, David Reiss, Deboleena Roy, Sathish Sekar, Arushi Sharma, Pavan Shetty, Aravind Sukumaran-Rajam, Eran Tal, Mike Tsai, Shreya Varshini, Richard Wareing, Olívia Wu, Xiaolong Xie, Jinghan Yang, Hangchen Yu, Tanmay Zargar, Zitong Zeng, Feixiong Zhang, Ajit Mathews, Xun Jiao, Jiyuan Zhang, Emmanuel Menage, Truls Edvard Stokke, Mohammed Sourouri. 1689-1702 [doi]
- Scaling Llama 3 Training with Efficient Parallelism Strategies. Weiwei Chu, Xinfeng Xie, Jiecao Yu, Jie Wang, Amar Phanishayee, Chunqiang Tang, Yuchen Hao, Jianyu Huang, Mustafa Ozdal, Jun Wang, Vedanuj Goswami, Naman Goyal 0001, Abhishek Kadian, Andrew Gu, Chris Cai, Feng Tian, Xiaodong Wang 0020, Min-Si, Pavan Balaji, Ching-Hsiang Chu, JongSoo Park. 1703-1716 [doi]
- DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads. Wei Su, Abhishek Dhanotia, Carlos Torres, Jayneel Gandhi, Neha Gholkar, Shobhit O. Kanaujia, Maxim Naumov, Kalyan Subramanian, Valentin Andrei, Yifan Yuan, Chunqiang Tang. 1717-1730 [doi]
- Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures. Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei. 1731-1745 [doi]
- Avalanche: Optimizing Cache Utilization via Matrix Reordering for Sparse Matrix Multiplication Accelerator. Gwangeun Byeon, Seongwook Kim, Hyungjin Kim, Sukhyun Han, Jinkwon Kim, Prashant Nair, Taewook Kang, Seokin Hong. 1746-1759 [doi]
- Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving. Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, HyungJun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu. 1760-1776 [doi]
- GPUs All Grown-Up: Fully Device-Driven SpMV Using GPU Work Graphs. Fabian Wildgrube, Pete Ehrett, Paul Trojahn, Richard Membarth, Bradford M. Beckmann, Dominik Baumeister, Matthäus G. Chajdas. 1777-1791 [doi]
- Telos: A Dataflow Accelerator for Sparse Triangular Solver of Partial Differential Equations. Xiaochen Hao, Hao Luo, Chu Wang, Chao Yang 0002, Yun Liang 0001. 1792-1805 [doi]
- MagiCache: A Virtual In-Cache Computing Engine. Renhao Fan, Yikai Cui, Weike Li, Mingyu Wang, Zhaolin Li. 1806-1818 [doi]
- Folded Banks: 3D-Stacked HBM Design for Fine-Grained Random-Access Bandwidth. Vignesh Adhinarayanan, Bradford M. Beckmann, Wantong Li, Mohammad Seyedzadeh, Sergey Blagodurov, Derrick Aguren, Hayden Hyungdong Lee. 1819-1833 [doi]
- NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly. Heewoo Kim, Sanjay Sri Vallabh Singapuram, Haojie Ye, Joseph Izraelevitz, Trevor N. Mudge, Ronald G. Dreslinski, Nishil Talati. 1834-1847 [doi]
- Reconfigurable Stream Network Architecture. Chengyue Wang 0002, Xiaofan Zhang 0001, Jason Cong, James C. Hoe. 1848-1866 [doi]
- DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction. Chunshu Wu, Ruibing Song, Chuan Liu, Pouya Haghi, Ang Li 0006, Michael Huang, Tony Tong Geng. 1867-1879 [doi]
- TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model. Guyue Huang, Hao Li, Le Qin, Jiayi Huang 0001, Yangwook Kang, Yufei Ding 0001, Yuan Xie 0001. 1880-1893 [doi]
- FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering. Seock-Hwan Noh, Banseok Shin, Jeik Choi, Seungpyo Lee, Jaeha Kung, Yeseong Kim. 1894-1909 [doi]
- BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT. Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki. 1910-1924 [doi]
- Lumina: Real-Time Neural Rendering by Exploiting Computational Redundancy. Yu Feng 0007, Weikai Lin, Yuge Cheng, Zihan Liu, Jingwen Leng, Minyi Guo, Chen Chen 0067, Shixuan Sun, Yuhao Zhu 0001. 1925-1939 [doi]
- LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization. Seunghee Han, Soongyu Choi, Joo-Young Kim 0001. 1940-1955 [doi]
- MD-pipe: A Strong Scaling Enhanced Pipeline Architecture for Ab Initio Accuracy Molecular Dynamics. Ning Kang 0007, Guojun Yuan, Zihan Yan, Beining Zhang, Boyang Li, ZeYu Li, Shuo Wang, Guanglei Chen, Jiayi Rao, Zhan Wang, Weile Jia, Ninghui Sun, Guangming Tan. 1956-1968 [doi]
- InfiniMind: A Learning-Optimized Large-Scale Brain-Computer Interface. Yeongwoo Jang, Daye Jung, Seunghyun Song, Hunjun Lee, Jangwoo Kim. 1969-1985 [doi]
- Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs. Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen. 1986-2001 [doi]
- Adaptive CHERI Compartmentalization for Heterogeneous Accelerators. Jianyi Cheng, A. Theodore Markettos, Alexandre Joannou, Paul Metzger, Matthew Naylor, Peter Rugg, Timothy M. Jones 0001. 2002-2016 [doi]
- Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors. Sunho Lee 0003, Seonjin Na, Jeongwon Choi, Jinwon Pyo, Jaehyuk Huh 0001. 2017-2031 [doi]
- SpecASan: Mitigating Transient Execution Attacks Using Speculative Address Sanitization. Saber Ganjisaffar, Esmaeil Mohmmadian Koruyeh, Jason Zellmer, Hodjat Asghari Esfeden, Chengyu Song, Nael B. Abu-Ghazaleh. 2032-2045 [doi]