- GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction. Ishita Chaturvedi, Bhargav Reddy Godala, Yucan Wu, Ziyang Xu, Konstantinos Iliakis, Panagiotis-Eleftherios Eleftherakis, Sotirios Xydis, Dimitrios Soudris, Tyler Sorensen 0001, Simone Campanoni, Tor M. Aamodt, David I. August. 1-16
- AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer. Yunzhe Liu, Xinyu Li, Tingting Zhang, Tianyi Liu, Qi Guo, Fuxin Zhang, Jian Wang. 17-31
- The Maya Cache: A Storage-efficient and Secure Fully-associative Last-level Cache. Anubhav Bhatla, Navneet, Biswabandan Panda. 32-44
- DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems. Ruibing Song, Chunshu Wu, Chuan Liu 0001, Ang Li 0006, Michael C. Huang 0001, Tong Geng. 45-57
- ReAIM: A ReRAM-based Adaptive Ising Machine for Solving Combinatorial Optimization Problems. Hao-Wei Chiang, Chin-Fu Nien, Hsiang-Yun Cheng, Kuei-Po Huang. 58-72
- Mirage: An RNS-Based Photonic Accelerator for DNN Training. Cansu Demirkiran, Guowei Yang, Darius Bunandar, Ajay Joshi. 73-87
- Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. Rahul Bera, Adithya Ranganathan, Joydeep Rakshit, Sujit Mahto, Anant V. Nori, Jayesh Gaur, Ataberk Olgun, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Sreenivas Subramoney, Onur Mutlu. 88-102
- QuTracer: Mitigating Quantum Gate and Measurement Errors by Tracing Subsets of Qubits. Peiyi Li 0002, Ji Liu 0007, Alvin Gonzales, Zain H. Saleem, Huiyang Zhou, Paul D. Hovland. 103-117
- Splitwise: Efficient Generative LLM Inference Using Phase Splitting. Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini. 118-132
- HiFi-DRAM: Enabling High-fidelity DRAM Research by Uncovering Sense Amplifiers with IC Imaging. Michele Marazzi, Tristan Sachsenweger, Flavien Solt, Peng Zeng, Kubo Takashi, Maksym Yarema, Kaveh Razavi. 133-149
- Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms. Qijing Huang, Po-An Tsai, Joel S. Emer, Angshuman Parashar. 150-166
- A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things. Xiaofeng Hou, Tongqiao Xu, Chao Li 0009, Cheng Xu, Jiacheng Liu 0001, Yang Hu 0001, Jieru Zhao, Jingwen Leng, Kwang-Ting Cheng, Minyi Guo. 167-181
- Determining the Minimum Number of Virtual Networks for Different Coherence Protocols. Weihang Li, Andrés Goens, Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin. 182-197
- FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching. Jianming Tong, Anirudh Itagi, Prasanth Chatarasi, Tushar Krishna. 198-214
- Waferscale Network Switches. Shuangliang Chen, Saptadeep Pal, Rakesh Kumar 0002. 215-229
- The Case For Data Centre Hyperloops. Guillem López-Paradís, Isaac M. Hair, Sid Kannan, Roman Rabbat, Parker Murray, Alex Lopes, Rory Zahedi, Winston Zuo, Jonathan Balkind. 230-244
- PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices. Si Ung Noh, JungUk Hong, Chaemin Lim, Seongyeon Park, Jeehyun Kim, Hanjun Kim 0001, Youngsok Kim, Jinho Lee. 245-260
- Bosehedral: Compiler Optimization for Bosonic Quantum Computing. Junyu Zhou, Yuhao Liu, Yunong Shi, Ali Javadi-Abhari, Gushu Li. 261-276
- Tetris: A Compilation Framework for VQA Applications in Quantum Computing. Yuwei Jin, Zirui Li, Fei Hua, Tianyi Hao, Huiyang Zhou, Yipeng Huang, Eddy Z. Zhang. 277-292
- Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays. Hanrui Wang 0002, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu 0002, David Z. Pan, Jason Cong, Umut A. Acar, Song Han 0003. 293-309
- Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling. Alireza Seif, Haoran Liao, Vinay Tripathi, Kevin Krsulich, Moein Malekakhlagh, Mirko Amico, Petar Jurcevic, Ali Javadi-Abhari. 310-324
- A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum Computing. Daniel Bochen Tan, Murphy Yuezhen Niu, Craig Gidney. 325-339
- PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. Yunjae Lee, Hyeseong Kim, Minsoo Rhu. 340-353
- pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures. Daehyeon Baek, Soojin Hwang, Jaehyuk Huh 0001. 354-367
- NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing. Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai Li 0001, Yiran Chen 0001. 368-381
- Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing. Haifeng Liu 0003, Long Zheng 0003, Yu Huang 0013, Jingyi Zhou, Chaoqiang Liu, Runze Wang, Xiaofei Liao, Hai Jin 0001, Jingling Xue. 382-395
- Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture. Zhiheng Yue, Huizheng Wang, Jiahao Fang, Jinyi Deng, Guangyang Lu, Fengbin Tu, Ruiqi Guo, Yuxuan Li, Yubin Qin, Yang Wang 0089, Chao Li 0009, Huiming Han, Shaojun Wei, Yang Hu 0001, Shouyi Yin. 396-409
- ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models. Yujeong Choi, Jiin Kim, Minsoo Rhu. 410-423
- Derm: SLA-aware Resource Management for Highly Dynamic Microservices. Liao Chen, Shutian Luo, Chenyu Lin, Zizhao Mo, Huanle Xu, Kejiang Ye, Chengzhong Xu 0001. 424-436
- SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud. Jovan Stojkovic, Pulkit A. Misra, Íñigo Goiri, Sam Whitlock, Esha Choukse, Mayukh Das, Chetan Bansal, Jason Lee, Zoey Sun, Haoran Qiu, Reed Zimmermann, Savyasachi Samal, Brijesh Warrier, Ashish Raniwala, Ricardo Bianchini. 437-451
- Designing Cloud Servers for Lower Carbon. Jaylen Wang, Daniel S. Berger, Fiodar Kazhamiaka, Celine Irvene, Chaojie Zhang, Esha Choukse, Kali Frost, Rodrigo Fonseca, Brijesh Warrier, Chetan Bansal, Jonathan Stern, Ricardo Bianchini, Akshitha Sriraman. 452-470
- EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency. Jovan Stojkovic, Nikoleta Iliakopoulou, Tianyin Xu, Hubertus Franke, Josep Torrellas. 471-486
- AIO: An Abstraction for Performance Analysis Across Diverse Accelerator Architectures. Joseph Rogers, Taha Soliman, Magnus Jahre. 487-500
- FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs. Joonho Whangbo, Edwin Lim, Chengyi Lux Zhang, Kevin Anderson, Abraham Gonzalez, Raghav Gupta, Nivedha Krishnakumar, Sagar Karandikar, Borivoje Nikolic, Yakun Sophia Shao, Krste Asanovic. 501-515
- Harpocrates: Breaking the Silence of CPU Faults through Hardware-in-the-Loop Program Generation. Nikos Karystinos, Odysseas Chatzopoulos, George-Marios Fragkoulis, George Papadimitriou 0001, Dimitris Gizopoulos, Sudhanva Gurumurthi. 516-531
- The Dataflow Abstract Machine Simulator Framework. Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun. 532-547
- Tartan: Microarchitecting a Robotic Processor. Mohammad Bakhshalipour, Phillip B. Gibbons. 548-565
- Collision Prediction for Robotics Accelerators. Deval Shah, Tor M. Aamodt. 566-581
- BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing. Seunghee Han, Seungjae Moon, Teokkyu Suh, Jaehoon Heo, Joo-Young Kim 0001. 582-596
- QUETZAL: Vector Acceleration Framework for Modern Genome Sequence Analysis Algorithms. Julian Pavon, Iván Vargas Valdivieso, Carlos Rojas, César Hernández, Mehmet Aslan, Roger Figueras, Yichao Yuan, Joël Lindegger, Mohammed Alser, Francesc Moll, Santiago Marco-Sola, Oguz Ergin, Nishil Talati, Onur Mutlu, Osman S. Unsal, Mateo Valero, Adrián Cristal. 597-612
- HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing. Jinghan Huang 0001, Jiaqi Lou, Srikar Vanavasam, Xinhao Kong, Houxiang Ji, Ipoom Jeong, Danyang Zhuo, Eun-Kyung Lee, Nam Sung Kim. 613-627
- NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures. Boyu Tian, Yiwei Li, Li Jiang, Shuangyu Cai, Mingyu Gao 0001. 628-643
- UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. Yilong Zhao, Mingyu Gao 0003, Fangxin Liu, Yiwei Hu, Zongwu Wang, Han Lin, Jin Li 0002, He Xian, Hanlin Dong, Tao Yang, Naifeng Jing, Xiaoyao Liang, Li Jiang 0002. 644-659
- MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing. Nika Mansouri-Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park 0001, Onur Mutlu. 660-677
- On Error Correction for Nonvolatile Processing-In-Memory. Hüsrev Cilasun, Salonik Resch, Zamshed I. Chowdhury, Masoud Zabihi, Yang Lv, Brandon Zink, Jianping Wang 0006, Sachin S. Sapatnekar, Ulya R. Karpuzcu. 678-692
- MetaLeak: Uncovering Side Channels in Secure Processor Architectures Exploiting Metadata. Md Hafizul Islam Chowdhuryy, Hao Zheng, Fan Yao. 693-707
- sNPU: Trusted Execution Environments on Integrated NPUs. Erhu Feng, Dahu Feng, Dong Du 0003, Yubin Xia, Haibo Chen 0001. 708-723
- Counter-light Memory Encryption. Xin Wang, Jagadish Kotra, Alex Jones, Wenjie Xiong, Xun Jian 0002. 724-738
- Perspective: A Principled Framework for Pliable and Secure Speculation in Operating Systems. Tae Hoon Kim, David Rudo, Kaiyang Zhao 0002, Zirui Neil Zhao, Dimitrios Skarlatos 0002. 739-755
- HEAP: A Fully Homomorphic Encryption Accelerator with Parallelized Bootstrapping. Rashmi S. Agrawal 0001, Anantha P. Chandrakasan, Ajay Joshi. 756-769
- Scalable, Programmable and Dense: The HammerBlade Open-Source RISC-V Manycore. Dai Cheol Jung, Max Ruttenberg, Paul Gao 0001, Scott Davidson 0004, Daniel Petrisko, Kangli Li, Aditya K. Kamath, Lin Cheng, Shaolin Xie, Peitian Pan, Zhongyuan Zhao 0004, Zichao Yue, Bandhav Veluri, Sripathi Muralitharan, Adrian Sampson, Andrew Lumsdaine, Zhiru Zhang, Christopher Batten, Mark Oskin, Dustin Richmond, Michael Bedford Taylor. 770-784
- HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. Apostolos Kokolis, Antonis Psistakis, Benjamin Reidys, Jian Huang 0006, Josep Torrellas. 785-800
- BlitzCoin: Fully Decentralized Hardware Power Management for Accelerator-Rich SoCs. Martin Cochet, Karthik Swaminathan, Erik Jens Loscalzo, Joseph Zuckerman, Maico Cassel dos Santos, Davide Giri, Alper Buyuktosunoglu, Tianyu Jia, David Brooks 0001, Gu-Yeon Wei, Kenneth L. Shepard, Luca P. Carloni, Pradip Bose. 801-817
- MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems. Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary Devito, Gu-Yeon Wei, David Brooks 0001, Carole-Jean Wu. 818-833
- Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs. Yuan Feng, Seonjin Na, Hyesoon Kim, Hyeran Jeon. 834-847
- Intel Accelerators Ecosystem: An SoC-Oriented Perspective: Industry Product. Yifan Yuan, Ren Wang 0001, Narayan Ranganathan, Nikhil Rao, Sanjay Kumar, Philip Lantz, Vivekananthan Sanjeepan, Jorge Cabrera, Atul Kwatra, Rajesh Sankaran, Ipoom Jeong, Nam Sung Kim. 848-862
- Circular Reconfigurable Parallel Processor for Edge Computing: Industrial Product. Yuan Li, Jianbin Zhu, Yao Fu, Yu Lei, Toshio Nagata, Ryan Braidwood, Haohuan Fu, Juepeng Zheng, Wayne Luk, Hongxiang Fan. 863-875
- Realizing the AMD Exascale Heterogeneous Processor Vision: Industry Product. Alan Smith 0003, Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Samuel Naffziger, Mike Mantor, Nathan Kalyanasundharam, Vamsi Alla, Nicholas Malaya, Joseph L. Greathouse, Eric Chapman, Raja Swaminathan. 876-889
- TCP: A Tensor Contraction Processor for AI Workloads: Industrial Product. Hanjoon Kim, Younggeun Choi, Junyoung Park, Byeongwook Bae, Hyunmin Jeong, Sang Min Lee, Jeseung Yeon, Minho Kim, Changjae Park, Boncheol Gu, Changman Lee, Jaeick Bae, SungGyeong Bae, Yojung Cha, Wooyoung Choe, Jonguk Choi, Juho Ha, Hyuck Han, Namoh Hwang, Seokha Hwang, Kiseok Jang, Haechan Je, Hojin Jeon, Jaewoo Jeon, Hyunjun Jeong, Yeonsu Jung, Dongok Kang, Hyewon Kim, Minjae Kim, Muhwan Kim, Sewon Kim, Suhyung Kim, Won Kim, Yong Kim, Youngsik Kim, Younki Ku, Jeong-Ki Lee, Juyun Lee, Kyungjae Lee, Seokho Lee, Minwoo Noh, Hyuntaek Oh, Gyunghee Park, Sanguk Park, Jimin Seo, Jungyoung Seong, June Paik, Nuno P. Lopes, Sungjoo Yoo. 890-902
- Cambricon-D: Full-Network Differential Acceleration for Diffusion Models. Weihao Kong, Yifan Hao, Qi Guo 0001, Yongwei Zhao, Xinkai Song, Xiaqing Li, Mo Zou, Zidong Du, Rui Zhang 0040, Chang Liu 0021, Yuanbo Wen, Pengwei Jin, Xing Hu 0001, Wei Li 0008, Zhiwei Xu 0002, Tianshi Chen 0002. 903-914
- Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation. Xiurui Pan, Yuda An, Shengwen Liang, Bo Mao, Mingzhe Zhang, Qiao Li 0001, Myoungsoo Jung, Jie Zhang 0048. 915-930
- Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications. Yifan Yang, Joel S. Emer, Daniel Sánchez 0003. 931-945
- NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator. Kaustubh Shivdikar, Nicolas Bohm Agostini, Malith Jayaweera, Gilbert Jonatan, José L. Abellán, Ajay Joshi, John Kim, David R. Kaeli. 946-960
- Compiler-Directed Whole-System Persistence. Jianping Zeng 0001, Tong Zhang, Changhee Jung. 961-977
- Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs. Mojtaba Abaie Shoushtary, José María Arnau, Jordi Tubella Murgadas, Antonio González 0001. 978-990
- Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators. Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao. 991-1004
- ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching. Youpeng Zhao, Di Wu, Jun Wang. 1005-1017
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference. Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang. 1018-1031
- MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. Yubin Qin, Yang Wang 0089, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu 0001, Shouyi Yin. 1032-1047
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization. Jungi Lee, Wonbeom Lee, Jaewoong Sim. 1048-1062
- Heterogeneous Acceleration Pipeline for Recommendation System Training. Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan 0001, Prashant J. Nair. 1063-1079
- LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference. Hengrui Zhang, August Ning, Rohan Baskar Prabhakar, David Wentzlaff. 1080-1096
- DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands. Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park 0006, Chihun Song, Nam Sung Kim, Jung Ho Ahn. 1097-1111
- (MC)²: Lazy MemCopy at the Memory Controller. Aditya K. Kamath, Simon Peter 0001. 1112-1128
- DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. Gagandeep Panwar, Muhammad Laghari, Esha Choukse, Xun Jian 0002. 1129-1143
- Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers. Yesin Ryu, YooJin Kim, Giyong Jung, Jung Ho Ahn, Jungrae Kim. 1144-1156
- PrIDE: Achieving Secure Rowhammer Mitigation with Low-Cost In-DRAM Trackers. Aamer Jaleel, Gururaj Saileshwar, Stephen W. Keckler, Moinuddin K. Qureshi. 1157-1172
- A New Formulation of Neural Data Prefetching. Quang Duong, Akanksha Jain, Calvin Lin. 1173-1187
- UDP: Utility-Driven Fetch Directed Instruction Prefetching. Surim Oh, Mingsheng Xu, Tanvir Ahmed Khan, Baris Kasikci, Heiner Litz. 1188-1201
- Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher. Sam Ainsworth, Lev Mukhanov. 1202-1216
- Alternate Path Fetch. Aniket Anand Deshmukh, Lingzhe Chester Cai, Yale N. Patt. 1217-1229
- Alternate Path μ-op Cache Prefetching. Sawan Singh, Arthur Perais, Alexandra Jimborean, Alberto Ros 0001. 1230-1245
- DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics. Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee 0002, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park. 1246-1261
- BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling. Yu Feng 0007, Tianrui Ma, Yuhao Zhu 0001, Xuan Zhang 0001. 1262-1277
- BitNN: A Bit-Serial Accelerator for K-Nearest Neighbor Search in Point Clouds. Meng Han, Liang Wang 0020, Limin Xiao, Hao Zhang, Tianhao Cai, Jiale Xu, Yibo Wu, Chenhao Zhang, Xiangrong Xu. 1278-1292
- Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations. Yu Feng 0007, Zihan Liu 0002, Jingwen Leng, Minyi Guo, Yuhao Zhu 0001. 1293-1308
- GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms. Sandeepa Bhuyan, Ziyu Ying 0001, Mahmut T. Kandemir, Mahanth Gowda, Chita R. Das. 1309-1322