- Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors. Toru Koizumi, Ryota Shioya, Shu Sugita, Taichi Amano, Yuya Degawa, Junichiro Kadomoto, Hidetsugu Irie, Shuichi Sakai. pp. 1-16
- Decoupled Vector Runahead. Ajeya Naithani, Jaime Roelandts, Sam Ainsworth, Timothy M. Jones, Lieven Eeckhout. pp. 17-31
- CryptoMMU: Enabling Scalable and Secure Access Control of Third-Party Accelerators. Faiz Alam, Hyokeun Lee, Abhishek Bhattacharjee, Amro Awad. pp. 32-48
- Phantom: Exploiting Decoder-detectable Mispredictions. Johannes Wikner, Daniël Trujillo, Kaveh Razavi. pp. 49-61
- AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads. Seah Kim, Jerry Zhao, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao. pp. 62-76
- UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration. Bahador Rashidi, Chao Gao, Shan Lu, Zhisheng Wang, ChunHua Zhou, Di Niu, Fengyu Sun. pp. 77-90
- Spatula: A Hardware Accelerator for Sparse Matrix Factorization. Axel Feldmann, Daniel Sánchez. pp. 91-104
- Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices. Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim. pp. 105-121
- Memento: Architectural Support for Ephemeral Memory Management in Serverless Environments. Ziqi Wang, Kaiyang Zhao, Pei Li, Andrew Jacob, Michael Kozuch, Todd C. Mowry, Dimitrios Skarlatos. pp. 122-136
- Simultaneous and Heterogenous Multithreading. Kuan-Chieh Hsu, Hung-Wei Tseng. pp. 137-152
- Accelerating RTL Simulation with Hardware-Software Co-Design. Fares Elsabbagh, Shabnam Sheikhha, Victor A. Ying, Quan M. Nguyen, Joel S. Emer, Daniel Sánchez. pp. 153-166
- Fast, Robust and Transferable Prediction for Hardware Logic Synthesis. Ceyu Xu, Pragya Sharma, Tianshu Wang, Lisa Wu Wills. pp. 167-179
- Khronos: Fusing Memory Access for Improved Hardware RTL Simulation. Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, Ru Huang. pp. 180-193
- SecureLoop: Design Space Exploration of Secure DNN Accelerators. Kyungmi Lee, Mengjia Yan, Joel S. Emer, Anantha P. Chandrakasan. pp. 194-208
- DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators. Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao. pp. 209-224
- TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs. Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han. pp. 225-239
- Branch Target Buffer Organizations. Arthur Perais, Rami Sheikh. pp. 240-253
- Warming Up a Cold Front-End with Ignite. David Schall, Andreas Sandberg, Boris Grot. pp. 254-267
- ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis. Chen Bai, Jiayi Huang, Xuechao Wei, Yuzhe Ma, Sicheng Li, Hongzhong Zheng, Bei Yu, Yuan Xie. pp. 268-282
- DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search. Shulin Zeng, Zhenhua Zhu, Jun Liu, Haoyu Zhang, Guohao Dai, Zixuan Zhou, Shuangchen Li, Xuefei Ning, Yuan Xie, Huazhong Yang, Yu Wang. pp. 283-296
- Dadu-RBD: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines. Yuxin Yang, Xiaoming Chen, Yinhe Han. pp. 297-309
- MEGA Evolving Graph Accelerator. Chao Gao, Mahbod Afarin, Shafiur Rahman, Nael B. Abu-Ghazaleh, Rajiv Gupta. pp. 310-323
- Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference. Ashish Gondimalla, Mithuna Thottethodi, T. N. Vijaykumar. pp. 324-337
- RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration. Guyue Huang, Zhengyang Wang, Po-An Tsai, Chen Zhang, Yufei Ding, Yuan Xie. pp. 338-352
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads. Hongxiang Fan, Stylianos I. Venieris, Alexandros Kouris, Nicholas D. Lane. pp. 353-366
- MAD MAcce: Supporting Multiply-Add Operations for Democratizing Matrix-Multiplication Accelerators. Seunghwan Sung, Sujin Hur, SungWoo Kim, Dongho Ha, Yunho Oh, Won Woo Ro. pp. 367-379
- Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads. Ying Li, Yifan Sun, Adwait Jog. pp. 380-394
- G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations. Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang. pp. 395-410
- MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference. Renhao Fan, Yikai Cui, Qilin Chen, Mingyu Wang, Youhui Zhang, Weimin Zheng, Zhaolin Li. pp. 411-423
- Cambricon-U: A Systolic Random Increment Memory Architecture for Unary Computing. Hongrui Guo, Yongwei Zhao, Zhangmai Li, Yifan Hao, Chang Liu, Xinkai Song, Xiaqing Li, Zidong Du, Rui Zhang, Qi Guo, Tianshi Chen, Zhiwei Xu. pp. 424-437
- Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training. Jungwoo Kim, Seonjin Na, Sanghyeon Lee, Sunho Lee, Jaehyuk Huh. pp. 438-451
- TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization. Zheng Qu, Dimin Niu, Shuangchen Li, Hongzhong Zheng, Yuan Xie. pp. 452-464
- Supporting Energy-based Learning with an Ising Machine substrate: a Case Study on RBM. Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael C. Huang. pp. 465-478
- QuComm: Optimizing Collective Communication for Distributed Quantum Computing. Anbang Wu, Yufei Ding, Ang Li. pp. 479-493
- QuCT: A Framework for Analyzing Quantum Circuit by Extracting Contextual and Topological Features. Siwei Tan, Congliang Lang, Liang Xiang, Shudi Wang, Xinghui Jia, Ziqi Tan, Tingting Li, Jieming Yin, Yongheng Shang, Andre Python, Liqiang Lu, Jianwei Yin. pp. 494-508
- ERASER: Towards Adaptive Leakage Suppression for Fault-Tolerant Quantum Computing. Suhas Vittal, Poulami Das, Moinuddin K. Qureshi. pp. 509-525
- Systems Architecture for Quantum Random Access Memory. Shifan Xu, Connor T. Hann, Ben Foxman, Steven M. Girvin, Yongshan Ding. pp. 526-538
- HetArch: Heterogeneous Microarchitectures for Superconducting Quantum Systems. Samuel Alexander Stein, Sara Sussman, Teague Tomesh, Charles Guinn, Esin Tureci, Sophia Fuhui Lin, Wei Tang, James A. Ang, Srivatsan Chakram, Ang Li, Margaret Martonosi, Fred Chong, Andrew A. Houck, Isaac L. Chuang, Michael Austin DeMarco. pp. 539-554
- Efficiently Enabling Block Semantics and Data Updates in DNA Storage. Puru Sharma, Cheng-Kai Lim, Dehui Lin, Yash Pote, Djordje Jevdjic. pp. 555-568
- ReFOCUS: Reusing Light for Efficient Fourier Optics-Based Photonic Neural Network Accelerator. Shurui Li, Hangbo Yang, Chee Wei Wong, Volker J. Sorger, Puneet Gupta. pp. 569-583
- SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices. Zhengang Li, Geng Yuan, Tomoharu Yamauchi, Masoud Zabihi, Yanyue Xie, Peiyan Dong, Xulong Tang, Nobuyuki Yoshikawa, Devesh Tiwari, Yanzhi Wang, Olivia Chen. pp. 584-598
- SuperBP: Design Space Exploration of Perceptron-Based Branch Predictors for Superconducting CPUs. Haipeng Zha, Swamit Tannu, Murali Annavaram. pp. 599-613
- SUSHI: Ultra-High-Speed and Ultra-Low-Power Neuromorphic Chip Using Superconducting Single-Flux-Quantum Circuits. Zeshi Liu, Shuo Chen, Peiyao Qu, Huanli Liu, Minghui Niu, Liliang Ying, Jie Ren, Guangming Tang, Haihang You. pp. 614-627
- AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization. Yukui Luo, Nuo Xu, Hongwu Peng, Chenghong Wang, Shijin Duan, Kaleel Mahmood, Wujie Wen, Caiwen Ding, Xiaolin Xu. pp. 628-640
- CHERIoT: Complete Memory Safety for Embedded Devices. Saar Amar, David Chisnall, Tony Chen, Nathaniel Wesley Filardo, Ben Laurie, Kunyan Liu, Robert M. Norton, Simon W. Moore, Yucong Tao, Robert N. M. Watson, Hongyan Xia. pp. 641-653
- Accelerating Extra Dimensional Page Walks for Confidential Computing. Dong Du, Bicheng Yang, Yubin Xia, Haibo Chen. pp. 654-669
- GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption. Kaustubh Shivdikar, Yuhui Bao, Rashmi Agrawal, Michael Tian Shen, Gilbert Jonatan, Evelio Mora, Alexander Ingare, Neal Livesay, José L. Abellán, John Kim, Ajay Joshi, David R. Kaeli. pp. 670-684
- MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption. Rashmi Agrawal, Leo de Castro, Chiraag Juvekar, Anantha P. Chandrakasan, Vinod Vaikuntanathan, Ajay Joshi. pp. 685-697
- Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making. Gerasimos Gerogiannis, Josep Torrellas. pp. 698-713
- CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems. Biswabandan Panda. pp. 714-727
- Snake: A Variable-length Chain-based Prefetching for GPUs. Saba Mostofi, Hajar Falahati, Negin Mahani, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad. pp. 728-741
- Treelet Prefetching For Ray Tracing. Yuan-Hsi Chou, Tyler Nowicki, Tor M. Aamodt. pp. 742-755
- NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment. Qiyu Wan, Lening Wang, Jing Wang, Shuaiwen Leon Song, Xin Fu. pp. 756-768
- XFM: Accelerated Software-Defined Far Memory. Neel Patel, Amin Mamandipoor, Derrick Quinn, Mohammad Alian. pp. 769-783
- Affinity Alloc: Taming Not-So Near-Data Computing. Zhengrong Wang, Christopher Liu, Nathan Beckmann, Tony Nowatzki. pp. 784-799
- MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory. Daichi Fujiki. pp. 800-814
- AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory. Hongju Kal, Chanyoung Yoo, Won Woo Ro. pp. 815-827
- ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage. Pavlos Aimoniotis, Amund Bergland Kvalsvik, Xiaoyue Chen, Magnus Själander, Stefanos Kaxiras. pp. 828-842
- Uncore Encore: Covert Channels Exploiting Uncore Frequency Scaling. Yanan Guo, Dingyuan Cao, Xin Xin, Youtao Zhang, Jun Yang. pp. 843-855
- Hardware Support for Constant-Time Programming. Yuanqing Miao, Mahmut Taylan Kandemir, Danfeng Zhang, Yingtian Zhang, Gang Tan, Dinghao Wu. pp. 856-870
- AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware. Marcelo Orenes-Vera, Hyunsung Yun, Nils Wistoff, Gernot Heiser, Luca Benini, David Wentzlaff, Margaret Martonosi. pp. 871-885
- NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks. Alon Rashelbach, Igor Lima de Paula, Mark Silberstein. pp. 886-899
- Space Microdatacenters. Nathaniel Bleier, Muhammad Husnain Mubarik, Gary R. Swenson, Rakesh Kumar. pp. 900-915
- LogNIC: A High-Level Performance Model for SmartNICs. Zerui Guo, Jiaxin Lin, Yuebin Bai, Daehyeok Kim, Michael M. Swift, Aditya Akella, Ming Liu. pp. 916-929
- Heterogeneous Die-to-Die Interfaces: Enabling More Flexible Chiplet Interconnection Systems. Yinxiao Feng, Dong Xiang, Kaisheng Ma. pp. 930-943
- Predicting Future-System Reliability with a Component-Level DRAM Fault Model. Jeageun Jung, Mattan Erez. pp. 944-956
- Impact of Voltage Scaling on Soft Errors Susceptibility of Multicore Server CPUs. Dimitris Agiakatsikas, George Papadimitriou, Vasileios Karakostas, Dimitris Gizopoulos, Mihalis Psarakis, Camille Bélanger-Champagne, Ewart Blackmore. pp. 957-971
- Si-Kintsugi: Towards Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI. Edward Hanson, Shiyu Li, Guanglei Zhou, Feng Cheng, Yitu Wang, Rohan Bose, Hai Li, Yiran Chen. pp. 972-985
- How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM. Michael Jaemin Kim, Minbok Wi, Jaehyun Park, Seoyoung Ko, Jaeyoung Choi, Hwayong Nam, Nam Sung Kim, Jung Ho Ahn, Eojin Lee. pp. 986-1001
- Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs. Yun-Chen Lo, Ren-Shuo Liu. pp. 1002-1015
- ACRE: Accelerating Random Forests for Explainability. Andrew McCrabb, Aymen Ahmed, Valeria Bertacco. pp. 1016-1028
- δLTA: Decoupling Camera Sampling from Processing to Avoid Redundant Computations in the Vision Pipeline. Raúl Taranco, José María Arnau, Antonio González. pp. 1029-1043
- McCore: A Holistic Management of High-Performance Heterogeneous Multicores. Jaewon Kwon, Yongju Lee, Hongju Kal, Minjae Kim, Youngsok Kim, Won Woo Ro. pp. 1044-1058
- SweepCache: Intermittence-Aware Cache on the Cheap. Yuchen Zhou, Jianping Zeng, Jungi Jeong, Jongouk Choi, Changhee Jung. pp. 1059-1074
- Persistent Processor Architecture. Jianping Zeng, Jungi Jeong, Changhee Jung. pp. 1075-1091
- ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction. Vahid Janfaza, Shantanu Mandal, Farabi Mahmud, Abdullah Muzahid. pp. 1092-1105
- HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity. Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer. pp. 1106-1120
- Exploiting Inherent Properties of Complex Numbers for Accelerating Complex Valued Neural Networks. Hyunwuk Lee, Hyungjun Jang, Sungbin Kim, SungWoo Kim, Wonho Cho, Won Woo Ro. pp. 1121-1134
- Point Cloud Acceleration by Exploiting Geometric Similarity. Cen Chen, Xiaofeng Zou, Hongen Shao, Yangfan Li, Kenli Li. pp. 1135-1147
- HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator. Jinkwon Kim, Myeongjae Jang, Haejin Nam, Soontae Kim. pp. 1148-1162
- IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations. Bingyao Li, Yanan Guo, Yueqi Wang, Aamer Jaleel, Jun Yang, Xulong Tang. pp. 1163-1177
- Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources. Konstantinos Kanellopoulos, Hong Chul Nam, Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Davide Basilio Bartolini, Onur Mutlu. pp. 1178-1195
- Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings. Konstantinos Kanellopoulos, Rahul Bera, Kosta Stojiljkovic, F. Nisa Bostanci, Can Firtina, Rachata Ausavarungnirun, Rakesh Kumar, Nastaran Hajinazar, Mohammad Sadrosadati, Nandita Vijaykumar, Onur Mutlu. pp. 1196-1212
- Architectural Support for Optimizing Huge Page Selection Within the OS. Aninda Manocha, Zi Yan, Esin Tureci, Juan L. Aragón, David W. Nellans, Margaret Martonosi. pp. 1213-1226
- Photon: A Fine-grained Sampled Simulation Methodology for GPU Workloads. Changxi Liu, Yifan Sun, Trevor E. Carlson. pp. 1227-1241
- Rigorous Evaluation of Computer Processors with Statistical Model Checking. Filip Mazurek, Arya Tschand, Yu Wang, Miroslav Pajic, Daniel J. Sorin. pp. 1242-1254
- TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators. Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer. pp. 1255-1270
- TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis. Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang. pp. 1271-1288
- Learning to Drive Software-Defined Solid-State Drives. Daixuan Li, Jinghan Sun, Jian Huang. pp. 1289-1304
- Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation. Xinkai Song, Yuanbo Wen, Xing Hu, Tianbo Liu, Haoxuan Zhou, Husheng Han, Tian Zhi, Zidong Du, Wei Li, Rui Zhang, Chen Zhang, Lin Gao, Qi Guo, Tianshi Chen. pp. 1305-1318
- Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping. Adiwena Putra, Prasetiyo, Yi Chen, John Kim, Joo-Young Kim. pp. 1319-1331
- A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Marco Siracusa, Víctor Soria Pardos, Francesco Sgherzi, Joshua Randall, Douglas J. Joseph, Miquel Moretó Planas, Adrià Armejach. pp. 1332-1346
- Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity. Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze. pp. 1347-1363
- Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs. Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko. pp. 1364-1380
- PockEngine: Sparse and Efficient Fine-tuning in a Pocket. Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han. pp. 1381-1394
- Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin. pp. 1395-1408
- Pipestitch: An energy-minimal dataflow architecture with lightweight threads. Nathan Serafin, Souradip Ghosh, Harsh Desai, Nathan Beckmann, Brandon Lucia. pp. 1409-1422
- CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment. Yi Huang, Lingkun Kong, Dibei Chen, ZhiYu Chen, Xiangyu Kong, Jianfeng Zhu, Konstantinos Mamouras, Shaojun Wei, Kaiyuan Yang, Leibo Liu. pp. 1423-1436
- Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors. Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joël Lindegger, Can Firtina, Stephan Wong, Onur Mutlu, Said Hamdioui. pp. 1437-1452
- DASH-CAM: Dynamic Approximate SearcH Content Addressable Memory for genome classification. Zuher Jahshan, Itay Merlin, Esteban Garzón, Leonid Yavits. pp. 1453-1465
- GMX: Instruction Set Extensions for Fast, Scalable, and Efficient Genome Sequence Alignment. Max Doblas, Oscar Lostes-Cazorla, Quim Aguado-Puig, Nick Cebry, Pau Fontova-Musté, Christopher Frances Batten, Santiago Marco-Sola, Miquel Moretó. pp. 1466-1480