- Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product. Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter C. Ma, Xiaoyu Ma, Thomas Norrie, Nishant Patil, Sushma Prasad, Cliff Young, Zongwei Zhou, David A. Patterson. 1-14
- Sparsity-Aware and Re-configurable NPU Architecture for Samsung Flagship Mobile SoC. Jun-Woo Jang, Sehwan Lee, Dongyoung Kim, Hyunsun Park, Ali Shafiee Ardestani, YeongJae Choi, Channoh Kim, YooJin Kim, Hyeongseok Yu, Hamzah Abdel-Aziz, Jun-Seok Park, Heonsoo Lee, Dongwoo Lee, Myeong-Woo Kim, Hanwoong Jung, Heewoo Nam, Dongguen Lim, SeungWon Lee, Joon Ho Song, Suknam Kwon, Joseph Hassoun, SukHwan Lim, Changkyu Choi. 15-28
- Energy Efficiency Boost in the AI-Infused POWER10 Processor. Brian W. Thompto, Dung Q. Nguyen, José E. Moreira, Ramon Bertran, Hans M. Jacobson, Richard J. Eickemeyer, Rahul M. Rao, Michael Goulet, Marcy Byers, Christopher J. Gonzalez, Karthik Swaminathan, Nagu R. Dhanwada, Silvia M. Müller, Andreas Wagner, Satish Kumar Sadasivam, Robert K. Montoye, William J. Starke, Christian G. Zoellin, Michael S. Floyd, Jeffrey Stuecheli, Nandhini Chandramoorthy, John-David Wellman, Alper Buyuktosunoglu, Matthias Pflanz, Balaram Sinharoy, Pradip Bose. 29-42
- Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product. Suk-Han Lee, Shinhaeng Kang, Jaehoon Lee, Hyeonsu Kim, Eojin Lee, Seungwoo Seo, Hosang Yoon, SeungWon Lee, Kyounghwan Lim, Hyunsung Shin, Jinhyun Kim, Seongil O, Anand Iyer, David Wang, Kyomin Sohn, Nam Sung Kim. 43-56
- Pioneering Chiplet Technology and Design for the AMD EPYC™ and Ryzen™ Processor Families: Industrial Product. Samuel Naffziger, Noah Beck, Thomas Burd, Kevin Lepak, Gabriel H. Loh, Mahesh Subramony, Sean White. 57-70
- Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-level Cache Evictions. Mainak Chaudhuri. 71-84
- Exploiting Page Table Locality for Agile TLB Prefetching. Georgios Vavouliotis, Lluc Alvarez, Vasileios Karakostas, Konstantinos Nikas, Nectarios Koziris, Daniel A. Jiménez, Marc Casas. 85-98
- A Cost-Effective Entangling Prefetcher for Instructions. Alberto Ros, Alexandra Jimborean. 99-111
- Don't Forget the I/O When Allocating Your LLC. Yifan Yuan, Mohammad Alian, Yipeng Wang, Ren Wang, Ilia Kurakin, Charlie Tai, Nam Sung Kim. 112-125
- PF-DRAM: A Precharge-Free DRAM Structure. Nezam Rohbani, Sina Darabi, Hamid Sarbazi-Azad. 126-138
- Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers. Harini Muthukrishnan, David W. Nellans, Daniel Lustig, Jeffrey A. Fessler, Thomas F. Wenisch. 139-152
- RaPiD: AI Accelerator for Ultra-low Precision Training and Inference. Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Yulong Li, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio J. Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonanno, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce M. Fleischer, Michael Guillorn, Howard Haynie, Jinwook Jung, Mingu Kang, Kyu-hyoun Kim, Siyu Koswatta, Sae Kyu Lee, Martin Lutz, Silvia Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot Rider, Kerstin Schelm, Michael Scheuermann, Joel Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matthew M. Ziegler, Vinay Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian W. Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan. 153-166
- REDUCT: Keep it Close, Keep it Cool!: Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute. Anant V. Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om J. Omer, Avishaii Abuhatzera, Belliappa Kuttanna, Sreenivas Subramoney. 167-180
- Communication Algorithm-Architecture Co-Design for Distributed Deep Learning. Jiayi Huang, Pritam Majumder, Sungkeun Kim, Abdullah Muzahid, Ki Hwan Yum, Eun Jung Kim. 181-194
- Vector Runahead. Ajeya Naithani, Sam Ainsworth, Timothy M. Jones, Lieven Eeckhout. 195-208
- Unlimited Vector Extension with Data Streaming Support. Joao Mario Domingos, Nuno Neves, Nuno Roma, Pedro Tomás. 209-222
- Speculative Vectorisation with Selective Replay. Peng Sun, Giacomo Gabrielli, Timothy M. Jones. 223-236
- ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. Weiyi Sun, Zhaoshi Li, Shouyi Yin, Shaojun Wei, Leibo Liu. 237-250
- Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching. Lingxi Wu, Rasool Sharifi, Marzieh Lenjani, Kevin Skadron, Ashish Venkat. 251-264
- FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator. Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding. 265-278
- BOSS: Bandwidth-Optimized Search Accelerator for Storage-Class Memory. Jun Heo, Seung Yul Lee, Sunhong Min, Yeonhong Park, Sungjun Jung, Tae Jun Ham, Jae W. Lee. 279-291
- Rohan Basu Roy, Tirthak Patel, Devesh Tiwari. 292-305
- Confidential Serverless Made Efficient with Plug-In Enclaves. Mingyu Li, Yubin Xia, Haibo Chen. 306-318
- Flex: High-Availability Datacenters With Zero Reserved Power. Chaojie Zhang, Alok Gautam Kumbhare, Ioannis Manousakis, Deli Zhang, Pulkit A. Misra, Rod Assis, Kyle Woolcock, Nithish Mahalingam, Brijesh Warrier, David Gauthier, Lalu Kunnath, Steve Solomon, Osvaldo Morales, Marcus Fontoura, Ricardo Bianchini. 319-332
- BlockMaestro: Enabling Programmer-Transparent Task-based Execution in GPU Systems. AmirAli Abdolrashidi, Hodjat Asghari Esfeden, Ali Jahanshahi, Kaustubh Singh, Nael B. Abu-Ghazaleh, Daniel Wong. 333-346
- Opening Pandora's Box: A Systematic Study of New Ways Microarchitecture Can Leak Private Data. Jose Rodrigo Sanchez Vicarte, Pradyumna Shome, Nandeeka Nayak, Caroline Trippel, Adam Morrison, David Kohlbrenner, Christopher W. Fletcher. 347-360
- I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches. Xida Ren, Logan Moody, Mohammadkazem Taram, Matthew Jordan, Dean M. Tullsen, Ashish Venkat. 361-374
- TimeCache: Using Time to Eliminate Cache Side Channels when Sharing Software. Divya Ojha, Sandhya Dwarkadas. 375-387
- Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees. Arun Subramaniyan, Jack Wadden, Kush Goliya, Nathan Ozog, Xiao Wu, Satish Narayanasamy, David T. Blaauw, Reetuparna Das. 388-401
- Aurochs: An Architecture for Dataflow Threads. Matthew Vilim, Alexander Rucker, Kunle Olukotun. 402-415
- PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture. Ye Zhang, Shuo Wang, Xian Zhang, Jiangbin Dong, Xingzhong Mao, Fan Long, Cong Wang, Dong Zhou, Mingyu Gao, Guangyu Sun. 416-428
- Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures. Ajay Brahmakshatriya, Emily Furst, Victor A. Ying, Claire Hsu, Changwan Hong, Max Ruttenberg, Yunming Zhang, Dai Cheol Jung, Dustin Richmond, Michael B. Taylor, Julian Shun, Mark Oskin, Daniel Sánchez, Saman P. Amarasinghe. 429-442
- Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach. Chencheng Ye, Yuanchao Xu, Xipeng Shen, Xiaofei Liao, Hai Jin, Yan Solihin. 443-455
- Execution Dependence Extension (EDE): ISA Support for Eliminating Fences. Thomas Shull, Ilias Vougioukas, Nikos Nikoleris, Wendy Elsasser, Josep Torrellas. 456-469
- Hetero-ViTAL: A Virtualization Stack for Heterogeneous FPGA Clusters. Yue Zha, Jing Li. 470-483
- CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations. Lois Orosa, Yaohua Wang, Mohammad Sadrosadati, Jeremie S. Kim, Minesh Patel, Ivan Puddu, Haocong Luo, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Saugata Ghose, Onur Mutlu. 484-497
- NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM. Ziqi Wang, Chul-Hwan Choo, Michael A. Kozuch, Todd C. Mowry, Gennady Pekhimenko, Vivek Seshadri, Dimitrios Skarlatos. 498-511
- Rebooting Virtual Memory with Midgard. Siddharth Gupta, Atri Bhattacharyya, Yunho Oh, Abhishek Bhattacharjee, Babak Falsafi, Mathias Payer. 512-525
- Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication. Adarsh Patil, Vijay Nagarajan, Rajeev Balasubramonian, Nicolai Oswald. 526-539
- Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms. Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, Tushar Krishna. 540-553
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. Qijing Huang, Aravind Kalaiah, Minwoo Kang, James Demmel, Grace Dinh, John Wawrzynek, Thomas Norell, Yakun Sophia Shao. 554-566
- η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities. Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael B. Taylor, Shuaiwen Leon Song. 567-580
- FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining. Xuhao Chen, Tianhao Huang, Shuotao Xu, Thomas Bourgeat, Chanwoo Chung, Arvind. 581-594
- PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. Vidushi Dadu, Sihao Liu, Tony Nowatzki. 595-608
- Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses. Mikhail Asiatici, Paolo Ienne. 609-622
- Cost-Efficient Overclocking in Immersion-Cooled Datacenters. Majid Jalili, Ioannis Manousakis, Iñigo Goiri, Pulkit A. Misra, Ashish Raniwala, Husam Alissa, Bharath Ramakrishnan, Phillip Tuma, Christian Belady, Marcus Fontoura, Ricardo Bianchini. 623-636
- CryoGuard: A Near Refresh-Free Robust DRAM Design for Cryogenic Computing. Gyu-hyeon Lee, Seongmin Na, Ilkwon Byun, Dongmoon Min, Jangwoo Kim. 637-650
- Superconducting Computing with Alternating Logic Elements. Georgios Tzimpragos, Jennifer Volk, Alex Wynn, James E. Smith, Timothy Sherwood. 651-664
- Failure Sentinels: Ubiquitous Just-in-time Intermittent Computation via Low-cost Hardware Support for Voltage Monitoring. Harrison Williams, Michael Moukarzel, Matthew Hicks. 665-678
- SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations. Hongju Kal, SeokMin Lee, Gun Ko, Won Woo Ro. 679-691
- ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks. Tae Jun Ham, Yejin Lee, Seong Hoon Seo, Soosung Kim, Hyunji Choi, Sung Jun Jung, Jae W. Lee. 692-705
- Cambricon-Q: A Hybrid Architecture for Efficient Training. Yongwei Zhao, Chang Liu, Zidong Du, Qi Guo, Xing Hu, Yimin Zhuang, Zhenxing Zhang, Xinkai Song, Wei Li, Xishan Zhang, Ling Li, Zhiwei Xu, Tianshi Chen. 706-719
- TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation. Liqiang Lu, Naiqing Guan, Yuyue Wang, Liancheng Jia, Zizhang Luo, Jieming Yin, Jason Cong, Yun Liang. 720-733
- Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications. Tanvir Ahmed Khan, Dexin Zhang, Akshitha Sriraman, Joseph Devietti, Gilles Pokam, Heiner Litz, Baris Kasikci. 734-747
- Quantifying Server Memory Frequency Margin and Using It to Improve Performance in HPC Systems. Da Zhang, Gagandeep Panwar, Jagadish B. Kotra, Nathan DeBardeleben, Sean Blanchard, Xun Jian. 748-761
- Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution. Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh, Nam Sung Kim, Mahmut Taylan Kandemir, Myoungsoo Jung. 762-775
- NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks. Xingbin Wang, Boyan Zhao, Rui Hou, Amro Awad, Zhihong Tian, Dan Meng. 776-789
- NASA: Accelerating Neural Network Design with a NAS Processor. Xiaohan Ma, Chang Si, Ying Wang, Cheng Liu, Lei Zhang. 790-803
- PMNet: In-Network Data Persistence. Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Manabi Khan. 804-817
- Exploiting Long-Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures. Jonathan M. Baker, Andrew Litteken, Casey Duckering, Henry Hoffmann, Hannes Bernien, Frederic T. Chong. 818-831
- Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors. Gushu Li, Yunong Shi, Ali Javadi-Abhari. 832-845
- Designing Calibration and Expressivity-Efficient Instruction Sets for Quantum Computing. Lingling Lao, Prakash Murali, Margaret Martonosi, Dan Browne. 846-859
- Albireo: Energy-Efficient Acceleration of Convolutional Neural Networks via Silicon Photonics. Kyle Shiflett, Avinash Karanth, Razvan C. Bunescu, Ahmed Louri. 860-873
- INTROSPECTRE: A Pre-Silicon Framework for Discovery and Analysis of Transient Execution Vulnerabilities. Moein Ghaniyoun, Kristin Barber, Yinqian Zhang, Radu Teodorescu. 874-887
- Maya: Using Formal Control to Obfuscate Power Side Channels. Raghavendra Pradyumna Pothukuchi, Sweta Yamini Pothukuchi, Petros G. Voulgaris, Alexander Schwing, Josep Torrellas. 888-901
- Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers. George Papadimitriou, Dimitris Gizopoulos. 902-915
- No-FAT: Architectural Support for Low Overhead Memory Safety Checks. Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Ryan Piersma, Simha Sethumadhavan. 916-929
- Ghost Routing to Enable Oblivious Computation on Memory-centric Networks. Yeonju Ro, Seongwook Jin, Jaehyuk Huh, John Kim. 930-943
- QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips. Ataberk Olgun, Minesh Patel, Abdullah Giray Yaglikçi, Haocong Luo, Jeremie S. Kim, Nisa Bostanci, Nandita Vijaykumar, Oguz Ergin, Onur Mutlu. 944-957
- A RISC-V in-network accelerator for flexible high-performance low-power packet processing. Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler. 958-971
- Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems. Sankha Baran Dutta, Hoda Naghibijouybari, Nael B. Abu-Ghazaleh, Andres Marquez, Kevin J. Barker. 972-984
- IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors. Jawad Haj-Yahya, Lois Orosa, Jeremie S. Kim, Juan Gómez-Luna, Abdullah Giray Yaglikçi, Mohammed Alser, Ivan Puddu, Onur Mutlu. 985-998
- ZeRØ: Zero-Overhead Resilient Operation Under Pointer Integrity Attacks. Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Simha Sethumadhavan. 999-1012
- NN-Baton: DNN Workload Orchestration and Chiplet Granularity Exploration for Multichip Accelerators. Zhanhong Tan, Hongyu Cai, Runpei Dong, Kaisheng Ma. 1013-1026
- Snafu: An Ultra-Low-Power, Energy-Minimal CGRA-Generation Framework and Architecture. Graham Gobieski, Ahmet Oguz Atli, Kenneth Mai, Brandon Lucia, Nathan Beckmann. 1027-1040
- SARA: Scaling a Reconfigurable Dataflow Accelerator. Yaqi Zhang, Nathan Zhang, Tian Zhao, Matt Vilim, Muhammad Shahbaz, Kunle Olukotun. 1041-1054
- HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation. Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. 1055-1068
- SpZip: Architectural Support for Effective Data Compression In Irregular Applications. Yifan Yang, Joel S. Emer, Daniel Sánchez. 1069-1082
- Dual-side Sparse Tensor Core. Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng. 1083-1095
- RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging. Chao-Tsung Huang. 1096-1109
- GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator. Chunhua Deng, Yang Sui, Siyu Liao, Xuehai Qian, Bo Yuan. 1110-1123