- APOLLO: An Automated Power Modeling Framework for Runtime Power Introspection in High-Volume Commercial Microprocessors. Zhiyao Xie, Xiaoqing Xu, Matt Walker, Joshua Knebel, Kumaraguru Palaniswamy, Nicolas Hebert, Jiang Hu, Huanrui Yang, Yiran Chen, Shidhartha Das. 1-14
- TIP: Time-Proportional Instruction Profiling. Björn Gottschall, Lieven Eeckhout, Magnus Jahre. 15-27
- NDS: N-Dimensional Storage. Yu-Chia Liu, Hung-Wei Tseng 0001. 28-45
- GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management. Harini Muthukrishnan, Daniel Lustig, David W. Nellans, Thomas F. Wenisch. 46-58
- ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs. Congming Gao, Xin Xin, Youyou Lu, Youtao Zhang, Jun Yang, Jiwu Shu. 59-70
- Distributed Data Persistency. Apostolos Kokolis, Antonis Psistakis, Benjamin Reidys, Jian Huang 0006, Josep Torrellas. 71-85
- COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence. Marina Vemmou, Alexandros Daglis. 86-99
- RACER: Bit-Pipelined Processing Using Resistive Memory. Minh S. Q. Truong, Eric Chen, Deanyone Su, Liting Shen, Alexander Glass, L. Richard Carley, James A. Bain, Saugata Ghose. 100-116
- LADDER: Architecting Content and Location-aware Writes for Crossbar Resistive Memories. Md Hafizul Islam Chowdhuryy, Muhammad Rashedul Haq Rashed, Amro Awad, Rickard Ewetz, Fan Yao. 117-130
- GreenDIMM: OS-assisted DRAM Power Management for DRAM with a Sub-array Granularity Power-Down State. Seunghak Lee, Ki-Dong Kang, Hwanjun Lee, Hyungwon Park, Young Hoon Son, Nam Sung Kim, Daehoon Kim. 131-142
- NMAP: Power Management Based on Network Packet Processing Mode Transition for Latency-Critical Workloads. Ki-Dong Kang, Gyeongseo Park, Hyosang Kim, Mohammad Alian, Nam Sung Kim, Daehoon Kim. 143-154
- BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems. Jawad Haj-Yahya, Jisung Park 0001, Rahul Bera, Juan Gómez-Luna, Efraim Rotem, Taha Shahroodi, Jeremie S. Kim, Onur Mutlu. 155-169
- ReplayCache: Enabling Volatile Caches for Energy Harvesting Systems. Jianping Zeng, Jongouk Choi, Xinwei Fu, Ajay Paddayuru Shreepathi, Dongyoon Lee, Changwoo Min, Changhee Jung. 170-182
- AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning. Young-geun Kim, Carole-Jean Wu. 183-198
- IceClave: A Trusted Execution Environment for In-Storage Computing. Luyi Kang, Yuqi Xue, Weiwei Jia, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung-Jin Lim, Bruce L. Jacob, Jian Huang 0006. 199-211
- DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware. Hanieh Hashemi, Yongqin Wang, Murali Annavaram. 212-224
- 2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency. Yonggan Fu, Yang Zhao, Qixuan Yu, Chaojian Li, Yingyan Lin. 225-237
- F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption. Nikola Samardzic, Axel Feldmann, Aleksandar Krastev, Srinivas Devadas, Ronald G. Dreslinski, Christopher Peikert, Daniel Sánchez 0003. 238-252
- Cryptographic Capability Computing. Michael LeMay, Joydeep Rakshit, Sergej Deutsch, David M. Durham, Santosh Ghosh, Anant Nori, Jayesh Gaur, Andrew Weiler, Salmin Sultana, Karanvir Grewal, Sreenivas Subramoney. 253-267
- TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory. Jaehyun Park, Byeongho Kim, Sungmin Yun, Eojin Lee, Minsoo Rhu, Jung Ho Ahn. 268-281
- SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez-Luna, Jakub Golinowski, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, Torsten Hoefler. 282-297
- OrderLight: Lightweight Memory-Ordering Primitive for Efficient Fine-Grained PIM Computations. Anirban Nag, Rajeev Balasubramonian. 298-310
- Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. Elaheh Sadredini, Reza Rahimi, Mohsen Imani, Kevin Skadron. 311-323
- SAM: Accelerating Strided Memory Accesses. Xin Xin, Yanan Guo, Youtao Zhang, Jun Yang. 324-336
- Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations. Eduardo José Gómez-Hernández, Juan M. Cebrian, J. Rubén Titos Gil, Stefanos Kaxiras, Alberto Ros. 337-349
- Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. Joseph Zuckerman, Davide Giri, Jihye Kwon, Paolo Mantovani, Luca P. Carloni. 350-365
- Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses. Vanshika Baoni, Adarsh Mittal, Gurindar S. Sohi. 366-379
- Criticality Driven Fetch. Aniket Deshmukh, Yale N. Patt. 380-391
- Software-Defined Vector Processing on Manycore Fabrics. Philip Bedoukian, Neil Adit, Edwin Peguero, Adrian Sampson. 392-406
- Cerebros: Evading the RPC Tax in Datacenters. Arash Pourhabibi Zarandi, Mark Sutherland, Alexandros Daglis, Babak Falsafi. 407-420
- Equinox: Training (for Free) on a Custom Inference Accelerator. Mario Drumond, Louis Coulon, Arash Pourhabibi Zarandi, Ahmet Caner Yüzügüler, Babak Falsafi, Martin Jaggi. 421-433
- : Near-Storage Accelerator for High-Performance Log Analytics. Seongyoung Kang, Jiyoung An, Jinpyo Kim, Sang-Woo Jun. 434-448
- PointAcc: Efficient Point Cloud Accelerator. Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang 0002, Song Han 0003. 449-461
- A Hardware Accelerator for Protocol Buffers. Sagar Karandikar, Chris Leary, Chris Kennelly, Jerry Zhao, Dinesh Parimi, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan. 462-478
- Archytas: A Framework for Synthesizing and Dynamically Optimizing Accelerators for Robotic Localization. Weizhuang Liu, Bo Yu, Yiming Gan, Qiang Liu, Jie Tang 0003, Shaoshan Liu, Yuhao Zhu 0001. 479-493
- HoloAR: On-the-fly Optimization of 3D Holographic Processing for Augmented Reality. Shulin Zhao 0001, Haibo Zhang 0005, Cyan Subhra Mishra, Sandeepa Bhuyan, Ziyu Ying 0001, Mahmut Taylan Kandemir, Anand Sivasubramaniam, Chita R. Das. 494-506
- NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators. David Trilla, John-David Wellman, Alper Buyuktosunoglu, Pradip Bose. 507-521
- Noema: Hardware-Efficient Template Matching for Neural Population Pattern Detection. Ameer M. S. Abdelhadi, Eugene Sha, Ciaran Bannon, Hendrik Steenland, Andreas Moshovos. 522-534
- SquiggleFilter: An Accelerator for Portable Virus Detection. Timothy Dunn, Harisankar Sadasivan, Jack Wadden, Kush Goliya, Kuan-Yu Chen, David T. Blaauw, Reetuparna Das, Satish Narayanasamy. 535-549
- UC-Check: Characterizing Micro-operation Caches in x86 Processors and Implications in Security and Performance. Joonsung Kim, Hamin Jang, Hunjun Lee, Seungho Lee, Jangwoo Kim. 550-564
- Network-on-Chip Microarchitecture-based Covert Channel in GPUs. Jaeguk Ahn, Jiho Kim, Hans Kasan, Zhixian Jin, Leila Delshadtehrani, WonJun Song, Ajay Joshi, John Kim. 565-577
- Validation of Side-Channel Models via Observation Refinement. Pablo Buiras, Hamed Nemati, Andreas Lindner, Roberto Guanciale. 578-591
- GhostMinion: A Strictness-Ordered Cache System for Spectre Mitigation. Sam Ainsworth. 592-606
- Speculative Privacy Tracking (SPT): Leaking Information From Speculative Execution Without Compromising Privacy. Rutvik Choudhary, Jiyong Yu, Christopher W. Fletcher, Adam Morrison 0001. 607-622
- HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes. Minesh Patel, Geraldo Francisco de Oliveira Jr., Onur Mutlu. 623-640
- Characterizing and Mitigating Soft Errors in GPU DRAM. Michael B. Sullivan 0001, Nirmal R. Saxena, Mike O'Connor, Donghyuk Lee, Paul Racunas, Saurabh Hukerikar, Timothy Tsai 0002, Siva Kumar Sastry Hari, Stephen W. Keckler. 641-653
- Turnpike: Lightweight Soft Error Resilience for In-Order Cores. Jianping Zeng, Hongjune Kim, Jaejin Lee, Changhee Jung. 654-666
- Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation. Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, Jose Renau. 667-678
- Synthesizing Formal Models of Hardware from RTL for Efficient Verification of Memory Model Implementations. Yao Hsiao, Dominic P. Mulligan, Nikos Nikoleris, Gustavo Petri, Caroline Trippel. 679-694
- Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors. Jie Zhang 0048, Myoungsoo Jung. 695-708
- Intersection Prediction for Accelerated GPU Ray Tracing. Lufei Liu, Wesley Chang, Francois Demoullin, Yuan-Hsi Chou, Mohammadreza Saed, David Pankratz, Tyler Nowicki, Tor M. Aamodt. 709-723
- Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads. Cesar Avalos Baddouh, Mahmoud Khairy, Roland N. Green, Mathias Payer, Timothy G. Rogers. 724-737
- AccelWattch: A Power Modeling Framework for Modern GPUs. Vijay Kandiah, Scott Peverelle, Mahmoud Khairy, Junrui Pan, Amogh Manjunath, Timothy G. Rogers, Tor M. Aamodt, Nikos Hardavellas. 738-753
- Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, Hyesoon Kim. 754-766
- Enabling Branch-Mispredict Level Parallelism by Selectively Flushing Instructions. Stijn Eyerman, Wim Heirman, Sam Van den Steen, Ibrahim Hur. 767-778
- PDede: Partitioned, Deduplicated, Delta Branch Target Buffer. Niranjan K. Soundararajan, Peter Braun, Tanvir Ahmed Khan, Baris Kasikci, Heiner Litz, Sreenivas Subramoney. 779-791
- Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential. Arthur Perais. 792-803
- Branch Runahead: An Alternative to Branch Prediction for Impossible to Predict Branches. Stephen Pruett, Yale N. Patt. 804-815
- Twig: Profile-Guided BTB Prefetching for Data Center Applications. Tanvir Ahmed Khan, Nathan Brown, Akshitha Sriraman, Niranjan K. Soundararajan, Rakesh Kumar 0002, Joseph Devietti, Sreenivas Subramoney, Gilles A. Pokam, Heiner Litz, Baris Kasikci. 816-829
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference. Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks 0001, Gu-Yeon Wei. 830-844
- HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer. Yaoyu Tao, Zhengya Zhang. 845-856
- FPRaker: A Processing Element For Accelerating Neural Network Training. Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos. 857-869
- RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance. Udit Gupta, Samuel Hsia, Jeff Zhang 0001, Mark Wilkening, Javin Pombra, Hsien-Hsin Sean Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks 0001. 870-884
- Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu. 885-897
- Exploiting Different Levels of Parallelism in the Quantum Control Microarchitecture for Superconducting Qubits. Mengyu Zhang, Lei Xie, Zhenxing Zhang, Qiaonian Yu, Guanglei Xi, Hualiang Zhang, Fuming Liu, Yarui Zheng, Yicong Zheng, Shengyu Zhang. 898-911
- SMART: A Heterogeneous Scratchpad Memory Architecture for Superconductor SFQ-based Systolic CNN Accelerators. Farzaneh Zokaee, Lei Jiang 0001. 912-924
- AutoBraid: A Framework for Enabling Efficient Surface Code Communication in Quantum Computing. Fei Hua, Yan Hao Chen, Yuwei Jin, Chi Zhang, Ari B. Hayes, Youtao Zhang, Eddy Z. Zhang. 925-936
- JigSaw: Boosting Fidelity of NISQ Programs via Measurement Subsetting. Poulami Das 0005, Swamit S. Tannu, Moinuddin K. Qureshi. 937-949
- ADAPT: Mitigating Idling Errors in Qubits via Adaptive Dynamical Decoupling. Poulami Das 0005, Swamit S. Tannu, Siddharth Dangwal, Moinuddin K. Qureshi. 950-962
- Distilling Bit-level Sparsity Parallelism for General Purpose Deep Learning Acceleration. Hang Lu, Liang Chang, Chenglong Li, Zixuan Zhu, Shengjian Lu, Yanhuan Liu, Mingzhe Zhang. 963-976
- Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture. Liqiang Lu, Yicheng Jin, Hangrui Bi, Zizhang Luo, Peng Li, Tao Wang, Yun Liang 0001. 977-991
- ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition. Shiyu Li, Edward Hanson, Xuehai Qian, Hai (Helen) Li, Yiran Chen. 992-1004
- SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator. Subhankar Pal, Aporva Amarnath, Siying Feng, Michael F. P. O'Boyle, Ronald G. Dreslinski, Christophe Dubach. 1005-1021
- Capstan: A Vector RDA for Sparsity. Alexander Rucker, Matthew Vilim, Tian Zhao 0001, Yaqi Zhang 0001, Raghu Prabhakar, Kunle Olukotun. 1022-1035
- Improving Streaming Graph Processing Performance using Input Knowledge. Abanti Basak, Zheng Qu, Jilan Lin, Alaa R. Alameldeen, Zeshan Chishti, Yufei Ding, Yuan Xie 0001. 1036-1050
- I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization. Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan 0002, Chenhao Xie 0001, Haoran You, Martin C. Herbordt, Yingyan Lin, Ang Li. 1051-1063
- Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures. Quan M. Nguyen, Daniel Sánchez 0003. 1064-1077
- Point-X: A Spatial-Locality-Aware Architecture for Energy-Efficient Graph-Based Point-Cloud Deep Learning. Jie-Fang Zhang, Zhengya Zhang. 1078-1090
- JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator. Shafiur Rahman, Mahbod Afarin, Nael B. Abu-Ghazaleh, Rajiv Gupta 0001. 1091-1105
- Trident: Harnessing Architectural Resources for All Page Sizes in x86 Processors. Venkat Sri Sai Ram, Ashish Panwar, Arkaprava Basu. 1106-1120
- Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning. Rahul Bera, Konstantinos Kanellopoulos, Anant Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu. 1121-1137
- Morrigan: A Composite Instruction TLB Prefetcher. Georgios Vavouliotis, Lluc Alvarez, Boris Grot, Daniel A. Jiménez, Marc Casas. 1138-1153
- Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design. Bingyao Li, Jieming Yin, Youtao Zhang, Xulong Tang. 1154-1168
- Increasing GPU Translation Reach by Leveraging Under-Utilized On-Chip Resources. Jagadish B. Kotra, Michael LeBeane, Mahmut T. Kandemir, Gabriel H. Loh. 1169-1181
- A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses. Lois Orosa 0001, Abdullah Giray Yaglikçi, Haocong Luo, Ataberk Olgun, Jisung Park 0001, Hasan Hassan, Minesh Patel, Jeremie S. Kim, Onur Mutlu. 1182-1197
- Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications. Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen, Kaveh Razavi, Onur Mutlu. 1198-1213
- Soteria: Towards Resilient Integrity-Protected and Encrypted Non-Volatile Memories. Kazi Abu Zubair, Sudhanva Gurumurthi, Vilas Sridharan, Amro Awad. 1214-1226
- Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory. Alexander Freij, Huiyang Zhou, Yan Solihin. 1227-1240
- Dolos: Improving the Performance of Persistent Applications in ADR-Supported Secure Memory. Xijing Han, James Tuck, Amro Awad. 1241-1253
- The Laplace Microarchitecture for Tracking Data Uncertainty and Its Implementation in a RISC-V Processor. Vasileios Tsoutsouras, Orestis Kaparounakis, Bilgesu Arif Bilgin, Chatura Samarakoon, James Timothy Meech, Jan Heck, Phillip Stanley-Marbell. 1254-1269
- Post-Fabrication Microarchitecture. Chanchal Kumar, Anirudh Seshadri, Aayush Chaudhary, Shubham Bhawalkar, Rohit Singh, Eric Rotenberg. 1270-1281
- PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips. Yuanchao Xu 0001, Mehmet Esat Belviranli, Xipeng Shen, Jeffrey S. Vetter. 1282-1295
- ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading. Josué Feliu, Alberto Ros, Manuel E. Acacio, Stefanos Kaxiras. 1296-1308
- ENMC: Extreme Near-Memory Classification via Approximate Screening. Liu Liu, Jilan Lin, Zheng Qu, Yufei Ding, Yuan Xie. 1309-1322