- Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing. Matthew Denton, Herman Schmit. 1-11 [doi]
- uSystolic: Byte-Crawling Unary Systolic Array. Di Wu, Joshua San Miguel. 12-24 [doi]
- CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories. Yi Huang, ZhiYu Chen, Dai Li, Kaiyuan Yang 0001. 25-37 [doi]
- CoopMC: Algorithm-Architecture Co-Optimization for Markov Chain Monte Carlo Accelerators. Yuji Chai, Glenn G. Ko, Wei-Te Mark Ting, Luke Bailey, David Brooks 0001, Gu-Yeon Wei. 38-52 [doi]
- Leaky Frontends: Security Vulnerabilities in Processor Frontends. Shuwen Deng, Bowen Huang, Jakub Szefer. 53-66 [doi]
- DPrime+DAbort: A High-Precision and Timer-Free Directory-Based Side-Channel Attack in Non-Inclusive Cache Hierarchies using Intel TSX. Sowoong Kim, Myeonggyun Han, Woongki Baek. 67-81 [doi]
- Abusing Cache Line Dirty States to Leak Information in Commercial Processors. Yujie Cui, Chun Yang, Xu Cheng. 82-97 [doi]
- unXpec: Breaking Undo-based Safe Speculation. Mengming Li, Chenlu Miao, Yilong Yang, Kai Bu. 98-112 [doi]
- Cottage: Coordinated Time Budget Assignment for Latency, Quality and Power Optimization in Web Search. Liang Zhou, Laxmi N. Bhuyan, K. K. Ramakrishnan. 113-125 [doi]
- Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems. Zixuan Wang, Joonseop Sim, Euicheol Lim, Jishen Zhao. 126-140 [doi]
- Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation. Liu Ke, Udit Gupta, Mark Hempstead, Carole-Jean Wu, Hsien-Hsin S. Lee, Xuan Zhang 0001. 141-154 [doi]
- ReTail: Opting for Learning Simplicity to Enable QoS-Aware Power Management in the Cloud. Shuang Chen 0002, Angela Jin, Christina Delimitrou, José F. Martínez. 155-168 [doi]
- ANNA: Specialized Architecture for Approximate Nearest Neighbor Search. Yejin Lee, Hyunji Choi, Sunhong Min, Hyunseung Lee, Sangwon Beak, Dawoon Jeong, Jae W. Lee, Tae Jun Ham. 169-183 [doi]
- Hardware-Accelerated Hypergraph Processing with Chain-Driven Scheduling. Qinggang Wang, Long Zheng 0003, Jingrui Yuan, Yu Huang 0013, Pengcheng Yao, Chuangyi Gui, Ao Hu, Xiaofei Liao, Hai Jin 0001. 184-198 [doi]
- ScalaGraph: A Scalable Accelerator for Massively Parallel Graph Processing. Pengcheng Yao, Long Zheng 0003, Yu Huang 0013, Qinggang Wang, Chuangyi Gui, Zhen Zeng, Xiaofei Liao, Hai Jin 0001, Jingling Xue. 199-212 [doi]
- Adaptive Security Support for Heterogeneous Memory on GPUs. Shougang Yuan, Amro Awad, Ardhi Wiratama Baskara Yudha, Yan Solihin, Huiyang Zhou. 213-228 [doi]
- TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit. Sunho Lee, Jungwoo Kim, Seonjin Na, Jongse Park, Jaehyuk Huh. 229-243 [doi]
- SecNDP: Secure Near-Data Processing with Untrusted Memory. Wenjie Xiong, Liu Ke, Dimitrije Jankov, Michael Kounavis, Xiaochen Wang, Eric Northup, Jie Amy Yang, Bilge Acun, Carole-Jean Wu, Ping Tak Peter Tang, G. Edward Suh, Xuan Zhang 0001, Hsien-Hsin S. Lee. 244-258 [doi]
- AFS: Accurate, Fast, and Scalable Error-Decoding for Fault-Tolerant Quantum Computers. Poulami Das 0005, Christopher A. Pattison, Srilatha Manne, Douglas M. Carmean, Krysta M. Svore, Moinuddin K. Qureshi, Nicolas Delfosse. 259-273 [doi]
- QULATIS: A Quantum Error Correction Methodology toward Lattice Surgery. Yosuke Ueno, Masaaki Kondo, Masamitsu Tanaka, Yasunari Suzuki, Yutaka Tabuchi. 274-287 [doi]
- VAQEM: A Variational Approach to Quantum Error Mitigation. Gokul Subramanian Ravi, Kaitlin N. Smith, Pranav Gokhale, Andrea Mari, Nathan Earnest, Ali Javadi-Abhari, Frederic T. Chong. 288-303 [doi]
- DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs. Cheng Tan 0002, Nicolas Bohm Agostini, Tong Geng, Chenhao Xie 0001, Jiajia Li 0001, Ang Li, Kevin J. Barker, Antonino Tumeo. 304-316 [doi]
- Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation. Jeong Jun Lee, Wenrui Zhang, Peng Li 0001. 317-330 [doi]
- Near-Stream Computing: General and Transparent Near-Cache Acceleration. Zhengrong Wang, Jian Weng 0002, Sihao Liu, Tony Nowatzki. 331-345 [doi]
- HyBP: Hybrid Isolation-Randomization Secure Branch Predictor. Lutan Zhao, Peinan Li, Rui Hou, Michael C. Huang 0001, Xuehai Qian, Lixin Zhang 0002, Dan Meng. 346-359 [doi]
- IR-ORAM: Path Access Type Based Memory Intensity Reduction for Path-ORAM. Mehrnoosh Raoufi, Youtao Zhang, Jun Yang 0002. 360-372 [doi]
- SafeGuard: Reducing the Security Risk from Row-Hammer via Low-Cost Integrity Protection. Ali Fakhrzadehgan, Yale N. Patt, Prashant J. Nair, Moinuddin K. Qureshi. 373-386 [doi]
- Detecting Qubit-coupling Faults in Ion-trap Quantum Computers. Andrii Maksymov, Jason Nguyen, Vandiver Chaplin, Yun Seong Nam, Igor L. Markov. 387-399 [doi]
- DigiQ: A Scalable Digital Controller for Quantum Computers Using SFQ Logic. Mohammad Reza Jokar, Richard Rines, Ghasem Pasandi, Haolin Cong, Adam Holmes, Yunong Shi, Massoud Pedram, Frederic T. Chong. 400-414 [doi]
- HiPerRF: A Dual-Bit Dense Storage SFQ Register File. Haipeng Zha, Naveen Kumar Katam, Massoud Pedram, Murali Annavaram. 415-428 [doi]
- ReGNN: A Redundancy-Eliminated Graph Neural Networks Accelerator. Cen Chen, Kenli Li 0001, Yangfan Li, Xiaofeng Zou. 429-443 [doi]
- LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators. Zhaoying Li, Dan Wu, Dhananjaya Wijerathne, Tulika Mitra. 444-459 [doi]
- GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design. Haoran You, Tong Geng, Yongan Zhang, Ang Li, Yingyan Lin. 460-474 [doi]
- Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators. Shixuan Zheng, Xianjue Zhang, Leibo Liu, Shaojun Wei, Shouyi Yin. 475-489 [doi]
- Filesystem Encryption or Direct-Access for NVM Filesystems? Let's Have Both! Kazi Abu Zubair, David Mohaisen, Amro Awad. 490-502 [doi]
- Efficient Bad Block Management with Cluster Similarity. Jui-Nan Yen, Yao Ching Hsieh, Cheng-Yu Chen, Tseng-Yi Chen, Chia-Lin Yang, Hsiang-Yun Cheng, Yixin Luo. 503-513 [doi]
- Using Psychophysics to Guide Power Adaptation for Input Methods on Mobile Architectures. Xueliang Li, Shicong Hong, Junyang Chen, Guihai Yan, Kaishun Wu. 514-527 [doi]
- HD-CPS: Hardware-assisted Drift-aware Concurrent Priority Scheduler for Shared Memory Multicores. Mohsin Shan, Omer Khan. 528-542 [doi]
- Improving Locality of Irregular Updates with Hardware Assisted Propagation Blocking. Vignesh Balaji, Brandon Lucia. 543-557 [doi]
- Effective Mimicry of Belady's MIN Policy. Ishan Shah, Akanksha Jain, Calvin Lin. 558-572 [doi]
- S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration. Zhi-gang Liu, Paul N. Whatmough, Yuhao Zhu 0001, Matthew Mattina. 573-586 [doi]
- SupermarQ: A Scalable Quantum Benchmark Suite. Teague Tomesh, Pranav Gokhale, Victory Omole, Gokul Subramanian Ravi, Kaitlin N. Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, Margaret Martonosi, Frederic T. Chong. 587-603 [doi]
- LoopPoint: Checkpoint-driven Sampled Simulation for Multi-threaded Applications. Alen Sabu, Harish Patil, Wim Heirman, Trevor E. Carlson. 604-618 [doi]
- Compiler-Driven Simulation of Reconfigurable Hardware Accelerators. Zhijing Li 0002, Yuwei Ye, Stephen Neuendorffer, Adrian Sampson. 619-632 [doi]
- NeuroSync: A Scalable and Accurate Brain Simulator Using Safe and Efficient Speculation. Hunjun Lee, Chanmyeong Kim, Minseop Kim, Yujin Chung, Jangwoo Kim. 633-647 [doi]
- Reducing Load Latency with Cache Level Prediction. Majid Jalili 0001, Mattan Erez. 648-661 [doi]
- TCOR: A Tile Cache with Optimal Replacement. Diya Joseph, Juan L. Aragón, Joan-Manuel Parcerisa, Antonio González 0001. 662-675 [doi]
- Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers. Preyesh Dalmia, Rohan Mahapatra, Matthew D. Sinclair. 676-691 [doi]
- QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits. Hanrui Wang 0002, Yongshan Ding 0001, Jiaqi Gu, Yujun Lin 0001, David Z. Pan, Frederic T. Chong, Song Han 0003. 692-708 [doi]
- Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing. Ji Liu, Peiyi Li, Huiyang Zhou. 709-725 [doi]
- Q-GPU: A Recipe of Optimizations for Quantum Circuit Simulation Using GPUs. Yilun Zhao, Yanan Guo, Yuan Yao, Amanda Dumi, Devin M. Mulvey, Shiv Upadhyay, Youtao Zhang, Kenneth D. Jordan, Jun Yang, Xulong Tang. 726-740 [doi]
- ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation. Hanchen Ye, Cong Hao, Jianyi Cheng, Hyunmin Jeong, Jack Huang, Stephen Neuendorffer, Deming Chen. 741-755 [doi]
- HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols. Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin, Vasilis Gavrielatos, Theo Olausson, Reece Carr. 756-771 [doi]
- Reliability-Aware Runahead. Ajeya Naithani, Lieven Eeckhout. 772-785 [doi]
- Adaptable Register File Organization for Vector Processors. Cristóbal Ramírez Lazo, Enrico Reggiani, Carlos Rojas Morales, Roger Figueras Bagué, Luis A. Villa Vargas, Marco Antonio Ramírez Salinas, Mateo Valero Cortés, Osman Sabri Unsal, Adrián Cristal. 786-799 [doi]
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. Han Zhao 0005, Weihao Cui, Quan Chen 0002, Youtao Zhang, Yanchao Lu, Chao Li 0009, Jingwen Leng, Minyi Guo. 800-813 [doi]
- MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores. Sheng-Chun Kao, Tushar Krishna. 814-830 [doi]
- SPACX: Silicon Photonics-based Scalable Chiplet Accelerator for DNN Inference. Yuan Li, Ahmed Louri, Avinash Karanth. 831-845 [doi]
- FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding. Sai Qian Zhang, Bradley McDanel, H. T. Kung 0001. 846-860 [doi]
- Griffin: Rethinking Sparse Optimization for Deep Learning Architectures. Jong Hoon Shin, Ali Shafiee, Ardavan Pedram, Hamzah Abdel-Aziz, Ling Li, Joseph Hassoun. 861-875 [doi]
- CANDLES: Channel-Aware Novel Dataflow-Microarchitecture Co-Design for Low Energy Sparse Neural Network Acceleration. Sumanth Gudaparthi, Sarabjeet Singh, Surya Narayanan, Rajeev Balasubramonian, Visvesh Sathe. 876-891 [doi]
- ASAP: A Speculative Approach to Persistence. Sujay Yadalam, Nisarg Shah, Xiangyao Yu, Michael Swift. 892-907 [doi]
- Temporal Exposure Reduction Protection for Persistent Memory. Yuanchao Xu 0001, Chencheng Ye, Xipeng Shen, Yan Solihin. 908-924 [doi]
- MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems. Adnan Maruf, Ashikee Ghosh, Janki Bhimani, Daniel Campello, Andy Rudoff, Raju Rangaswami. 925-937 [doi]
- NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories. Lillian Pentecost, Alexander Hankin, Marco Donato, Mark Hempstead, Gu-Yeon Wei, David Brooks 0001. 938-956 [doi]
- Stay in your Lane: A NoC with Low-overhead Multi-packet Bypassing. Hossein Farrokhbakht, Paul V. Gratz, Tushar Krishna, Joshua San Miguel, Natalie D. Enright Jerger. 957-970 [doi]
- FastTrackNoC: A NoC with FastTrack Router Datapaths. Ahsen Ejaz, Ioannis Sourdis. 971-985 [doi]
- Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems. Yibo Wu, Liang Wang 0020, Xiaohang Wang, Jie Han 0001, Jianfeng Zhu 0001, Honglan Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu. 986-1000 [doi]
- Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings. Mike O'Connor, Donghyuk Lee, Niladrish Chatterjee, Michael B. Sullivan 0001, Stephen W. Keckler. 1001-1013 [doi]
- Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures. Xia Zhao 0004, Lieven Eeckhout, Magnus Jahre. 1014-1028 [doi]
- Accelerating Graph Convolutional Networks Using Crossbar-based Processing-In-Memory Architectures. Yu Huang 0013, Long Zheng 0003, Pengcheng Yao, Qinggang Wang, Xiaofei Liao, Hai Jin 0001, Jingling Xue. 1029-1042 [doi]
- Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network. Xingchen Li, Bingzhe Wu, Guangyu Sun 0003, Zhe Zhang, Zhihang Yuan, Runsheng Wang, Ru Huang, Dimin Niu, Hongzhong Zheng, Zhichao Lu, Liang Zhao, Meng-Fan Marvin Chang, Tianchan Guan, Xin Si. 1043-1055 [doi]
- RM-SSD: In-Storage Computing for Large-Scale Recommendation Inference. Xuan Sun 0003, Hu Wan 0001, Qiao Li 0001, Chia-Lin Yang, Tei-Wei Kuo, Chun Jason Xue. 1056-1070 [doi]
- TransPIM: A Memory-based Acceleration via Software-Hardware Co-Design for Transformer. Minxuan Zhou, Weihong Xu, Jaeyoung Kang 0001, Tajana Rosing. 1071-1085 [doi]
- PIMCloud: QoS-Aware Resource Management of Latency-Critical Applications in Clouds with Processing-in-Memory. Shuang Chen 0002, Yi Jiang, Christina Delimitrou, José F. Martínez. 1086-1099 [doi]
- Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. Jinkwon Kim, Mincheol Kang, Jeongkyu Hong, Soontae Kim. 1100-1114 [doi]
- GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases. Alexandra Angerd, Angelos Arelakis, Vasilis Spiliopoulos, Erik Sintorn, Per Stenström. 1115-1127 [doi]
- Virtual Coset Coding for Encrypted Non-Volatile Memories with Multi-Level Cells. Stephen Longofono, Seyed Mohammad Seyedzadeh, Alex K. Jones. 1128-1140 [doi]
- DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators. F. Nisa Bostanci, Ataberk Olgun, Lois Orosa 0001, Abdullah Giray Yaglikçi, Jeremie S. Kim, Hasan Hassan, Oguz Ergin, Onur Mutlu. 1141-1155 [doi]
- Mithril: Cooperative Row Hammer Protection on Commodity DRAM Leveraging Managed Refresh. Michael Jaemin Kim, Jaehyun Park, Yeonhong Park, Wanju Doh, Namhoon Kim, Tae Jun Ham, Jae W. Lee, Jung Ho Ahn. 1156-1169 [doi]
- DarkGates: A Hybrid Power-Gating Architecture to Mitigate the Performance Impact of Dark-Silicon in High Performance Processors. Jawad Haj-Yahya, Jeremie S. Kim, Abdullah Giray Yaglikçi, Jisung Park, Efraim Rotem, Yanos Sazeides, Onur Mutlu. 1170-1183 [doi]
- GPU Subwarp Interleaving. Sana Damani, Mark Stephenson, Ram Rangan, Daniel R. Johnson, Rishkul Kulkarni, Stephen W. Keckler. 1184-1197 [doi]
- Application Defined On-chip Networks for Heterogeneous Chiplets: An Implementation Perspective. Tianqi Wang, Fan Feng, Shaolin Xiang, Qi Li, Jing Xia. 1198-1210 [doi]
- The Specialized High-Performance Network on Anton 3. Keun Sup Shim, Brian Greskamp, Brian Towles, Bruce Edwards, J. P. Grossman, David E. Shaw. 1211-1223 [doi]
- AI-Enabling Workloads on Large-Scale GPU-Accelerated System: Characterization, Opportunities, and Implications. Baolin Li, Rohin Arora, Siddharth Samsi, Tirthak Patel, William Arcand, David Bestor, Chansup Byun, Rohan Basu Roy, Bill Bergeron, John Holodnak, Michael Houle, Matthew Hubbell, Michael Jones 0001, Jeremy Kepner, Anna Klein, Peter Michaleas, Joseph McDonald, Lauren Milechin, Julie Mullen, Andrew Prout, Benjamin Price, Albert Reuther, Antonio Rosa, Matthew L. Weiss, Charles Yee, Daniel Edelman, Allan Vanterpool, Anson Cheng, Vijay Gadepally, Devesh Tiwari. 1224-1237 [doi]