- Deep Learning Acceleration with Neuron-to-Memory Transformation. Mohsen Imani, Mohammad Samragh Razlighi, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, Tajana Rosing. 1-14
- HyGCN: A GCN Accelerator with Hybrid Architecture. Mingyu Yan, Lei Deng 0003, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie 0001. 15-29
- ACR: Amnesic Checkpointing and Recovery. Ismail Akturk, Ulya R. Karpuzcu. 30-43
- Asymmetric Resilience: Exploiting Task-Level Idempotency for Transient Error Recovery in Accelerator-Based Systems. Jingwen Leng, Alper Buyuktosunoglu, Ramon Bertran, Pradip Bose, Quan Chen 0002, Minyi Guo, Vijay Janapa Reddi. 44-57
- SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das 0002, Bharat Kaul, Tushar Krishna. 58-70
- EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals. Nader Sehatbakhsh, Baki Berkay Yilmaz, Alenka G. Zajic, Milos Prvulovic. 71-85
- Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching. Elaheh Sadredini, Reza Rahimi, Marzieh Lenjani, Mircea Stan, Kevin Skadron. 86-98
- A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study. Ting-Ru Lin, Drew Penney, Massoud Pedram, Lizhong Chen. 99-110
- IRONHIDE: A Secure Multicore that Efficiently Mitigates Microarchitecture State Attacks for Interactive Applications. Hamza Omar, Omer Khan. 111-122
- A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit. Nader Sehatbakhsh, Baki Berkay Yilmaz, Alenka G. Zajic, Milos Prvulovic. 123-138
- Leaking Information Through Cache LRU States. Wenjie Xiong 0001, Jakub Szefer. 139-152
- Baldur: A Power-Efficient and Scalable Network Using All-Optical Switches. Mohammad Reza Jokar, Junyi Qiu, Frederic T. Chong, Lynford L. Goddard, John M. Dallesasse, Milton Feng, Yanjing Li. 153-166
- Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services. Rajiv Nishtala, Vinicius Petrucci, Paul Carpenter, Magnus Själander. 167-179
- QuickNN: Memory and Performance Optimization of k-d Tree Based Nearest Neighbor Search for 3D Point Clouds. Reid Pinkham, Shuqing Zeng, Zhengya Zhang. 180-192
- CLITE: Efficient and QoS-Aware Co-Location of Multiple Latency-Critical Jobs for Warehouse Scale Computers. Tirthak Patel, Devesh Tiwari. 193-206
- Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-Tolerant Microservices. Amirhossein Mirhosseini, Brendan L. West, Geoffrey W. Blake, Thomas F. Wenisch. 207-219
- PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units. Yujeong Choi, Minsoo Rhu. 220-233
- Domain-Specialized Cache Management for Graph Analytics. Priyank Faldu, Jeff Diamond, Boris Grot. 234-248
- ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator. Bahar Asgari, Ramyad Hadidi, Tushar Krishna, Hyesoon Kim, Sudhakar Yalamanchili. 249-260
- SpArch: Efficient Architecture for Sparse Matrix Multiplication. Zhekai Zhang, Hanrui Wang 0002, Song Han, William J. Dally. 261-274
- Mitigating Voltage Drop in Resistive Memories by Dynamic RESET Voltage Regulation and Partition RESET. Farzaneh Zokaee, Lei Jiang 0001. 275-286
- DRAM-Less: Hardware Acceleration of Data Processing with New Memory. Jie Zhang 0048, Gyuyoung Park, David Donofrio, John Shalf, Myoungsoo Jung. 287-302
- ELP2IM: Efficient and Low Power Bitwise Operation Processing in DRAM. Xin Xin, Youtao Zhang, Jun Yang. 303-314
- ResiRCA: A Resilient Energy Harvesting ReRAM Crossbar-Based Accelerator for Intelligent Embedded Processors. Keni Qiu, Nicholas Jao, Mengying Zhao, Cyan Subhra Mishra, Gulsum Gudukbay, Sethu Jose, Jack Sampson, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan. 315-327
- A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation. Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung Hun Park, SangHee Lee, Kyoung Park, Jae W. Lee, Deog Kyoon Jeong. 328-341
- AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. Linghao Song, Fan Chen, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen. 342-355
- FLOWER and FaME: A Low Overhead Bit-Level Fault-map and Fault-Tolerance Approach for Deeply Scaled Memories. Donald Kline Jr., Jiangwei Zhang, Rami G. Melhem, Alex K. Jones. 356-368
- Multi-Range Supported Oblivious RAM for Efficient Block Data Retrieval. Yuezhi Che, Rujia Wang. 369-382
- CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows. Ipoom Jeong, Seihoon Park, Changmin Lee, Won Woo Ro. 383-396
- Precise Runahead Execution. Ajeya Naithani, Josué Feliu, Almutaz Adileh, Lieven Eeckhout. 397-410
- BBS: Micro-Architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set. Liang Zhu, Chao Chen, Zihao Su, Weiguang Chen, Tao Li, Zhibin Yu. 411-423
- Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors. Mehdi Alipour, Stefanos Kaxiras, David Black-Schaffer, Rakesh Kumar 0003. 424-434
- EquiNox: Equivalent NoC Injection Routers for Silicon Interposer-Based Throughput Processors. Yunfan Li, Lizhong Chen. 435-446
- DRAIN: Deadlock Removal for Arbitrary Irregular Networks. Mayank Parasar, Hossein Farrokhbakht, Natalie D. Enright Jerger, Paul V. Gratz, Tushar Krishna, Joshua San Miguel. 447-460
- SnackNoC: Processing in the Communication Layer. Karthik Sangaiah, Michael Lui, Ragh Kuttappa, Baris Taskin, Mark Hempstead. 461-473
- PIXEL: Photonic Neural Network Accelerator. Kyle Shiflett, Dylan Wright, Avinash Karanth, Ahmed Louri. 474-487
- The Architectural Implications of Facebook's DNN-Based Personalized Recommendation. Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks 0001, Bradford Cottel, Kim M. Hazelwood, Mark Hempstead, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang 0001. 488-501
- NVDIMM-C: A Byte-Addressable Non-Volatile Memory Module for Compatibility with Standard DDR Memory Interfaces. Changmin Lee, Wonjae Shin, Dae Jeong Kim, Yongjun Yu, Sung-Joon Kim, Taekyeong Ko, Deokho Seo, Jongmin Park, KwangHee Lee, Seongho Choi, Namhyung Kim, Vishak G, Arun George, Vishwas V, Donghun Lee, Kang-Woo Choi, Changbin Song, Dohan Kim, Insu Choi, Ilgyu Jung, Yong Ho Song, Jinman Han. 502-514
- Missing the Forest for the Trees: End-to-End AI Application Performance in Edge Data Centers. Daniel Richins, Dharmisha Doshi, Matthew Blackmore, Aswathy Thulaseedharan Nair, Neha Pathapati, Ankit Patel, Brainard Daguman, Daniel Dobrijalowski, Ramesh Illikkal, Kevin Long, David Zimmerman, Vijay Janapa Reddi. 515-528
- Communication Lower Bound in Convolution Accelerators. Xiaoming Chen 0003, Yinhe Han, Yu Wang. 529-541
- Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. Xingyao Zhang, Shuaiwen Leon Song, Chenhao Xie, Jing Wang 0055, Weigong Zhang, Xin Fu. 542-555
- Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators. Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, Sean Eilert, Mircea R. Stan, Kevin Skadron. 556-569
- BCoal: Bucketing-Based Memory Coalescing for Efficient and Secure GPUs. Gurunath Kadam, Danfeng Zhang, Adwait Jog. 570-581
- HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems. Xiaowei Ren, Daniel Lustig, Evgeny Bolotin, Aamer Jaleel, Oreste Villa, David W. Nellans. 582-595
- Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems. Trinayan Baruah, Yifan Sun, Ali Tolga Dinçer, Saiful A. Mojumder, José L. Abellán, Yash Ukidave, Ajay Joshi, Norman Rubin, John Kim, David R. Kaeli. 596-609
- EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform. Jianbo Dong, Zheng Cao, Tao Zhang, Jianxi Ye, Shaochuang Wang, Fei Feng, Li Zhao, Xiaoyong Liu, Liuyihan Song, Liwei Peng, Yiqun Guo, Xiaowei Jiang, Lingbo Tang, Yin Du, Yingya Zhang, Pan Pan, Yuan Xie. 610-622
- Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices. Jawad Haj-Yahya, Yanos Sazeides, Mohammed Alser, Efraim Rotem, Onur Mutlu. 623-636
- Experiences with ML-Driven Design: A NoC Case Study. Jieming Yin, Subhash Sethumurugan, Yasuko Eckert, Chintan Patel, Alan Smith, Eric Morton, Mark Oskin, Natalie D. Enright Jerger, Gabriel H. Loh. 637-648
- Hybrid2: Combining Caching and Migration in Hybrid Memory Systems. Evangelos Vasilakis, Vassilis Papaefstathiou, Pedro Trancoso, Ioannis Sourdis. 649-662
- Charge-Aware DRAM Refresh Reduction with Value Transformation. Seikwon Kim, Wonsang Kwak, Changdae Kim, Daehyeon Baek, Jaehyuk Huh. 663-676
- DWT: Decoupled Workload Tracing for Data Centers. Jian Chen, Ying Zhang, Xiaowei Jiang, Li Zhao, Zheng Cao, Qiang Liu. 677-688
- Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations. Nitish Kumar Srivastava, Hanchen Jin, Shaden Smith, Hongbo Rong, David H. Albonesi, Zhiru Zhang. 689-702
- A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms. Jian Weng, Sihao Liu, Zhengrong Wang, Vidushi Dadu, Tony Nowatzki. 703-716
- Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions. Adrián Barredo, Juan M. Cebrian, Miquel Moretó, Marc Casas, Mateo Valero. 717-728