- Wire-Aware Architecture and Dataflow for CNN Accelerators. Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon. pp. 1-13
- Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler. pp. 14-27
- ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning. Alberto Delmas Lascorz, Sayeh Sharify, Isak Edo Vivancos, Dylan Malone Stuart, Omar Mohamed Awad, Patrick Judd, Mostafa Mahmoud, Milos Nikolic, Kevin Siu, Zissis Poulos, Andreas Moshovos. pp. 28-41
- MI6: Secure Enclaves in a Speculative Out-of-Order Processor. Thomas Bourgeat, Ilia A. Lebedev, Andrew Wright, Sizhuo Zhang, Arvind, Srinivas Devadas. pp. 42-56
- Cyclone: Detecting Contention-Based Cache Information Leaks Through Cyclic Interference. Austin Harris, Shijia Wei, Prateek Sahu, Pranav Kumar, Todd M. Austin, Mohit Tiwari. pp. 57-72
- CleanupSpec: An "Undo" Approach to Safe Speculation. Gururaj Saileshwar, Moinuddin K. Qureshi. pp. 73-86
- eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing. Elaheh Sadredini, Reza Rahimi, Vaibhav Verma, Mircea Stan, Kevin Skadron. pp. 87-99
- ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs. Fei Gao, Georgios Tziantzioulis, David Wentzlaff. pp. 100-113
- CASCADE: Connecting RRAMs to Extend Analog Dataflow In An End-To-End In-Memory Processing Paradigm. Teyuh Chou, Wei Tang, Jacob Botimer, Zhengya Zhang. pp. 114-125
- ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions. Berkin Akin, Zeshan A. Chishti, Alaa R. Alameldeen. pp. 126-138
- Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating. Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh. pp. 139-150
- SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks. Ashish Gondimalla, Noah Chesnut, Mithuna Thottethodi, T. N. Vijaykumar. pp. 151-165
- EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM. Skanda Koppula, Lois Orosa, Abdullah Giray Yaglikçi, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu. pp. 166-181
- eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference. Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, Li-De Chen. pp. 182-195
- Dynamic Multi-Resolution Data Storage. Yu-Ching Hu, Murtuza Taher Lokhandwala, Te I, Hung-Wei Tseng. pp. 196-210
- Exploiting Process Similarity of 3D Flash Memory for High Performance SSDs. Youngseop Shim, Myungsuk Kim, Myoungjun Chun, Jisung Park, Yoona Kim, Jihong Kim. pp. 211-223
- DeepStore: In-Storage Acceleration for Intelligent Queries. Vikram Sharma Mailthody, Zaid Qureshi, Weixin Liang, Ziyan Feng, Simon Garcia De Gonzalo, Youjie Li, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-mei Hwu. pp. 224-238
- FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling. Mohammadamin Ajdari, Wonsik Lee, Pyeongsu Park, Joonsung Kim, Jangwoo Kim. pp. 239-252
- Ensemble of Diverse Mappings: Improving Reliability of Quantum Computers by Orchestrating Dissimilar Mistakes. Swamit S. Tannu, Moinuddin K. Qureshi. pp. 253-265
- Partial Compilation of Variational Algorithms for Noisy Intermediate-Scale Quantum Machines. Pranav Gokhale, Yongshan Ding, Thomas Propson, Christopher Winkler, Nelson Leung, Yunong Shi, David I. Schuster, Henry Hoffmann, Frederic T. Chong. pp. 266-278
- Mitigating Measurement Errors in Quantum Computers by Exploiting State-Dependent Bias. Swamit S. Tannu, Moinuddin K. Qureshi. pp. 279-290
- A Case for Multi-Programming Quantum Computers. Poulami Das, Swamit S. Tannu, Prashant J. Nair, Moinuddin K. Qureshi. pp. 291-303
- FlexLearn: Fast and Highly Efficient Brain Simulations Using Flexible On-Chip Learning. Eunjin Baek, Hunjun Lee, Youngsok Kim, Jangwoo Kim. pp. 304-318
- ExTensor: An Accelerator for Sparse Tensor Algebra. Kartik Hegde, Hadi Asghari Moghaddam, Michael Pellauer, Neal Clayton Crago, Aamer Jaleel, Edgar Solomonik, Joel S. Emer, Christopher W. Fletcher. pp. 319-333
- GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment. Anirban Nag, C. N. Ramachandra, Rajeev Balasubramonian, Ryan Stutsman, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon. pp. 334-346
- Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization. Fazle Sadi, Joe Sweeney, Tze Meng Low, James C. Hoe, Larry T. Pileggi, Franz Franchetti. pp. 347-358
- Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs. Maohua Zhu, Tao Zhang, Zhenyu Gu, Yuan Xie. pp. 359-371
- NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs. Oreste Villa, Mark Stephenson, David W. Nellans, Stephen W. Keckler. pp. 372-383
- Tangram: Integrated Control of Heterogeneous Computers. Raghavendra Pradyumna Pothukuchi, Joseph L. Greathouse, Karthik Rao, Christopher Erb, Leonardo Piga, Petros G. Voulgaris, Josep Torrellas. pp. 384-398
- CoSpec: Compiler Directed Speculative Intermittent Computation. Jongouk Choi, Qingrui Liu, Changhee Jung. pp. 399-412
- Applying Deep Learning to the Cache Replacement Problem. Zhan Shi, Xiangru Huang, Akanksha Jain, Calvin Lin. pp. 413-425
- DynaSprint: Microarchitectural Sprints with Dynamic Utility and Thermal Management. Ziqiang Huang, José A. Joao, Alejandro Rico, Andrew D. Hilton, Benjamin C. Lee. pp. 426-439
- Leveraging Caches to Accelerate Hash Tables and Memoization. Guowei Zhang, Daniel Sánchez. pp. 440-452
- Touché: Towards Ideal and Efficient Cache Compression By Mitigating Tag Area Overheads. Seokin Hong, Bülent Abali, Alper Buyuktosunoglu, Michael B. Healy, Prashant J. Nair. pp. 453-465
- Distributed Logless Atomic Durability with Persistent Memory. Siddharth Gupta, Alexandros Daglis, Babak Falsafi. pp. 466-478
- SuperMem: Enabling Application-transparent Secure Persistent Memory with Low Overheads. Pengfei Zuo, Yu Hua, Yuan Xie. pp. 479-492
- Constructing Large, Durable and Fast SSD System via Reprogramming 3D TLC Flash Memory. Congming Gao, Min Ye, Qiao Li, Chun Jason Xue, Youtao Zhang, Liang Shi, Jun Yang. pp. 493-505
- SWQUE: A Mode Switching Issue Queue with Priority-Correcting Circular Queue. Hideki Ando. pp. 506-518
- Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors. Niranjan Soundararajan, Saurabh Gupta, Ragavendra Natarajan, Jared Stark, Rahul Pal, Franck Sala, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney. pp. 519-530
- DSPatch: Dual Spatial Pattern Prefetcher. Rahul Bera, Anant V. Nori, Onur Mutlu, Sreenivas Subramoney. pp. 531-544
- CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety. Hongyan Xia, Jonathan Woodruff, Sam Ainsworth, Nathaniel Wesley Filardo, Michael Roe, Alexander Richardson, Peter Rugg, Peter G. Neumann, Simon W. Moore, Robert N. M. Watson, Timothy M. Jones. pp. 545-557
- Practical Byte-Granular Memory Blacklisting using Califorms. Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, Simha Sethumadhavan. pp. 558-571
- NDA: Preventing Speculative Execution Attacks at Their Source. Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F. Wenisch, Baris Kasikci. pp. 572-586
- MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm. Wenqin Huangfu, Xueqi Li, Shuangchen Li, Xing Hu, Peng Gu, Yuan Xie. pp. 587-599
- SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations. Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula, Roknoddin Azizi, Skanda Koppula, Nika Mansouri-Ghiasi, Taha Shahroodi, Juan Gómez-Luna, Onur Mutlu. pp. 600-614
- Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach. Mingyu Yan, Xing Hu, Shuangchen Li, Abanti Basak, Han Li, Xin Ma, Itir Akgun, Yujing Feng, Peng Gu, Lei Deng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie. pp. 615-628
- Tigris: Architecture and Algorithms for 3D Perception in Point Clouds. Tiancheng Xu, Boyuan Tian, Yuhao Zhu. pp. 629-642
- ASV: Accelerated Stereo Vision System. Yu Feng, Paul N. Whatmough, Yuhao Zhu. pp. 643-656
- Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices. Haibo Zhang, Shulin Zhao, Ashutosh Pattnaik, Mahmut T. Kandemir, Anand Sivasubramaniam, Chita R. Das. pp. 657-669
- MANIC: A Vector-Dataflow Architecture for Ultra-Low-Power Embedded Systems. Graham Gobieski, Amolak Nagi, Nathan Serafin, Mehmet Meric Isgenc, Nathan Beckmann, Brandon Lucia. pp. 670-684
- SOSA: Self-Optimizing Learning with Self-Adaptive Control for Hierarchical System-on-Chip Management. Bryan Donyanavard, Tiago Mück, Amir M. Rahmani, Nikil D. Dutt, Armin Sadighi, Florian Maurer, Andreas Herkersdorf. pp. 685-698
- NetDIMM: Low-Latency Near-Memory Network Interface Architecture. Mohammad Alian, Nam Sung Kim. pp. 699-711
- GraphQ: Scalable PIM-Based Graph Processing. Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, Xuehai Qian. pp. 712-725
- Charon: Specialized Near-Memory Processing Architecture for Clearing Dead Objects in Memory. Jaeyoung Jang, Jun Heo, Yejin Lee, Jaeyeon Won, Seonghak Kim, Sungjun Jung, Hakbeom Jang, Tae Jun Ham, Jae W. Lee. pp. 726-739
- TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning. Youngeun Kwon, Yunjae Lee, Minsoo Rhu. pp. 740-753
- Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach. Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, Tushar Krishna. pp. 754-768
- MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation. Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu-Yeon Wei, David Brooks. pp. 769-781
- Neuron-Level Fuzzy Memoization in RNNs. Franyell Silfa, Gem Dot, Jose-Maria Arnau, Antonio González. pp. 782-793
- Manna: An Accelerator for Memory-Augmented Neural Networks. Jacob R. Stevens, Ashish Ranjan, Dipankar Das, Bharat Kaul, Anand Raghunathan. pp. 794-806
- Binary Star: Coordinated Reliability in Heterogeneous Memory Systems for High Performance and Scalability. Xiao Liu, David Roberts, Rachata Ausavarungnirun, Onur Mutlu, Jishen Zhao. pp. 807-820
- Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support. Gagandeep Panwar, Da Zhang, Yihan Pang, Mai Dahshan, Nathan DeBardeleben, Binoy Ravindran, Xun Jian. pp. 821-835
- SSP: Eliminating Redundant Writes in Failure-Atomic NVRAMs via Shadow Sub-Paging. Yuanjiang Ni, Jishen Zhao, Heiner Litz, Daniel Bittman, Ethan L. Miller. pp. 836-848
- Towards Efficient NVDIMM-based Heterogeneous Storage Hierarchy Management for Big Data Workloads. Renhai Chen, Zili Shao, Duo Liu, Zhiyong Feng, Tao Li. pp. 849-860
- Adding Tightly-Integrated Task Scheduling Acceleration to a RISC-V Multi-core Processor. Lucas Morais, Vitor Silva, Alfredo Goldman, Carlos Álvarez, Jaume Bosch, Michael Frank, Guido Araujo. pp. 861-872
- SWAP: Synchronized Weaving of Adjacent Packets for Network Deadlock Resolution. Mayank Parasar, Natalie D. Enright Jerger, Paul V. Gratz, Joshua San Miguel, Tushar Krishna. pp. 873-885
- PUSh: Data Race Detection Based on Hardware-Supported Prevention of Unintended Sharing. Diyu Zhou, Yuval Tamir. pp. 886-898
- EMI Architectural Model and Core Hopping. Daphne I. Gorman, Rafael Trapani Possignolo, Jose Renau. pp. 899-910
- FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. Zhaoshi Li, Leibo Liu, Yangdong Deng, Jiawei Wang, Zhiwei Liu, Shouyi Yin, Shaojun Wei. pp. 911-923
- Towards General Purpose Acceleration by Exploiting Common Data-Dependence Forms. Vidushi Dadu, Jian Weng, Sihao Liu, Tony Nowatzki. pp. 924-939
- μIR - An intermediate representation for transforming and optimizing the microarchitecture of application accelerators. Amirali Sharifian, Reza Hojabr, Navid Rahimi, Sihao Liu, Apala Guha, Tony Nowatzki, Arrvindh Shriraman. pp. 940-953
- Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, Christopher W. Fletcher. pp. 954-968
- LATCH: A Locality-Aware Taint CHecker. Daniel Townley, Khaled N. Khasawneh, Dmitry Ponomarev, Nael B. Abu-Ghazaleh, Lei Yu. pp. 969-982
- EMMA: Hardware/Software Attestation Framework for Embedded Systems Using Electromagnetic Signals. Nader Sehatbakhsh, Alireza Nazari, Haider A. Khan, Alenka G. Zajic, Milos Prvulovic. pp. 983-995
- Temporal Prefetching Without the Off-Chip Metadata. Hao Wu, Krishnendra Nathella, Joseph Pusdesris, Dam Sunwoo, Akanksha Jain, Calvin Lin. pp. 996-1008
- PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates. Anurag Mukkara, Nathan Beckmann, Daniel Sánchez. pp. 1009-1022
- Prefetched Address Translation. Artemiy Margaritov, Dmitrii Ustiugov, Edouard Bugnion, Boris Grot. pp. 1023-1036
- Directed Statistical Warming through Time Traveling. Nikos Nikoleris, Lieven Eeckhout, Erik Hagersten, Trevor E. Carlson. pp. 1037-1049
- Simmani: Runtime Power Modeling for Arbitrary RTL with Automatic Signal Selection. Donggyu Kim, Jerry Zhao, Jonathan Bachrach, Krste Asanovic. pp. 1050-1062
- Architectural Implications of Function-as-a-Service Computing. Mohammad Shahrad, Jonathan Balkind, David Wentzlaff. pp. 1063-1075
- InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy (Corrigendum). Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W. Fletcher, Josep Torrellas. p. 1076