- Data Compression Accelerator on IBM POWER9 and z15 Processors: Industrial Product. Bülent Abali, Bart Blaner, John J. Reilly, Matthias Klein, Ashutosh Mishra, Craig B. Agricola, Bedri Sendir, Alper Buyuktosunoglu, Christian Jacobi, William J. Starke, Haren Myneni, Charlie Wang. 1-14
- High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs: Industrial Product. Glenn Henry, Parviz Palangpour, Michael Thomson, J. Scott Gardner, Bryce Arden, Jim Donahue, Kimble Houck, Jonathan Johnson, Kyle O'Brien, Scott Petersen, Benjamin Seroussi, Tyler Walker. 15-26
- The IBM z15 High Frequency Mainframe Branch Predictor: Industrial Product. Narasimha Adiga, James Bonanno, Adam Collura, Matthias Heizmann, Brian R. Prasky, Anthony Saporito. 27-39
- Evolution of the Samsung Exynos CPU Microarchitecture. Brian Grayson, Jeff Rupley, Gerald D. Zuraski, Eric Quinnell, Daniel A. Jiménez, Tarun Nakra, Paul Kitchin, Ryan Hensley, Edward Brekelbaum, Vikas Sinha, Ankit Ghiya. 40-51
- Xuantie-910: A Commercial Multi-Core 12-Stage Pipeline Out-of-Order 64-bit High Performance RISC-V Processor with Vector Extension: Industrial Product. Chen Chen, Xiaoyan Xiang, Chang Liu, Yunhai Shang, Ren Guo, Dongqi Liu, Yimin Lu, Ziyi Hao, Jiahui Luo, Zhijian Chen, Chunqiang Li, Yu Pu, Jianyi Meng, Xiaolang Yan, Yuan Xie, Xiaoning Qi. 52-64
- Divide and Conquer Frontend Bottleneck. Ali Ansari, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad. 65-78
- Focused Value Prediction*. Sumeet Bandishte, Jayesh Gaur, Zeev Sperber, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney. 79-91 (* Concepts, techniques and implementations presented in this paper are subject matter of pending patent applications, which have been filed by Intel Corporation.)
- Auto-Predication of Critical Branches. Adarsh Chauhan, Jayesh Gaur, Zeev Sperber, Franck Sala, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney. 92-104
- Slipstream Processors Revisited: Exploiting Branch Sets. Vinesh Srinivasan, Rangeen Basu Roy Chowdhury, Eric Rotenberg. 105-117
- Bouquet of Instruction Pointers: Instruction Pointer Classifier-based Spatial Hardware Prefetching. Samuel Pakalapati, Biswabandan Panda. 118-131
- MuonTrap: Preventing Cross-Domain Spectre-Like Attacks by Capturing Speculative State. Sam Ainsworth, Timothy M. Jones. 132-144
- Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads. Dennis Abts, Jonathan Ross, Jonathan Sparling, Mark Wong-VanHaren, Max Baker, Tom Hawkins, Andrew Bell, John Thompson, Temesghen Kahsai, Garrin Kimmell, Jennifer Hwang, Rebekah Leslie-Hurd, Michael Bye, E. R. Creswick, Matthew Boyd, Mahitha Venigalla, Evan Laforge, Jon Purdy, Purushotham Kamath, Dinesh Maheshwari, Michael Beidler, Geert Rosseel, Omar Ahmad, Gleb Gagarin, Richard Czekalski, Ashay Rane, Sahil Parmar, Jeff Werner, Jim Sproch, Adrian Macias, Brian Kurtz. 145-158
- T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware. Victor A. Ying, Mark C. Jeffrey, Daniel Sánchez. 159-172
- Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems. Moyang Wang, Tuan Ta, Lin Cheng, Christopher Batten. 173-186
- Flick: Fast and Lightweight ISA-Crossing Call for Heterogeneous-ISA Environments. Shenghsun Cho, Han Chen, Sergey Madaminov, Michael Ferdman, Peter A. Milder. 187-198
- The NEBULA RPC-Optimized Architecture. Mark Sutherland, Siddharth Gupta, Babak Falsafi, Virendra Marathe, Dionisios N. Pnevmatikatos, Alexandros Daglis. 199-212
- Printed Microprocessors. Nathaniel Bleier, Muhammad Husnain Mubarik, Farhan Rasheed, Jasmin Aghassi-Hagmann, Mehdi B. Tahoori, Rakesh Kumar. 213-226
- SysScale: Exploiting Multi-domain Dynamic Voltage and Frequency Scaling for Energy Efficient Mobile Processors. Jawad Haj-Yahya, Mohammed Alser, Jeremie S. Kim, Abdullah Giray Yaglikçi, Nandita Vijaykumar, Efraim Rotem, Onur Mutlu. 227-240
- Déjà View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming. Shulin Zhao, Haibo Zhang, Sandeepa Bhuyan, Cyan Subhra Mishra, Ziyu Ying, Mahmut T. Kandemir, Anand Sivasubramaniam, Chita R. Das. 241-253
- Genesis: A Hardware Acceleration Framework for Genomic Data Analysis. Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U. Gyeong Song, Young H. Oh, Krste Asanovic, Jae W. Lee, Lisa Wu Wills. 254-267
- DSAGEN: Synthesizing Programmable Spatial Accelerators. Jian Weng, Sihao Liu, Vidushi Dadu, Zhengrong Wang, Preyas Shah, Tony Nowatzki. 268-281
- Bonsai: High-Performance Adaptive Merge Tree Sorting. Nikola Samardzic, Weikang Qiao, Vaibhav Aggarwal, Mau-Chung Frank Chang, Jason Cong. 282-294
- SOFF: An OpenCL High-Level Synthesis Framework for FPGAs. Gangwon Jo, Heehoon Kim, Jeesoo Lee, Jaejin Lee. 295-308
- Gorgon: Accelerating Machine Learning from Relational Data. Matthew Vilim, Alexander Rucker, Yaqi Zhang, Sophia Liu, Kunle Olukotun. 309-321
- A Specialized Architecture for Object Serialization with Applications to Big Data Analytics. Jaeyoung Jang, Sungjun Jung, Sunmin Jeong, Jun Heo, Hoon Shin, Tae Jun Ham, Jae W. Lee. 322-334
- CryoCore: A Fast and Dense Processor Architecture for Cryogenic Computing. Ilkwon Byun, Dongmoon Min, Gyu-hyeon Lee, Seongmin Na, Jangwoo Kim. 335-348
- SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks. Surya Narayanan, Karl Taht, Rajeev Balasubramonian, Edouard Giacomin, Pierre-Emmanuel Gaillardon. 349-362
- NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs. Sonali Singh, Anup Sarma, Nicholas Jao, Ashutosh Pattnaik, Sen Lu, Kezhou Yang, Abhronil Sengupta, Vijaykrishnan Narayanan, Chita R. Das. 363-376
- uGEMM: Unary Computing Architecture for GEMM Applications. Di Wu, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim, Joshua San Miguel. 377-390
- Hardware-Software Co-Design for Brain-Computer Interfaces. Ioannis Karageorgos, Karthik Sriram, Ján Veselý, Michael Wu, Marc Powell, David Borton, Rajit Manohar, Abhishek Bhattacharjee. 391-404
- Heat to Power: Thermal Energy Harvesting and Recycling for Warm Water-Cooled Datacenters. Xinhui Zhu, Weixiang Jiang, Fangming Liu, Qixia Zhang, Li Pan, Qiong Chen, Ziyang Jia. 405-418
- GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent. Yifan Yang, Zhaoshi Li, Yangdong Deng, Zhiwei Liu, Shouyi Yin, Shaojun Wei, Leibo Liu. 419-432
- GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures. Nagadastagiri Challapalle, Sahithi Rampalli, Linghao Song, Nandhini Chandramoorthy, Karthik Swaminathan, John Sampson, Yiran Chen, Vijaykrishnan Narayanan. 433-445
- MLPerf Inference Benchmark. Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou. 446-459
- Mocktails: Capturing the Memory Behaviour of Proprietary Mobile Architectures. Mario Badr, Carlo Delconte, Isak Edo, Radhika Jagtap, Matteo Andreozzi, Natalie D. Enright Jerger. 460-472
- Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, Timothy G. Rogers. 473-486
- HyperTRIO: Hyper-Tenant Translation of I/O Addresses. Alexey Lavrov, David Wentzlaff. 487-500
- BabelFish: Fusing Address Translations for Containers. Dimitrios Skarlatos, Umur Darbaz, Bhargava Gopireddy, Nam Sung Kim, Josep Torrellas. 501-514
- Enhancing and Exploiting Contiguity for Fast Memory Virtualization. Chloe Alverti, Stratos Psomadakis, Vasileios Karakostas, Jayneel Gandhi, Konstantinos Nikas, Georgios I. Goumas, Nectarios Koziris. 515-528
- Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers. Prakash Murali, Dripto M. Debroy, Kenneth R. Brown, Margaret Martonosi. 529-542
- AccQOC: Accelerating Quantum Optimal Control Based Pulse Generation. Jinglei Cheng, Haoqing Deng, Xuehai Qian. 543-555
- NISQ+: Boosting quantum computing power by approximating quantum error correction. Adam Holmes, Mohammad Reza Jokar, Ghasem Pasandi, Yongshan Ding, Massoud Pedram, Frederic T. Chong. 556-569
- SQUARE: Strategic Quantum Ancilla Reuse for Modular Quantum Programs via Cost-Effective Uncomputation. Yongshan Ding, Xin-Chuan Wu, Adam Holmes, Ash Wiseth, Diana Franklin, Margaret Martonosi, Frederic T. Chong. 570-583
- HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory. Miao Cai, Chance C. Coats, Jian Huang. 584-596
- Lelantus: Fine-Granularity Copy-On-Write Operations for Secure Non-Volatile Memories. Jian Zhou, Amro Awad, Jun Wang. 597-609
- MorLog: Morphable Hardware Logging for Atomic Persistence in Non-Volatile Main Memory. Xueliang Wei, Dan Feng, Wei Tong, Jingning Liu, Liuqing Ye. 610-623
- TVARAK: Software-Managed Hardware Offload for Redundancy in Direct-Access NVM Storage. Rajat Kateja, Nathan Beckmann, Gregory R. Ganger. 624-637
- Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques. Jeremie S. Kim, Minesh Patel, Abdullah Giray Yaglikçi, Hasan Hassan, Roknoddin Azizi, Lois Orosa, Onur Mutlu. 638-651
- Relaxed Persist Ordering Using Strand Persistency. Vaibhav Gogte, William Wang, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch. 652-665
- CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off. Haocong Luo, Taha Shahroodi, Hasan Hassan, Minesh Patel, Abdullah Giray Yaglikçi, Lois Orosa, Jisung Park, Onur Mutlu. 666-679
- Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects. Yuanchao Xu, Chencheng Ye, Yan Solihin, Xipeng Shen. 680-692
- Check-In: In-Storage Checkpointing for Key-Value Store System Leveraging Flash-Based SSDs. Joohyeong Yoon, Won Seob Jeong, Won Woo Ro. 693-706
- Speculative Data-Oblivious Execution: Mobilizing Safe Prediction For Safe and Efficient Speculative Execution. Jiyong Yu, Namrata Mantri, Josep Torrellas, Adam Morrison, Christopher W. Fletcher. 707-720
- Packet Chasing: Spying on Network Packets over a Cache Side-Channel. Mohammadkazem Taram, Ashish Venkat, Dean M. Tullsen. 721-734
- Compact Leakage-Free Support for Integrity and Reliability. Meysam Taassori, Rajeev Balasubramonian, Siddhartha Chhabra, Alaa R. Alameldeen, Manjula Peddireddy, Rajat Agarwal, Ryan Stutsman. 735-748
- A Bus Authentication and Anti-Probing Architecture Extending Hardware Trusted Computing Base Off CPU Chips and Beyond. Zhenyu Xu, Thomas Mauldin, Zheyi Yao, Shuyi Pei, Tao Wei, Qing Yang. 749-761
- CHEx86: Context-Sensitive Enforcement of Memory Safety via Microcode-Enabled Capabilities. Rasool Sharifi, Ashish Venkat. 762-775
- Nested Enclave: Supporting Fine-grained Hierarchical Isolation with SGX. Joongun Park, Naegyeong Kang, Taehoon Kim, Youngjin Kwon, Jaehyuk Huh. 776-789
- RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim M. Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, Xuan Zhang. 790-803
- iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture. Peng Gu, Xinfeng Xie, Yufei Ding, Guoyang Chen, Weifeng Zhang, Dimin Niu, Yuan Xie. 804-817
- Near Data Acceleration with Concurrent Host Access. Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez. 818-831
- TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain. Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin. 832-845
- Hyper-AP: Enhancing Associative Processing Through a Full-Stack Optimization. Yue Zha, Jing Li. 846-859
- JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression. R. David Evans, Lufei Liu, Tor M. Aamodt. 860-873
- TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests. Naorin Hossain, Caroline Trippel, Margaret Martonosi. 874-887
- HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols. Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin. 888-899
- Tailored Page Sizes. Faruk Guvenilir, Yale N. Patt. 900-912
- Perforated Page: Supporting Fragmented Memory Allocation for Large Pages. Chang Hyun Park, Sanghoon Cha, Bokyeong Kim, Youngjin Kwon, David Black-Schaffer, Jaehyuk Huh. 913-925
- Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs. Esha Choukse, Michael B. Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David W. Nellans, Stephen W. Keckler. 926-939
- A Multi-Neural Network Acceleration Architecture. Eunjin Baek, Dongup Kwon, Jangwoo Kim. 940-953
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation. Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin. 954-967
- Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations. Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu. 968-981
- DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu. 982-995
- An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives. Benjamin Klenk, Nan Jiang, Greg Thorson, Larry Dennison. 996-1009
- DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration. Zhuoran Song, Bangqi Fu, Feiyang Wu, Zhaoming Jiang, Li Jiang, Naifeng Jing, Xiaoyao Liang. 1010-1021
- Independent Forward Progress of Work-groups. Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood, Marcus Chow. 1022-1035
- ScoRD: A Scoped Race Detector for GPUs. Aditya K. Kamath, Alvin A. George, Arkaprava Basu. 1036-1049
- The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework. Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo F. Oliveira, Jonathan Appavoo, Vivek Seshadri, Onur Mutlu. 1050-1063
- ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis. Jie Zhang, Myoungsoo Jung. 1064-1075
- Commutative Data Reordering: A New Technique to Reduce Data Movement Energy on Sparse Inference Workloads. Ben Feinberg, Benjamin C. Heyman, Darya Mikhailenko, Ryan Wong, An C. Ho, Engin Ipek. 1076-1088
- Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training. Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko. 1089-1102
- A Case for Hardware-Based Demand Paging. Gyusun Lee, Wenjing Jin, Wonsuk Song, Jeonghun Gong, Jonghyun Bae, Tae Jun Ham, Jae W. Lee, Jinkyu Jeong. 1103-1116