- In-Datacenter Performance Analysis of a Tensor Processing Unit. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon. 1-12
- ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, Anand Raghunathan. 13-26
- SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, William J. Dally. 27-40
- Bespoke Processors for Applications with Ultra-low Area and Power Constraints. Hari Cherupalli, Henry Duwe, Weidong Ye, Rakesh Kumar, John Sartori. 41-54
- A Programmable Galois Field Processor for the Internet of Things. Yajing Chen, Shengshuo Lu, Cheng Fu, David Blaauw, Ronald Dreslinski Jr., Trevor N. Mudge, Hun-Seok Kim. 55-68
- XPro: A Cross-End Processing Architecture for Data Analytics in Wearables. Aosen Wang, Lizhong Chen, Wenyao Xu. 69-80
- Regaining Lost Cycles with HotCalls: A Fast Interface for SGX Secure Enclaves. Ofir Weisse, Valeria Bertacco, Todd M. Austin. 81-93
- InvisiMem: Smart Memory Defenses for Memory Bus Side Channel. Shaizeen Aga, Satish Narayanasamy. 94-106
- ObfusMem: A Low-Overhead Access Obfuscation for Trusted Memories. Amro Awad, Yipeng Wang, Deborah Shands, Yan Solihin. 107-119
- ThermoGater: Thermally-Aware On-Chip Voltage Regulation. S. Karen Khatamifard, Longfei Wang, Weize Yu, Selçuk Köse, Ulya R. Karpuzcu. 120-132
- PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP. Hailong Yang, Quan Chen, Moeiz Riaz, Zhongzhi Luan, Lingjia Tang, Jason Mars. 133-146
- CHARSTAR: Clock Hierarchy Aware Resource Scaling in Tiled ARchitectures. Gokul Subramanian Ravi, Mikko H. Lipasti. 147-160
- Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems. Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve. 161-174
- Hiding the Long Latency of Persist Barriers Using Speculative Execution. Seunghee Shin, James Tuck, Yan Solihin. 175-186
- Non-Speculative Load-Load Reordering in TSO. Alberto Ros, Trevor E. Carlson, Mehdi Alipour, Stefanos Kaxiras. 187-200
- MTraceCheck: Validating Non-Deterministic Behavior of Memory Consistency Models in Post-Silicon Validation. Doowon Lee, Valeria Bertacco. 201-213
- Redundant Memory Array Architecture for Efficient Selective Protection. Ruohuang Zheng, Michael C. Huang. 214-227
- Clank: Architectural Support for Intermittent Computation. Matthew Hicks. 228-240
- MeRLiN: Exploiting Dynamic Instruction Behavior for Fast and Accurate Microarchitecture Level Reliability Assessment. Manolis Kaliorakis, Dimitris Gizopoulos, Ramon Canal, Antonio González. 241-254
- The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions. Minesh Patel, Jeremie S. Kim, Onur Mutlu. 255-268
- Quality of Service Support for Fine-Grained Sharing on GPUs. Zhenning Wang, Jun Yang, Rami G. Melhem, Bruce R. Childers, Youtao Zhang, Minyi Guo. 269-281
- Accelerating GPU Hardware Transactional Memory with Snapshot Isolation. Sui Chen, Lu Peng, Samuel Irving. 282-294
- Decoupled Affine Computation for SIMT GPUs. Kai Wang, Calvin Lin. 295-306
- Access Pattern-Aware Cache Management for Improving Data Utilization in GPU. Gunjae Koo, Yunho Oh, Won Woo Ro, Murali Annavaram. 307-319
- MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability. Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, Aamer Jaleel, Carole-Jean Wu, David W. Nellans. 320-332
- EDDIE: EM-Based Detection of Deviations in Program Execution. Alireza Nazari, Nader Sehatbakhsh, Monjur Alam, Alenka G. Zajic, Milos Prvulovic. 333-346
- Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks. Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas. 347-360
- Lemonade from Lemons: Harnessing Device Wearout to Create Limited-Use Security Architectures. Zhaoxia Deng, Ariel Feldman, Stuart A. Kurtz, Frederic T. Chong. 361-374
- LogCA: A High-Level Performance Model for Hardware Accelerators. Muhammad Shoaib Bin Altaf, David A. Wood. 375-388
- Plasticine: A Reconfigurable Architecture For Parallel Patterns. Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matthew Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun. 389-402
- A Programmable Hardware Accelerator for Simulating Dynamical Systems. Jaeha Kung, Yun Long, Duckhwan Kim, Saibal Mukhopadhyay. 403-415
- Stream-Dataflow Acceleration. Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam. 416-429
- Hardware Translation Coherence for Virtualized Systems. Zi Yan, Ján Veselý, Guilherme Cox, Abhishek Bhattacharjee. 430-443
- Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations. Chang-Hyun Park, Taekyung Heo, Jungi Jeong, Jaehyuk Huh. 444-456
- Do-It-Yourself Virtual Memory Translation. Hanna Alam, Tianhao Zhang, Mattan Erez, Yoav Etsion. 457-468
- Rethinking TLB Designs in Virtualized Environments: A Very Large Part-of-Memory TLB. Jee Ho Ryoo, Nagendra Gulur, Shuang Song, Lizy K. John. 469-480
- Language-level persistency. Aasheesh Kolli, Vaibhav Gogte, Ali G. Saidi, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch. 481-493
- ShortCut: Architectural Support for Fast Object Access in Scripting Languages. Jiho Choi, Thomas Shull, María Jesús Garzarán, Josep Torrellas. 494-506
- Architectural Support for Server-Side PHP Processing. Dibakar Gope, David J. Schlais, Mikko H. Lipasti. 507-520
- HeteroOS: OS Design for Heterogeneous Memory Management in Datacenter. Sudarsun Kannan, Ada Gavrilovska, Vishal Gupta, Karsten Schwan. 521-534
- Maximizing CNN Accelerator Efficiency Through Resource Partitioning. Yongming Shen, Michael Ferdman, Peter Milder. 535-547
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. Jiecao Yu, Andrew Lukefahr, David J. Palframan, Ganesh S. Dasika, Reetuparna Das, Scott A. Mahlke. 548-560
- Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent. Christopher De Sa, Matthew Feldman, Christopher Ré, Kunle Olukotun. 561-574
- Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware. Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, Yao Wang, Shaojun Wei. 575-586
- Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism. Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, Victor A. Ying, Joel S. Emer, Daniel Sanchez. 587-599
- Parallel Automata Processor. Arun Subramaniyan, Reetuparna Das. 600-612
- Viyojit: Decoupling Battery and DRAM Capacities for Battery-Backed DRAM. Rajat Kateja, Anirudh Badam, Sriram Govindan, Bikash Sharma, Greg Ganger. 613-626
- DICE: Compressing DRAM Caches for Bandwidth and Capacity. Vinson Young, Prashant J. Nair, Moinuddin K. Qureshi. 627-638
- The Mondrian Data Engine. Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, Dionisios N. Pnevmatikatos. 639-651
- Jenga: Software-Defined Cache Hierarchies. Po-An Tsai, Nathan Beckmann, Daniel Sanchez. 652-665
- APPROX-NoC: A Data Approximation Framework for Network-On-Chip Architectures. Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, Eun Jung Kim. 666-677
- There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes. Matthew Poremba, Itir Akgun, Jieming Yin, Onur Kayiran, Yuan Xie, Gabriel H. Loh. 678-690
- Footprint: Regulating Routing Adaptiveness in Networks-on-Chip. Binzhang Fu, John Kim. 691-702
- EbDa: A New Theory on Design and Verification of Deadlock-free Interconnection Networks. Masoumeh Ebrahimi, Masoud Daneshtalab. 703-715