- Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling. Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, Daniel Sánchez. pp. 1-14
- Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach. Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, Yunji Chen. pp. 15-28
- CSE: Parallel Finite State Machines with Convergence Set Enumeration. Youwei Zhuo, Jinglei Cheng, Qinyi Luo, Jidong Zhai, Yanzhi Wang, Zhongzhi Luan, Xuehai Qian. pp. 29-41
- Inter-Thread Communication in Multithreaded, Reconfigurable Coarse-Grain Arrays. Dani Voitsechov, Oron Port, Yoav Etsion. pp. 42-54
- An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware. Tao Chen, Shreesha Srinath, Christopher Batten, G. Edward Suh. pp. 55-67
- Composable Building Blocks to Open up Processor Design. Sizhuo Zhang, Andrew Wright, Thomas Bourgeat, Arvind. pp. 68-81
- Performance Improvement by Prioritizing the Issue of the Instructions in Unconfident Branch Slices. Hideki Ando. pp. 82-94
- The Superfluous Load Queue. Alberto Ros, Stefanos Kaxiras. pp. 95-107
- Architectural Support for Probabilistic Branches. Almutaz Adileh, David J. Lilja, Lieven Eeckhout. pp. 108-120
- STRAIGHT: Hazardless Processor Architecture Without Register Renaming. Hidetsugu Irie, Toru Koizumi, Akifumi Fukuda, Seiya Akaki, Satoshi Nakae, Yutaro Bessho, Ryota Shioya, Takahiro Notsu, Katsuhiro Yoda, Teruo Ishihara, Shuichi Sakai. pp. 121-133
- Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator. Mostafa Mahmoud, Kevin Siu, Andreas Moshovos. pp. 134-147
- Beyond the Memory Wall: A Case for Memory-Centric HPC System for Deep Learning. Youngeun Kwon, Minsoo Rhu. pp. 148-161
- Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs. Xingyao Zhang, Chenhao Xie, Jing Wang, Weidong Zhang, Xin Fu. pp. 162-174
- A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks. Youjie Li, Jongse Park, Mohammad Alian, Yifan Yuan, Zheng Qu, Peitian Pan, Ren Wang, Alexander G. Schwing, Hadi Esmaeilzadeh, Nam Sung Kim. pp. 175-188
- PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices. Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan. pp. 189-202
- Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, Daniel Sánchez. pp. 203-216
- Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism. Mark C. Jeffrey, Victor A. Ying, Suvinay Subramanian, Hyun Ryong Lee, Joel S. Emer, Daniel Sánchez. pp. 217-230
- Sampler: PMU-Based Sampling to Detect Memory Errors Latent in Production Software. Sam Silvestro, Hongyu Liu, Tong Zhang, Changhee Jung, Dongyoon Lee, Tongping Liu. pp. 231-244
- TAPAS: Generating Parallel Accelerators from Parallel Programs. Steve Margerm, Amirali Sharifian, Apala Guha, Arrvindh Shriraman, Gilles Pokam. pp. 245-257
- iDO: Compiler-Directed Failure Atomicity for Nonvolatile Memory. Qingrui Liu, Joseph Izraelevitz, Se Kwon Lee, Michael L. Scott, Sam H. Noh, Changhee Jung. pp. 258-270
- Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects. Srikant Bharadwaj, Guilherme Cox, Tushar Krishna, Abhishek Bhattacharjee. pp. 271-284
- Duplicon Cache: Mitigating Off-Chip Memory Bank and Bank Group Conflicts Via Data Duplication. Ben Lin, Michael B. Healy, Rustam Miftakhutdinov, Philip G. Emma, Yale N. Patt. pp. 285-297
- Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration. Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose, Nika Mansouri-Ghiasi, Minesh Patel, Jeremie S. Kim, Hasan Hassan, Mohammad Sadrosadati, Onur Mutlu. pp. 298-311
- CABLE: A CAche-Based Link Encoder for Bandwidth-Starved Manycores. Tri M. Nguyen, Adi Fuchs, David Wentzlaff. pp. 312-325
- Attaché: Towards Ideal Memory Compression by Mitigating Metadata Bandwidth Overheads. Seokin Hong, Prashant Jayaprakash Nair, Bülent Abali, Alper Buyuktosunoglu, Kyu-hyoun Kim, Michael B. Healy. pp. 326-338
- Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems. Vinson Young, Aamer Jaleel, Evgeny Bolotin, Eiman Ebrahimi, David W. Nellans, Oreste Villa. pp. 339-351
- Neighborhood-Aware Address Translation for Irregular GPU Applications. Seunghee Shin, Michael LeBeane, Yan Solihin, Arkaprava Basu. pp. 352-363
- FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput. Yunho Oh, Myung Kuk Yoon, William J. Song, Won Woo Ro. pp. 364-376
- In-Register Parameter Caching for Dynamic Neural Nets with Virtual Persistent Processor Specialization. Farzad Khorasani, Hodjat Asghari Esfeden, Nael B. Abu-Ghazaleh, Vivek Sarkar. pp. 377-389
- Voltage-Stacked GPUs: A Control Theory Driven Cross-Layer Solution for Practical Voltage Stacking in GPUs. An Zou, Jingwen Leng, Xin He, Yazhou Zu, Christopher D. Gill, Vijay Janapa Reddi, Xuan Zhang. pp. 390-402
- Osiris: A Low-Cost Mechanism to Enable Restoration of Secure Non-Volatile Memories. Mao Ye, Clayton Hughes, Amro Awad. pp. 403-415
- Morphable Counters: Enabling Compact Integrity Trees For Low-Overhead Secure Memories. Gururaj Saileshwar, Prashant J. Nair, Prakash Ramrakhyani, Wendy Elsasser, Jose Joao, Moinuddin K. Qureshi. pp. 416-427
- InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy. Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W. Fletcher, Josep Torrellas. pp. 428-441
- Improving the Performance and Endurance of Encrypted Non-Volatile Main Memory through Deduplicating Writes. Pengfei Zuo, Yu Hua, Ming Zhao, Wen Zhou, Yuncheng Guo. pp. 442-454
- SSDcheck: Timely and Accurate Prediction of Irregular Behaviors in Black-Box SSDs. Joonsung Kim, Pyeongsu Park, Jaehyung Ahn, Jihun Kim, Jong Kim, Jangwoo Kim. pp. 455-468
- Amber*: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources. Donghyun Gouk, Miryeong Kwon, Jie Zhang, Sungjoon Koh, Wonil Choi, Nam Sung Kim, Mahmut T. Kandemir, Myoungsoo Jung. pp. 469-481
- Invalid Data-Aware Coding to Enhance the Read Performance of High-Density Flash Memories. Wonil Choi, Myoungsoo Jung, Mahmut T. Kandemir. pp. 482-493
- Persistence Parallelism Optimization: A Holistic Approach from Memory Bus to RDMA Network. Xing Hu, Matheus Ogleari, Jishen Zhao, Shuangchen Li, Abanti Basak, Yuan Xie. pp. 494-506
- PiCL: A Software-Transparent, Persistent Cache Log for Nonvolatile Main Memory. Tri Nguyen, David Wentzlaff. pp. 507-519
- Efficient Hardware-Assisted Logging with Asynchronous and Direct-Update for Persistent Memory. Jungi Jeong, Chang-Hyun Park, Jaehyuk Huh, Seungryoul Maeng. pp. 520-532
- CHAMELEON: A Dynamically Reconfigurable Heterogeneous Memory System. Jagadish B. Kotra, Haibo Zhang, Alaa R. Alameldeen, Chris Wilkerson, Mahmut T. Kandemir. pp. 533-545
- Compresso: Pragmatic Main Memory Compression. Esha Choukse, Mattan Erez, Alaa R. Alameldeen. pp. 546-558
- Farewell My Shared LLC! A Case for Private Die-Stacked DRAM Caches for Servers. Amna Shahab, Mingcan Zhu, Artemiy Margaritov, Boris Grot. pp. 559-572
- Leveraging CPU Electromagnetic Emanations for Voltage Noise Characterization. Zacharias Hadjilambrou, Shidhartha Das, Marco A. Antoniades, Yiannakis Sazeides. pp. 573-585
- RpStacks-MT: A High-Throughput Design Evaluation Methodology for Multi-Core Processors. Hanhwi Jang, Jae-Eon Jo, Jaewon Lee, Jangwoo Kim. pp. 586-599
- The EH Model: Early Design Space Exploration of Intermittent Processor Architectures. Joshua San Miguel, Karthik Ganesan, Mario Badr, Chunqiu Xia, Rose Li, Hsuan Hsiao, Natalie D. Enright Jerger. pp. 600-612
- CounterMiner: Mining Big Performance Data from Hardware Counters. Yirong Lv, Bin Sun, Qingyi Luo, Jing Wang, Zhibin Yu, Xuehai Qian. pp. 613-626
- Taming the Killer Microsecond. Shenghsun Cho, Amoghavarsha Suresh, Tapti Palit, Michael Ferdman, Nima Honarmand. pp. 627-640
- Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies. Po-An Tsai, Changping Chen, Daniel Sánchez. pp. 641-654
- Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach. Jiawen Liu, Hengyu Zhao, Matheus A. Ogleari, Dong Li, Jishen Zhao. pp. 655-668
- LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture. Haiyu Mao, Mingcong Song, Tao Li, Yuting Dai, Jiwu Shu. pp. 669-681
- Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture. Byungchul Hong, Yeonju Ro, John Kim. pp. 682-695
- SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator. Shuangchen Li, Alvin Oliver Glova, Xing Hu, Peng Gu, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie. pp. 696-709
- Exploring and Optimizing Chipkill-Correct for Persistent Memory Based on High-Density NVRAMs. Da Zhang, Vilas Sridharan, Xun Jian. pp. 710-723
- Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-Chip Memories. Behzad Salami, Osman S. Unsal, Adrián Cristal Kestelman. pp. 724-736
- Error Correlation Prediction in Lockstep Processors for Safety-Critical Systems. Emre Ozer, Balaji Venu, Xabier Iturbe, Shidhartha Das, Spyros Lyberis, John Biggs, Peter Harrod, John Penton. pp. 737-748
- Fault Site Pruning for Practical Reliability Analysis of GPGPU Applications. Bin Nie, Lishan Yang, Adwait Jog, Evgenia Smirni. pp. 749-761
- SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection. Michael B. Sullivan, Siva Kumar Sastry Hari, Brian Zimmer, Timothy Tsai, Stephen W. Keckler. pp. 762-774
- CEASER: Mitigating Conflict-Based Cache Attacks via Encrypted-Address and Remapping. Moinuddin K. Qureshi. pp. 775-787
- PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifications. Yatin A. Manerkar, Daniel Lustig, Margaret Martonosi, Aarti Gupta. pp. 788-801
- Application-Transparent Near-Memory Processing Architecture with Memory Channel Network. Mohammad Alian, Seungwon Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam J. McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-mei W. Hwu, Nam Sung Kim. pp. 802-814
- End-to-End Automated Exploit Generation for Validating the Security of Processor Designs. Rui Zhang, Calvin Deutschbein, Peng Huang, Cynthia Sturton. pp. 815-827
- Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures. Yongshan Ding, Adam Holmes, Ali Javadi-Abhari, Diana Franklin, Margaret Martonosi, Frederic T. Chong. pp. 828-840
- MDACache: Caching for Multi-Dimensional-Access Memories. Sumitha George, Minli Julie Liao, Huaipan Jiang, Jagadish B. Kotra, Mahmut T. Kandemir, Jack Sampson, Vijaykrishnan Narayanan. pp. 841-854
- GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware. Ananda Samajdar, Parth Mannan, Kartikay Garg, Tushar Krishna. pp. 855-866
- CritICs Critiquing Criticality in Mobile Apps. Prasanna Venkatesh Rengasamy, Haibo Zhang, Shulin Zhao, Nachiappan Chidambaram Nachiappan, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das. pp. 867-880
- EMPROF: Memory Profiling Via EM-Emanation in IoT and Hand-Held Devices. Moumita Dey, Alireza Nazari, Alenka G. Zajic, Milos Prvulovic. pp. 881-893
- MAVBench: Micro Aerial Vehicle Benchmarking. Behzad Boroujerdian, Hasan Genc, Srivatsan Krishnan, Wenzhi Cui, Aleksandra Faust, Vijay Janapa Reddi. pp. 894-907
- Architectural Support for Efficient Large-Scale Automata Processing. Hongyuan Liu, Mohamed Ibrahim, Onur Kayiran, Sreepathi Pai, Adwait Jog. pp. 908-920
- ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata. Kevin Angstadt, Arun Subramaniyan, Elaheh Sadredini, Reza Rahimi, Kevin Skadron, Westley Weimer, Reetuparna Das. pp. 921-932
- Morph: Flexible Acceleration for 3D CNN-Based Video Understanding. Kartik Hegde, Rohit Agrawal, Yulun Yao, Christopher W. Fletcher. pp. 933-946
- CheckMate: Automated Synthesis of Hardware Exploits and Security Litmus Tests. Caroline Trippel, Daniel Lustig, Margaret Martonosi. pp. 947-960
- Shadow Block: Accelerating ORAM Accesses with Data Duplication. Xian Zhang, Guangyu Sun, Peichen Xie, Chao Zhang, Yannan Liu, Lingxiao Wei, Qiang Xu, Chun Jason Xue. pp. 961-973
- DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors. Vladimir Kiriansky, Ilia A. Lebedev, Saman P. Amarasinghe, Srinivas Devadas, Joel S. Emer. pp. 974-987