- BlueDBM: an appliance for big data analytics. Sang-Woo Jun, Ming Liu, Sungjin Lee, Jamey Hicks, John Ankcorn, Myron King, Shuotao Xu, Arvind. 1-13
- Towards sustainable in-situ server systems in the big data era. Chao Li, Yang Hu, Longjun Liu, Juncheng Gu, Mingcong Song, Xiaoyao Liang, Jingling Yuan, Tao Li. 14-26
- DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers. Johann Hauswald, Yiping Kang, Michael A. Laurenzano, Quan Chen, Cheng Li, Trevor N. Mudge, Ronald G. Dreslinski, Jason Mars, Lingjia Tang. 27-40
- A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps. Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita R. Das, Mahmut T. Kandemir, Todd C. Mowry, Onur Mutlu. 41-53
- Harmonia: balancing compute and memory power in high-performance GPUs. Indrani Paul, Wei Huang, Manish Arora, Sudhakar Yalamanchili. 54-65
- Redundant memory mappings for fast access to large memories. Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Unsal. 66-78
- Page overlays: an enhanced virtual memory framework to enable fine-grained memory management. Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry, Trishul M. Chilimbi. 79-91
- ShiDianNao: shifting vision processing closer to the sensor. Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, Olivier Temam. 92-104
- A scalable processing-in-memory accelerator for parallel graph processing. Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi. 105-117
- Efficient execution of memory access phases using dataflow specialization. Chen-Han Ho, Sung Jin Kim, Karthikeyan Sankaralingam. 118-130
- Data reorganization in memory using 3D-stacked DRAM. Berkin Akin, Franz Franchetti, James C. Hoe. 131-143
- Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, Hisanobu Tomari. 144-157
- Profiling a warehouse-scale computer. Svilen Kanev, Juan Pablo Darago, Kim M. Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, David M. Brooks. 158-169
- Computer performance microscopy with Shim. Xi Yang, Stephen M. Blackburn, Kathryn S. McKinley. 170-184
- Flexible software profiling of GPU architectures. Mark Stephenson, Siva Kumar Sastry Hari, Yunsup Lee, Eiman Ebrahimi, Daniel R. Johnson, David W. Nellans, Mike O'Connor, Stephen W. Keckler. 185-197
- BEAR: techniques for mitigating bandwidth bloat in gigascale DRAM caches. Chia-Chen Chou, Aamer Jaleel, Moinuddin K. Qureshi. 198-210
- A fully associative, tagless DRAM cache. Yongjun Lee, JongWon Kim, Hakbeom Jang, Hyunggyun Yang, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee. 211-222
- Multiple clone row DRAM: a low latency and area optimized DRAM. Jungwhan Choi, Wongyu Shin, Jaemin Jang, Jinwoong Suh, Yongkee Kwon, Youngsuk Moon, Lee-Sup Kim. 223-234
- Flexible auto-refresh: enabling scalable and energy-efficient DRAM refresh reductions. Ishwar Bhati, Zeshan Chishti, Shih-Lien Lu, Bruce Jacob. 235-246
- Cost-effective speculative scheduling in high performance processors. Arthur Perais, André Seznec, Pierre Michaud, Andreas Sembrant, Erik Hagersten. 247-259
- LaZy superscalar. Görkem Asilioglu, Zhaoxiang Jin, Murat Köksal, Omkar Javeri, Soner Önder. 260-271
- The load slice core microarchitecture. Trevor E. Carlson, Wim Heirman, Osman Allam, Stefanos Kaxiras, Lieven Eeckhout. 272-284
- Semantic locality and context-based prefetching using reinforcement learning. Leeor Peled, Shie Mannor, Uri C. Weiser, Yoav Etsion. 285-297
- Exploring the potential of heterogeneous von Neumann/dataflow execution models. Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam. 298-310
- SHRINK: reducing the ISA complexity via instruction recycling. Bruno Cardoso Lopes, Rafael Auler, Luiz Ramos, Edson Borin, Rodolfo Azevedo. 311-322
- Branch vanguard: decomposing branch functionality into prediction and resolution instructions. Daniel S. McFarlin, Craig B. Zilles. 323-335
- PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi. 336-348
- SLIP: reducing wire energy in the memory hierarchy. Subhasis Das, Tor M. Aamodt, William J. Dally. 349-361
- CloudMonatt: an architecture for security health monitoring and attestation of virtual machines in cloud computing. Tianwei Zhang, Ruby B. Lee. 362-374
- Reducing world switches in virtualized environment with flexible cross-world calls. Wenhao Li, Yubin Xia, Haibo Chen, Binyu Zang, Haibing Guan. 375-387
- ArMOR: defending against memory consistency model mismatches in heterogeneous architectures. Daniel Lustig, Caroline Trippel, Michael Pellauer, Margaret Martonosi. 388-400
- Clean: a race detector with cleaner semantics. Cedomir Segulja, Tarek S. Abdelrahman. 401-413
- MiSAR: minimalistic synchronization accelerator with resource overflow management. Ching-Kai Liang, Milos Prvulovic. 414-426
- Callback: efficient synchronization without invalidation with a directory just for spin-waiting. Alberto Ros, Stefanos Kaxiras. 427-438
- Thermal time shifting: leveraging phase change materials to reduce cooling costs in warehouse-scale computers. Matt Skach, Manish Arora, Chang-Hong Hsu, Qi Li, Dean M. Tullsen, Lingjia Tang, Jason Mars. 439-449
- Heracles: improving resource efficiency at scale. David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis. 450-462
- HEB: deploying and managing hybrid energy buffers for improving datacenter efficiency and economy. Longjun Liu, Chao Li, Hongbin Sun, Yang Hu, Juncheng Gu, Tao Li, Jingmin Xin, Nanning Zheng. 463-475
- Architecting to achieve a billion requests per second throughput on a single key-value store server platform. Sheng Li, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, O. Seongil, Sukhan Lee, Pradeep Dubey. 476-488
- A variable warp size architecture. Timothy G. Rogers, Daniel R. Johnson, Mike O'Connor, Stephen W. Keckler. 489-501
- Warped-compression: enabling power efficient GPUs through register compression. Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Won Woo Ro, Murali Annavaram. 502-514
- CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads. Shin-Ying Lee, Akhil Arunkumar, Carole-Jean Wu. 515-527
- Dynamic thread block launch: a lightweight execution mechanism to support irregular applications on GPUs. Jin Wang, Norm Rubin, Albert Sidelnik, Sudhakar Yalamanchili. 528-540
- DynaSpAM: dynamic spatial architecture mapping using out of order instruction schedules. Feng Liu, Heejin Ahn, Stephen R. Beard, Taewook Oh, David I. August. 541-553
- Rumba: an online quality management system for approximate computing. Daya Shanker Khudia, Babak Zamirai, Mehrzad Samadi, Scott A. Mahlke. 554-566
- Manycore network interfaces for in-memory rack-scale computing. Alexandros Daglis, Stanko Novakovic, Edouard Bugnion, Babak Falsafi, Boris Grot. 567-579
- Unified address translation for memory-mapped SSDs with FlashMap. Jian Huang, Anirudh Badam, Moinuddin K. Qureshi, Karsten Schwan. 580-591
- FASE: finding amplitude-modulated side-channel emanations. Robert Callan, Alenka G. Zajic, Milos Prvulovic. 592-603
- Probable cause: the deanonymizing effects of approximate DRAM. Amir Rahmati, Matthew Hicks, Daniel E. Holcomb, Kevin Fu. 604-615
- PrORAM: dynamic prefetcher for oblivious RAM. Xiangyao Yu, Syed Kamran Haider, Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas. 616-628
- MBus: an ultra-low power interconnect bus for next generation nanopower systems. Pat Pannuto, Yoonmyung Lee, Ye-Sheng Kuo, Zhiyoong Foo, Benjamin P. Kempke, Gyouho Kim, Ronald G. Dreslinski, David Blaauw, Prabal Dutta. 629-641
- Accelerating asynchronous programs through event sneak peek. Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy. 642-654
- VIP: virtualizing IP chains on handheld platforms. Nachiappan Chidambaram Nachiappan, Haibo Zhang, Jihyun Ryoo, Niranjan Soundararajan, Anand Sivasubramaniam, Mahmut T. Kandemir, Ravishankar Iyer, Chita R. Das. 655-667
- FaultHound: value-locality-based soft-fault tolerance. Nitin, Irith Pomeranz, T. N. Vijaykumar. 668-681
- COP: to compress and protect main memory. David J. Palframan, Nam Sung Kim, Mikko H. Lipasti. 682-693
- Hi-fi playback: tolerating position errors in shift operations of racetrack memory. Chao Zhang, Guangyu Sun, Xian Zhang, Weiqi Zhang, Weisheng Zhao, Tao Wang, Yun Liang, Yongpan Liu, Yu Wang, Jiwu Shu. 694-706
- Stash: have your scratchpad and cache it too. Rakesh Komuravelli, Matthew D. Sinclair, Johnathan Alsop, Muhammad Huzaifa, Maria Kotsifakou, Prakalp Srivastava, Sarita V. Adve, Vikram S. Adve. 707-719
- Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures. Lluc Alvarez, Lluís Vilanova, Miquel Moretó, Marc Casas, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé, Mateo Valero. 720-732
- Fusion: design tradeoffs in coherent cache hierarchies for accelerators. Snehasish Kumar, Arrvindh Shriraman, Naveen Vedula. 733-745