2016 IEEE International Symposium on High Performance Computer Architecture, HPCA 2016, Barcelona, Spain, March 12-16, 2016

2016 IEEE International Symposium on High Performance Computer Architecture, HPCA 2016, Barcelona, Spain, March 12-16, 2016. IEEE Computer Society, 2016.

Conference: hpca2016

Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning. Mahdi Nazm Bojnordi, Engin Ipek. pp. 1-13.

TABLA: A unified template-based framework for accelerating statistical machine learning. Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, Hadi Esmaeilzadeh. pp. 14-26.

Pushing the limits of accelerator efficiency while retaining programmability. Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam, Greg Wright. pp. 27-39.

A low power software-defined-radio baseband processor for the Internet of Things. Yajing Chen, Shengshuo Lu, Hun-Seok Kim, David Blaauw, Ronald G. Dreslinski, Trevor N. Mudge. pp. 40-51.

Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee. Benjamin Gaudette, Carole-Jean Wu, Sarma B. K. Vrudhula. pp. 52-63.

Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction. Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi. pp. 64-76.

Atomic persistence for SCM with a non-intrusive backend controller. Kshitij Doshi, Ellis Giles, Peter J. Varman. pp. 77-89.

CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM. Poovaiah M. Palangappa, Kartik Mohanram. pp. 90-101.

A low-power hybrid reconfigurable architecture for resistive random-access memories. Miguel Angel Lastras-Montaño, Amirali Ghofrani, Kwang-Ting Cheng. pp. 102-113.

A performance analysis framework for optimizing OpenCL applications on FPGAs. Ze-ke Wang, Bingsheng He, Wei Zhang, Shunning Jiang. pp. 114-125.

HRL: Efficient and flexible reconfigurable logic for near-data processing. Mingyu Gao, Christos Kozyrakis. pp. 126-137.

Software transparent dynamic binary translation for coarse-grain reconfigurable architectures. Matthew A. Watkins, Tony Nowatzki, Anthony Carno. pp. 138-150.

Core tunneling: Variation-aware voltage noise mitigation in GPUs. Renji Thomas, Kristin Barber, Naser Sedaghati, Li Zhou, Radu Teodorescu. pp. 151-162.

Warped-preexecution: A GPU pre-execution approach for improving latency hiding. Sangpil Lee, Won Woo Ro, Keunsoo Kim, Gunjae Koo, Myung Kuk Yoon, Murali Annavaram. pp. 163-175.

Approximating warps with intra-warp operand value similarity. Daniel Wong, Nam Sung Kim, Murali Annavaram. pp. 176-187.

A case for toggle-aware compression for GPU systems. Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler. pp. 188-200.

Minimal disturbance placement and promotion. Elvira Teran, Yingying Tian, Zhe Wang, Daniel A. Jiménez. pp. 201-211.

Revisiting virtual L1 caches: A practical design using dynamic synonym remapping. Hongil Yoon, Gurindar S. Sohi. pp. 212-224.

Modeling cache performance beyond LRU. Nathan Beckmann, Daniel Sanchez. pp. 225-236.

Efficient footprint caching for Tagless DRAM Caches. Hakbeom Jang, Yongjun Lee, JongWon Kim, Youngsok Kim, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee. pp. 237-248.

SCsafe: Logging sequential consistency violations continuously and precisely. Yuelu Duan, David Koufaty, Josep Torrellas. pp. 249-260.

LASER: Light, Accurate Sharing dEtection and Repair. Liang Luo, Akshitha Sriraman, Brooke Fugate, Shiliang Hu, Gilles Pokam, Chris J. Newburn, Joseph Devietti. pp. 261-273.

Efficient GPU hardware transactional memory through early conflict resolution. Sui Chen, Lu Peng. pp. 274-284.

PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory. Sunjae Park, Milos Prvulovic, Christopher J. Hughes. pp. 285-296.

Efficient synthetic traffic models for large, complex SoCs. Jieming Yin, Onur Kayiran, Matthew Poremba, Natalie D. Enright Jerger, Gabriel H. Loh. pp. 297-308.

DVFS for NoCs in CMPs: A thread voting approach. Yuan Yao, Zhonghai Lu. pp. 309-320.

SLaC: Stage laser control for a flattened butterfly network. Yigit Demir, Nikos Hardavellas. pp. 321-332.

The runahead network-on-chip. Zimo Li, Joshua San Miguel, Natalie D. Enright Jerger. pp. 333-344.

Towards high performance paged memory for GPUs. Tianhao Zheng, David W. Nellans, Arslan Zulfiqar, Mark Stephenson, Stephen W. Keckler. pp. 345-357.

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing. Zhenning Wang, Jun Yang, Rami G. Melhem, Bruce R. Childers, Youtao Zhang, Minyi Guo. pp. 358-369.

iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs. Minseok Lee, Gwangsun Kim, John Kim, Woong Seo, Yeon Gon Cho, Soojung Ryu. pp. 370-381.

Lattice priority scheduling: Low-overhead timing-channel protection for a shared memory controller. Andrew Ferraiuolo, Yao Wang, Danfeng Zhang, Andrew C. Myers, G. Edward Suh. pp. 382-393.

A complete key recovery timing attack on a GPU. Zhen Hang Jiang, Yunsi Fei, David R. Kaeli. pp. 394-405.

CATalyst: Defeating last-level cache side channel attacks in cloud computing. Fangfei Liu, Qian Ge, Yuval Yarom, Frank McKeen, Carlos V. Rozas, Gernot Heiser, Ruby B. Lee. pp. 406-418.

Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines. Wei Wang, Jack W. Davidson, Mary Lou Soffa. pp. 419-431.

A market approach for handling power emergencies in multi-tenant data center. Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, Xiaorui Wang. pp. 432-443.

SizeCap: Efficiently handling power surges in fuel cell powered data centers. Yang Li, Di Wang, Saugata Ghose, Jie Liu, Sriram Govindan, Sean James, Eric Peterson, John Siegler, Rachata Ausavarungnirun, Onur Mutlu. pp. 444-456.

MaPU: A novel mathematical computing architecture. Donglin Wang, Xueliang Du, Leizu Yin, Chen Lin, Hong Ma, Weili Ren, Huijuan Wang, Xingang Wang, Shaolin Xie, Lei Wang, Zijun Liu, Tao Wang, Zhonghua Pu, Guangxin Ding, Mengchen Zhu, Lipeng Yang, Ruoshan Guo, Zhiwei Zhang, Xiao Lin, Jie Hao, Yongyong Yang, Wenqin Sun, Fabiao Zhou, NuoZhou Xiao, Qian Cui, Xiaoqin Wang. pp. 457-468.

Best-offset hardware prefetching. Pierre Michaud. pp. 469-480.

DUANG: Fast and lightweight page migration in asymmetric memory systems. Hao Wang, Jie Zhang, Sharmila Shridhar, Gieseo Park, Myoungsoo Jung, Nam Sung Kim. pp. 481-493.

Selective GPU caches to eliminate CPU-GPU HW cache coherence. Neha Agarwal, David W. Nellans, Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, Stephen W. Keckler. pp. 494-506.

Venice: Exploring server architectures for effective resource sharing. Jianbo Dong, Rui Hou, Michael C. Huang, Tao Jiang, Boyan Zhao, Sally A. McKee, Haibin Wang, Xiaosong Cui, Lixin Zhang. pp. 507-518.

A large-scale study of soft-errors on GPUs in the field. Bin Nie, Devesh Tiwari, Saurabh Gupta, Evgenia Smirni, James H. Rogers. pp. 519-530.

Design and implementation of a mobile storage leveraging the DRAM interface. Sungyong Seo, Youngjin Cho, Youngkwang Yoo, Otae Bae, Jaegeun Park, Heehyun Nam, Sunmi Lee, Yongmyung Lee, Seungdo Chae, MoonSang Kwon, Jin Hyeok Choi, Sangyeun Cho, Jaeheon Jeong, Duckhyun Chang. pp. 531-542.

Restore truncation for performance improvement in future DRAM systems. XianWei Zhang, Youtao Zhang, Bruce R. Childers, Jun Yang. pp. 543-554.

Parity Helix: Efficient protection for single-dimensional faults in multi-dimensional memory systems. Xun Jian, Vilas Sridharan, Rakesh Kumar. pp. 555-567.

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM. Kevin K. Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, Moinuddin K. Qureshi, Onur Mutlu. pp. 568-580.

ChargeCache: Reducing DRAM latency by exploiting row access locality. Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu. pp. 581-593.

Amdahl's law for lifetime reliability scaling in heterogeneous multicore processors. William J. Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili. pp. 594-605.

LiveSim: Going live with microarchitecture simulation. Sina Hassani, Gabriel Southern, Jose Renau. pp. 606-617.

McVerSi: A test generation framework for fast memory consistency verification in simulation. Vijay Nagarajan, Marco Elver. pp. 618-630.

Energy-efficient address translation. Vasileios Karakostas, Jayneel Gandhi, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Unsal. pp. 631-643.

RADAR: Runtime-assisted dead region management for last-level caches. Madhavan Manivannan, Vassilis Papaefstathiou, Miquel Pericàs, Per Stenström. pp. 644-656.

Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family. Andrew Herdrich, Edwin Verplanke, Priya Autee, Ramesh Illikkal, Chris Gianos, Ronak Singhal, Ravi Iyer. pp. 657-668.

Symbiotic job scheduling on the IBM POWER8. Josué Feliu, Stijn Eyerman, Julio Sahuquillo, Salvador Petit. pp. 669-680.

ScalCore: Designing a core for voltage scalability. Bhargava Gopireddy, Choungki Song, Josep Torrellas, Nam Sung Kim, Aditya Agrawal, Asit K. Mishra. pp. 681-693.

Cost effective physical register sharing. Arthur Perais, André Seznec. pp. 694-706.