- Massively parallel skyline computation for processing-in-memory architectures. Vasileios Zois, Divya Gupta, Vassilis J. Tsotras, Walid A. Najjar, Jean-François Roy. [doi]
- Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. Tony Nowatzki, Newsha Ardalani, Karthikeyan Sankaralingam, Jian Weng. [doi]
- In-DRAM near-data approximate acceleration for GPUs. Amir Yazdanbakhsh, Choungki Song, Jacob Sacks, Pejman Lotfi-Kamran, Hadi Esmaeilzadeh, Nam Sung Kim. [doi]
- Towards concurrency race debugging: an integrated approach for constraint solving and dynamic slicing. Long Zheng 0003, Xiaofei Liao, Hai Jin 0001, Bingsheng He, Jingling Xue, Haikun Liu. [doi]
- Near-side prefetch throttling: adaptive prefetching for high-performance many-core processors. Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur. [doi]
- Log(graph): a near-optimal high-performance graph representation. Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler. [doi]
- EAR: ECC-aided refresh reduction through 2-D zero compression. Jeongkyu Hong, Hyeonggyu Kim, Soontae Kim. [doi]
- Hypart: a hybrid technique for practical memory bandwidth partitioning on commodity servers. Jinsu Park, Seongbeom Park, Myeonggyun Han, Jihoon Hyun, Woongki Baek. [doi]
- Maximizing system utilization via parallelism management for co-located parallel applications. Younghyun Cho, Camilo A. Celis Guzman, Bernhard Egger. [doi]
- Automatic annotation of tasks in structured code. Pedro Ramos, Gleison Souza Diniz Mendonca, Divino Soares, Guido Araújo, Fernando Magno Quintão Pereira. [doi]
- Transactional pre-abort handlers in hardware transactional memory. Sunjae Park, Christopher J. Hughes, Milos Prvulovic. [doi]
- 3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions. Sukhan Lee, Kiwon Lee, Min Chul Sung, Mohammad Alian, Chankyung Kim, Wooyeong Cho, Reum Oh, Seongil O, Jung Ho Ahn, Nam Sung Kim. [doi]
- Biased reference counting: minimizing atomic operations in garbage collection. Jiho Choi, Thomas Shull, Josep Torrellas. [doi]
- E-PUR: an energy-efficient processing unit for recurrent neural networks. Franyell Silfa, Gem Dot, Jose-Maria Arnau, Antonio González 0001. [doi]
- Cost effective speculation with the omnipredictor. Arthur Perais, André Seznec. [doi]
- Optimizing remote data transfers in X10. Arun Thangamani, V. Krishna Nandivada. [doi]
- Compiler assisted coalescing. Sooraj Puthoor, Mikko H. Lipasti. [doi]
- Attributed consistent hashing for heterogeneous storage systems. Jiang Zhou, Yong Chen 0001, Weiping Wang. [doi]
- A portable, automatic data quantizer for deep neural networks. Young H. Oh, Quan Quan, Daeyeon Kim, Seonghak Kim, Jun Heo, Sungjun Jung, Jaeyoung Jang, Jae W. Lee. [doi]
- Data motifs: a lens towards fully understanding big data and AI workloads. Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Xu Wen, Xiwen He, Hainan Ye, Rui Ren. [doi]
- DART: distributed adaptive radix tree for efficient affix-based keyword search on HPC systems. Wei Zhang 0097, Houjun Tang, Suren Byna, Yong Chen 0001. [doi]
- On-the-fly workload partitioning for integrated CPU/GPU architectures. Younghyun Cho, Florian Negele, Seohong Park, Bernhard Egger, Thomas R. Gross. [doi]
- Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP. Vladimir Kiriansky, Haoran Xu, Martin Rinard, Saman P. Amarasinghe. [doi]
- Cost-driven thread coarsening for GPU kernels. Prithayan Barua, Jun Shirako, Vivek Sarkar. [doi]
- Performance extraction and suitability analysis of multi- and many-core architectures for next generation sequencing secondary analysis. Sanchit Misra, Tony C. Pan, Kanak Mahadik, George Powley, Priya N. Vaidya, Md. Vasimuddin, Srinivas Aluru. [doi]
- MemoDyn: exploiting weakly consistent data structures for dynamic parallel memoization. Prakash Prabhu, Stephen R. Beard, Sotiris Apostolakis, Ayal Zaks, David I. August. [doi]
- ComP-net: command processor networking for efficient intra-kernel communications on GPUs. Michael LeBeane, Khaled Hamidouche, Brad Benton, Mauricio Breternitz, Steven K. Reinhardt, Lizy K. John. [doi]
- Graphphi: efficient parallel graph processing on emerging throughput-oriented architectures. Zhen Peng, Alexander Powell, Bo Wu, Tekin Bicer, Bin Ren. [doi]
- Architectural support for convolutional neural networks on modern CPUs. Animesh Jain, Michael A. Laurenzano, Gilles A. Pokam, Jason Mars, Lingjia Tang. [doi]
- Revealing parallel scans and reductions in recurrences through function reconstruction. Peng Jiang, Linchuan Chen, Gagan Agrawal. [doi]
- Synergistic cache layout for reuse and compression. Biswabandan Panda, André Seznec. [doi]
- Stencil codes on a vector length agnostic architecture. Adrià Armejach, Helena Caminal, Juan M. Cebrian, Rekai González-Alberquilla, Chris Adeniyi-Jones, Mateo Valero, Marc Casas, Miquel Moretó. [doi]
- Mage: online and interference-aware scheduling for multi-scale heterogeneous systems. Francisco Romero, Christina Delimitrou. [doi]
- VW-SLP: auto-vectorization with adaptive vector width. Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes. [doi]
- An efficient graph accelerator with parallel data conflict management. Pengcheng Yao, Long Zheng 0003, Xiaofei Liao, Hai Jin 0001, Bingsheng He. [doi]
- GMOD: a dynamic GPU memory overflow detector. Bang Di, Jianhua Sun, Dong Li, Hao Chen, Zhe Quan. [doi]