- Simplification and runtime resolution of data dependence constraints for loop transformations. Diogo Nunes Sampaio, Louis-Noël Pouchet, Fabrice Rastello.
- Optimizing recursive task parallel programs. Suyash Gupta, Rahul Shrivastava, V. Krishna Nandivada.
- Packet coalescing exploiting data redundancy in GPGPU architectures. Kyung-Hoon Kim, Rahul Boyapati, Jiayi Huang, Yuho Jin, Ki Hwan Yum, Eun Jung Kim.
- Way-combining directory: an adaptive and scalable low-cost coherence directory. J. Rubén Titos Gil, Antonio Flores, Ricardo Fernández Pascual, Alberto Ros, Manuel E. Acacio.
- Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs. Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra.
- GraphGrind: addressing load imbalance of graph partitioning. Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos.
- Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU. Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel.
- HiPA: history-based piecewise approximation for functions. Aurangzeb, Rudolf Eigenmann.
- Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs. Aleksandar Zlateski, H. Sebastian Seung.
- A performance analysis framework for exploiting GPU microarchitectural capability. Ke-ren Zhou, Guangming Tan, Xiuxia Zhang, Chaowei Wang, Ninghui Sun.
- Iteration-fusing conjugate gradient. Sicong Zhuang, Marc Casas.
- Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques. Antonio J. Peña, Vicenç Beltran, Carsten Clauss, Thomas Moschny.
- Design and implementation of bandwidth-aware memory placement and migration policies for heterogeneous memory systems. Seongdae Yu, Seongbeom Park, Woongki Baek.
- Frequent subtree mining on the automata processor: challenges and opportunities. Elaheh Sadredini, Reza Rahimi, Ke Wang, Kevin Skadron.
- HPAT: high performance analytics with scripting ease-of-use. Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman.
- libPRISM: an intelligent adaptation of prefetch and SMT levels. Cristobal Ortega, Miquel Moretó, Marc Casas, Ramon Bertran, Alper Buyuktosunoglu, Alexandre E. Eichenberger, Pradip Bose.
- SSDUP: a traffic-aware SSD burst buffer for HPC systems. Xuanhua Shi, Ming Li, Wei Liu, Hai Jin, Chen Yu, Yong Chen.
- SPIRIT: a framework for creating distributed recursive tree applications. Nikhil Hegde, Jianqiao Liu, Milind Kulkarni.
- Fast segmented sort on GPUs. Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng.
- Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation. Xi-Yue Xiang, Wentao Shi, Saugata Ghose, Lu Peng, Onur Mutlu, Nian-Feng Tzeng.
- Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation. Peng Jiang, Gagan Agrawal.
- Demystifying automata processing: GPUs, FPGAs or Micron's AP? Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi.
- Enabling scalability-sensitive speculative parallelization for FSM computations. Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song.
- Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures. Haikun Liu, Yujie Chen, Xiaofei Liao, Hai Jin, Bingsheng He, Long Zheng, Rentong Guo.
- Automatic topology mapping of diverse large-scale parallel applications. Juan J. Galvez, Nikhil Jain, Laxmikant V. Kalé.
- On improving performance of sparse matrix-matrix multiplication on GPUs. Rakshith Kunchum, Ankur Chaudhry, Aravind Sukumaran-Rajam, Qingpeng Niu, Israt Nisa, P. Sadayappan.
- Dynamic scheduling for efficient hierarchical sparse matrix operations on the GPU. Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger.
- Revisiting phased transactional memory. Joao P. L. de Carvalho, Guido Araujo, Alexandro Baldassin.