- Sparse-TPU: adapting systolic arrays for sparse matrices. Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Kuan-Yu Chen, Ronald G. Dreslinski, Trevor N. Mudge.
- Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles. Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, Nandakishore Santhi, Stephan J. Eidenbenz.
- Wavefront parallelization of recurrent neural networks on multi-core architectures. Robin Kumar Sharma, Marc Casas.
- RICH: implementing reductions in the cache hierarchy. Vladimir Dimic, Miquel Moretó, Marc Casas, Jan Ciesko, Mateo Valero.
- Ouroboros: virtualized queues for dynamic memory management on GPUs. Martin Winter, Daniel Mlakar, Mathias Parger, Markus Steinberger.
- MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. Ji Liu, Abdullah-Al Kafi, Xipeng Shen, Huiyang Zhou.
- Fast distributed bandits for online recommendation systems. Kanak Mahadik, Qingyun Wu, Shuai Li, Amit Sabne.
- Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors. Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, Josep Torrellas.
- Global link arrangement for practical Dragonfly. Zaid Salamah A. Alzaid, Saptarshi Bhowmik, Xin Yuan, Michael Lang.
- Leveraging intra-page update diversity for mitigating write amplification in SSDs. Imran Fareed, Mincheol Kang, Wonyoung Lee, Soontae Kim.
- AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling. Xianwei Cheng, Hui Zhao, Mahmut T. Kandemir, Beilei Jiang, Gayatri Mehta.
- V-Combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging. Azin Heidarshenas, Serif Yesil, Dimitrios Skarlatos, Sasa Misailovic, Adam Morrison, Josep Torrellas.
- Characterization and identification of HPC applications at leadership computing facility. Zhengchun Liu, Ryan Lewis, Rajkumar Kettimuthu, Kevin Harms, Philip H. Carns, Nageswara S. V. Rao, Ian T. Foster, Michael E. Papka.
- Chunking loops with non-uniform workloads. Indu K. Prabhu, V. Krishna Nandivada.
- Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms. Ruoming Jin, Zhen Peng, Wendell Wu, Feodor F. Dragan, Gagan Agrawal, Bin Ren.
- Post-Moore server architecture. Babak Falsafi.
- Compiler aided checkpointing using crash-consistent data structures in NVMM systems. Tyler Coy, Shuibing He, Bin Ren, Xuechen Zhang.
- Mapping and scheduling HPC applications for optimizing I/O. Jesús Carretero, Emmanuel Jeannot, Guillaume Pallez, David E. Singh, Nicolas Vidal.
- CFDNet: a deep learning-based accelerator for fluid simulations. Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowliswharan.
- End-to-end performance modeling of distributed GPU applications. Jaemin Choi, David F. Richards, Laxmikant V. Kalé, Abhinav Bhatele.
- Accelerating relax-ordered task-parallel workloads using multi-level dependency checking. Masab Ahmad, Mohsin Shan, Akif Rehman, Omer Khan.
- A scalable framework for solving fractional diffusion equations. Max Carlson, Robert M. Kirby, Hari Sundar.
- SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors. Laith M. AlBarakat, Paul V. Gratz, Daniel A. Jiménez.
- BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data. Gongjin Sun, Seongyoung Kang, Sang-Woo Jun.
- Bundlefly: a low-diameter topology for multicore fiber. Fei Lei, Dezun Dong, Xiangke Liao, José Duato.
- TensorSVM: accelerating kernel machines with tensor engine. Shaoshuai Zhang, Ruchi Shah, Panruo Wu.
- A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU. Brian Donnelly, Michael Gowanlock.
- Graptor: efficient pull and push style vectorized graph processing. Hans Vandierendonck.
- What every scientific programmer should know about compiler optimizations? Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu.
- CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks. Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin C. Herbordt, Ang Li, Yanzhi Wang.
- Fuzzy fairness controller for NVMe SSDs. Shivani Tripathy, Debiprasanna Sahoo, Manoranjan Satpathy, Madhu Mutyam.
- Tools for top-down performance analysis of GPU-accelerated applications. Keren Zhou, Mark W. Krentel, John M. Mellor-Crummey.
- Identifying and (automatically) remedying performance problems in CPU/GPU applications. Benjamin Welton, Barton P. Miller.
- How I learned to stop worrying about user-visible endpoints and love MPI. Rohit Zambre, Aparna Chandramowliswharan, Pavan Balaji.
- Optimizing supercompilers for supercomputers. Michael Wolfe.
- Modeling and optimizing NUMA effects and prefetching with machine learning. Isaac Sánchez Barrera, David Black-Schaffer, Marc Casas, Miquel Moretó, Anastasiia Stupnikova, Mihail Popov.
- Tuning applications for efficient GPU offloading to in-memory processing. Yudong Wu, Mingyao Shen, Yi-Hui Chen, Yuanyuan Zhou.
- AutoParBench: a unified test framework for OpenMP-based parallelizers. Gleison Souza Diniz Mendonca, Chunhua Liao, Fernando Magno Quintão Pereira.
- Efficient parallel algorithms for betweenness- and closeness-centrality in dynamic graphs. Kshitij Shukla, Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli.
- NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. (DK) Panda.
- CodeSeer: input-dependent code variants selection via machine learning. Tao Wang, Nikhil Jain, David Böhme, David Beckingsale, Frank Mueller, Todd Gamblin.
- cuRipples: influence maximization on multi-GPU systems. Marco Minutoli, Maurizio Drocco, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman.