- Preparing HPC Applications for the Exascale Era: A Decoupling Strategy. Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis. 1-10
- An Efficient, Distributed Stochastic Gradient Descent Algorithm for Deep-Learning Applications. Guojing Cong, Onkar Bhardwaj, Minwei Feng. 11-20
- Large-Scale Parallelization of Smoothed Particle Hydrodynamics Method on Heterogeneous Cluster. Yingrui Wang, Leisheng Li, Rong Tian. 21-30
- Boosting the Efficiency of HPCG and Graph500 with Near-Data Processing. Erik Vermij, Leandro Fiorin, Christoph Hagleitner, Koen Bertels. 31-40
- GCN: GPU-Based Cube CNN Framework for Hyperspectral Image Classification. Han Dong, Tao Li, Jiabing Leng, Lingyan Kong, Gang Bai. 41-49
- Nearly Balanced Work Partitioning for Heterogeneous Algorithms. Mallipeddi Hardhik, Dip Sankar Banerjee, Kiran Raj Ramamoorthy, Kishore Kothapalli, Kannan Srinathan. 50-59
- GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations. Adrián Castelló, Sangmin Seo, Rafael Mayo, Pavan Balaji, Enrique S. Quintana-Ortí, Antonio J. Peña. 60-69
- Locality-Aware Dynamic Task Graph Scheduling. Jordyn Maglalang, Sriram Krishnamoorthy, Kunal Agrawal. 70-80
- Practical Experience with Transactional Lock Elision. Tingzhe Zhou, Pantea Zardoshti, Michael F. Spear. 81-90
- Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning. Hartwig Anzt, Jack J. Dongarra, Goran Flegar, Enrique S. Quintana-Ortí. 91-100
- High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU. Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka. 101-110
- Constrained Tensor Factorization with Accelerated AO-ADMM. Shaden Smith, Alec Beri, George Karypis. 111-120
- Efficient Data Sharing on Heterogeneous Systems. Victor Garcia-Flores, Eduard Ayguadé, Antonio J. Peña. 121-130
- HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip. Vikram K. Narayana, Shuai Sun, Armin Mehrabian, Volker J. Sorger, Tarek A. El-Ghazawi. 131-140
- ES2: Aiming at an Optimal Virtual I/O Event Path. Xiaokang Hu, Wang Zhang, Jian Li, Ruhui Ma, Feng Wu, Haibing Guan. 141-150
- MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling. Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti, Ching-Hsiang Chu, Dhabaleswar K. Panda. 151-160
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda. 161-170
- Overlapping Data Transfers with Computation on GPU with Tiles. Burak Bastem, Didem Unat, Weiqun Zhang, Ann S. Almgren, John Shalf. 171-180
- Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning. Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos. 181-190
- Parallel Algorithms for the Computation of Cycles in Relative Neighborhood Graphs. Hari Sundar, Parmeshwar Khurd. 191-200
- High Performance Query Processing for Web Scale RDF Data using BSP Style Communication and Balanced Distribution. Minho Bae, Junho Eum, Donghoon Kim, Sangyoon Oh. 201-210
- OptiMatch: Enabling an Optimal Match between Green Power and Various Workloads for Renewable-Energy Powered Storage Systems. Xiaoyang Qu, Jiguang Wan, Fengguang Song, Xiaozhao Zhuang, Fei Wu, Changsheng Xie. 211-220
- Favorable Block First: A Comprehensive Cache Scheme to Accelerate Partial Stripe Recovery of Triple Disk Failure Tolerant Arrays. Luyu Li, Houxiang Ji, Chentao Wu, Jie Li, Minyi Guo. 221-230
- Non-Sequential Striping for Distributed Storage Systems with Different Redundancy Schemes. Yanwen Xie, Dan Feng, Fang Wang. 231-240
- Predicting Response Latency Percentiles for Cloud Object Storage Systems. Yi Su, Dan Feng, Yu Hua, Zhan Shi. 241-250
- WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows. Mehmet Fatih Aktas, Javier Diaz Montes, Ivan Rodero, Manish Parashar. 251-260
- Scalable Write Allocation in the WAFL File System. Matthew Curtis-Maury, Ram Kesavan, Mrinal K. Bhattacharjee. 261-270
- Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. Minyoung Jung, Jinwoo Park, Johann Blieberger, Bernd Burgstaller. 271-281
- Parallel Reconstruction of Three Dimensional Magnetohydrodynamic Equilibria in Plasma Confinement Devices. Sudip K. Seal, Mark R. Cianciosa, Steven P. Hirshman, Andreas Wingen, Robert S. Wilcox, Ezekial A. Unterberg. 282-291
- Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors. Athena Elafrou, Georgios I. Goumas, Nectarios Koziris. 292-301
- Network Aware Multi-User Computation Partitioning in Mobile Edge Clouds. Lei Yang, Jiannong Cao, Zhenyu Wang, Weigang Wu. 302-311
- Fading-Resistant Link Scheduling in Wireless Networks. Chenxi Qiu, Haiying Shen. 312-321
- Order/Radix Problem: Towards Low End-to-End Latency Interconnection Networks. Ryota Yasudo, Michihiro Koibuchi, Koji Nakano, Hiroki Matsutani, Hideharu Amano. 322-331
- A Dynamic Resource Controller for a Lambda Architecture. MohammadReza HosseinyFarahabady, Javid Taheri, Zahir Tari, Albert Y. Zomaya. 332-341
- CELIA: Cost-Time Performance of Elastic Applications on Cloud. Sunimal Rathnayake, Dumitrel Loghin, Yong Meng Teo. 342-351
- The Cloud as an OpenMP Offloading Device. Hervé Yviquel, Guido Araujo. 352-361
- Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations. Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano, Yasuaki Ito. 362-371
- High-Performance Recommender System Training Using Co-Clustering on CPU/GPU Clusters. Kubilay Atasu, Thomas P. Parnell, Celestine Dünner, Michail Vlachos, Haralampos Pozidis. 372-381
- Exploiting GPUs for Fast Force-Directed Visualization of Large-Scale Networks. Govert G. Brinkmann, Kristian F. D. Rietveld, Frank W. Takes. 382-391
- A Coflow-Based Co-Optimization Framework for High-Performance Data Analytics. Long Cheng, Ying Wang, Yulong Pei, Dick H. J. Epema. 392-401
- PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout. Zhipeng Li, Yinlong Xu, Yongkun Li, Chengjin Tian, Youhui Bai. 402-411
- Data Caching in Next Generation Mobile Cloud Services, Online vs. Off-Line. Yang Wang, Shuibing He, Xiaopeng Fan, Chengzhong Xu, Joseph Culberson, Joseph Horton. 412-421
- Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor. Lijuan Jiang, Chao Yang, Yulong Ao, Wanwang Yin, Wenjing Ma, Qiao Sun, Fangfang Liu, Rongfen Lin, Peng Zhang. 422-431
- Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor. James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka. 432-441
- Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions. Shixiong Xu, David Gregg. 442-451
- Runtime Data Layout Scheduling for Machine Learning Dataset. Yang You, James Demmel. 452-461
- A Machine Learning Approach for Efficient Parallel Simulation of Beam Dynamics on GPUs. Kamesh Arumugam, Desh Ranjan, Mohammad Zubair, Balsa Terzic, Alexander Godunov, Tunazzina Islam. 462-471
- Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization. Charalampos Stylianopoulos, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou. 472-482
- Parallel Space-Time Kernel Density Estimation. Erik Saule, Dinesh Panchananam, Alexander Hohl, Wenwu Tang, Eric Delmelle. 483-492
- Parallel Algorithm for Single-Source Earliest-Arrival Problem in Temporal Graphs. Peng Ni, Masatoshi Hanai, Wen Jun Tan, Chen Wang, Wentong Cai. 493-502
- Greed Is Good: Parallel Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures. Mustafa Kemal Tas, Kamer Kaya, Erik Saule. 503-512
- A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters. Isuru Dilanka Fernando, Sanath Jayasena, Milinda Fernando, Hari Sundar. 513-522
- Autotuning GPU Kernels via Static and Predictive Analysis. Robert V. Lim, Boyana Norris, Allen D. Malony. 523-532
- A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance. Aniket Chakrabarti, Srinivasan Parthasarathy, Christopher Stewart. 533-542
- Scheduling Independent Tasks in Parallel under Power Constraints. Ayham Kassab, Jean-Marc Nicod, Laurent Philippe, Veronika Rehn-Sonigo. 543-552
- A Novel Minimum Time Parallel 2-D Discrete Wavelet Transform Algorithm for General Purpose Processors. Eduardo Moscoso Rubino, Alberto Jose Alvares, Raul Marin Prades, Pedro Sanz Valero. 553-562
- A Parallel TSP-Based Algorithm for Balanced Graph Partitioning. Harshvardhan Das, Subodh Kumar. 563-570
- E-Storm: Replication-Based State Management in Distributed Stream Processing Systems. Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein, Rajkumar Buyya. 571-580
- Resilience for Stencil Computations with Latent Errors. Aiman Fang, Aurélien Cavelan, Yves Robert, Andrew A. Chien. 581-590
- Application-Aware Power Coordination on Power Bounded NUMA Multicore Systems. Rong Ge, Pengfei Zou, Xizhou Feng. 591-600