Journal: Parallel Computing

Volume 40, Issue 9

449 -- 453 Joao Andrade, Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva. Optimized Fast Walsh-Hadamard Transform on GPUs for non-binary LDPC decoding
454 -- 470 Ehsan Totoni, Michael T. Heath, Laxmikant V. Kalé. Structure-adaptive parallel solution of sparse triangular linear systems
471 -- 495 Diego Arroyuelo, Carolina Bonacic, Veronica Gil Costa, Mauricio Marín, Gonzalo Navarro. Distributed text search using suffix arrays
496 -- 511 Yingchong Situ, Chandra S. Martha, Matthew E. Louis, Zhiyuan Li, Ahmed H. Sameh, Gregory A. Blaisdell, Anastasios S. Lyrintzis. Petascale large eddy simulation of jet engine noise based on the truncated SPIKE algorithm
512 -- 513 Lucas Mello Schnorr, Philippe Olivier Alexandre Navaux. Best of SBAC-PAD 2012
514 -- 525 Luiz E. Ramos, Ricardo Bianchini. Robust performance in hybrid-memory cooperative caches
526 -- 535 Joefon Jann, R. Sarma Burugula, Ching-Farn E. Wu, Kaoutar El Maghraoui. Towards an immortal operating system in virtual environments
536 -- 547 Esteban Meneses, Osman Sarood, Laxmikant V. Kalé. Energy profile of rollback-recovery strategies in high performance computing
548 -- 558 Teo Milanez, Sylvain Collange, Fernando Magno Quintão Pereira, Wagner Meira Jr., Renato Ferreira. Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads

Volume 40, Issue 8

345 -- 361 María Botón-Fernández, Miguel A. Vega-Rodríguez, Francisco Prieto Castrillo. Self-adaptivity for grid applications. An Efficient Resources Selection model based on evolutionary computation algorithms
362 -- 373 Chihiro Kodama, Masaaki Terai, Akira T. Noda, Yohei Yamada, Masaki Satoh, Tatsuya Seiki, Shin-ichi Iga, Hisashi Yashiro, Hirofumi Tomita, Kazuo Minami. Scalable rank-mapping algorithm for an icosahedral grid system on the massive parallel computer with a 3-D torus network
374 -- 393 Angeles G. Navarro, Rafael Asenjo, Francisco Corbera, Antonio J. Dios, Emilio L. Zapata. A case study of different task implementations for multioutput stages in non-trivial parallel pipeline applications
394 -- 407 J. Sánchez-Curto, P. Chamorro-Posada, G. S. McDonald. Efficient parallel implementation of the nonparaxial beam propagation method
408 -- 424 Jie Chen 0007, Tom L. H. Li, Mihai Anitescu. A parallel linear solver for multilevel Toeplitz systems with possibly several right-hand sides
425 -- 447 Roman Wyrzykowski, Lukasz Szustak, Krzysztof Rojek. Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Volume 40, Issue 7

159 -- 160 Costas Bekas, Ananth Grama, Yousef Saad, Olaf Schenk. Parallel matrix algorithms
161 -- 172 Robert Andrew, Nicholas J. Dingle. Implementing QR factorization updating algorithms on GPUs
173 -- 185 Yiannis Cotronis, Elias Konstantinidis, Maria A. Louka, Nikolaos M. Missirlis. A comparison of CPU and GPU implementations for solving the Convection Diffusion equation using the local Modified SOR method
186 -- 194 Thomas Auckenthaler, Thomas Huckle, Roland Wittmann. A blocked QR-decomposition for the parallel symmetric eigenvalue problem
195 -- 212 Hasan Metin Aktulga, Lin Lin, Christopher Haine, Esmond G. Ng, Chao Yang. Parallel eigenvalue calculation based on multiple shift-invert Lanczos and contour integral based spectral projection method
213 -- 223 Marc Baboulin, Dulceneia Becker, George Bosilca, Anthony Danalis, Jack Dongarra. An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems
224 -- 238 Pieter Ghysels, Wim Vanroose. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm
239 -- 250 Erhan Turan, Peter Arbenz. Large scale micro finite element analysis of 3D bone poroelasticity
251 -- 270 Michele Martone. Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format
271 -- 288 Lars Karlsson, Bo Kågström, Eddie Wadbro. Fine-grained bulge-chasing kernels for strongly scalable parallel QR algorithms
289 -- 308 Johannes Langguth, Ariful Azad, Mahantesh Halappanavar, Fredrik Manne. On parallel push-relabel based algorithms for bipartite maximum matching
309 -- 327 Jesús Cámara, Javier Cuenca, Luis-Pedro García, Domingo Giménez. Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems
328 -- 343 Emanuel H. Rubensson, Elias Rudberg. Chunks and Tasks: A programming model for parallelization of dynamic algorithms

Volume 40, Issue 5-6

47 -- 58 Urban Borstnik, Joost VandeVondele, Valéry Weber, Jürg Hutter. Sparse matrix multiplication: The distributed block-compressed sparse row library
59 -- 69 Yuki Sugimoto, Fumihiko Ino, Kenichi Hagihara. Improving cache locality for GPU-based volume rendering
70 -- 85 Ray-Bing Chen, Yaohung M. Tsai, Weichung Wang. Adaptive block size for dense QR factorization in hybrid CPU-GPU systems via statistical modeling
86 -- 99 Michael J. Hallock, John E. Stone, Elijah Roberts, Corey Fry, Zaida Luthey-Schulten. Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations
100 -- 112 Ivan Teixido, Francesc Sebé, Josep Conde, Francesc Solsona. MPI-based implementation of an enhanced algorithm to solve the LPN problem in a memory-constrained environment
113 -- 128 Alberto F. Martín, Ruymán Reyes, Rosa M. Badia, Enrique S. Quintana-Ortí. Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs
129 -- 139 Jose Antonio Pascual, José Miguel-Alonso, Jose Antonio Lozano. Application-aware metrics for partition selection in cube-shaped topologies
140 -- 143 Robert Hallberg, Alistair Adcroft. An order-invariant real-to-integer conversion sum
144 -- 158 Oscar Peredo, Julián M. Ortiz, José R. Herrero, Cristobal Samaniego. Tuning and hybrid parallelization of a genetic-based multi-point statistics simulation code

Volume 40, Issue 3-4

1 -- 33 Mohammad Reza Selim, Mohammed Ziaur Rahman. Carrying on the legacy of imperative languages in the future parallel computing era
34 -- 46 Jean-Yves L'Excellent, Wissam M. Sid-Lakhdar. A study of shared-memory parallelism in a multifrontal solver

Volume 40, Issue 2

33 -- 34 Pavan Balaji, Zhiyi Huang 0001. Special issue on programming models and applications for multicores and manycores - Guest Editors' Introduction
35 -- 50 Mark Utting, Min-Hsien Weng, John G. Cleary. The JStar language philosophy
51 -- 68 Weihua Sheng, Stefan Schürmans, Maximilian Odendahl, Mark Bertsch, Vitaliy Volevach, Rainer Leupers, Gerd Ascheid. A compiler infrastructure for embedded heterogeneous MPSoCs
69 -- 89 Vikas, Nasser Giacaman, Oliver Sinnen. Multiprocessing with GUI-awareness using OpenMP-like directives in Java
90 -- 106 Oded Green, Yitzhak Birk. Scheduling directives: Accelerating shared-memory many-core processor execution
107 -- 115 Zhenning Wang, Long Zheng, Quan Chen, Minyi Guo. CPU + GPU scheduling with asymptotic profiling
116 -- 135 Yu Liu, Kento Emoto, Zhenjiang Hu. A Generate-Test-Aggregate parallel programming library for systematic parallel programming
136 -- 156 Zhijun Hao, Chenning Xie, Haibo Chen, Binyu Zang. X10-FT: Transparent fault tolerance for APGAS language and runtime

Volume 40, Issue 10

559 -- 573 Li Tan, Shashank Kothapalli, Longxiang Chen, Omar Hussaini, Ryan Bissiri, Zizhong Chen. A survey of power and energy efficient techniques for high performance numerical linear algebra operations
574 -- 588 Antonio J. Peña, Carlos Reaño, Federico Silla, Rafael Mayo, Enrique S. Quintana-Ortí, José Duato. A complete and efficient CUDA-sharing solution for HPC clusters
589 -- 610 George Teodoro, Tony Pan, Tahsin M. Kurç, Jun Kong, Lee A. D. Cooper, Scott Klasky, Joel H. Saltz. Region templates: Data representation and management for high-throughput image analysis
611 -- 627 Yizhuo Wang, Yang Zhang, Yan Su, Xiaojun Wang, Xu Chen, Weixing Ji, Feng Shi. An adaptive and hierarchical task scheduling scheme for multi-core clusters
628 -- 645 Andrew White, Soo-Young Lee. Derivation of optimal input parameters for minimizing execution time of matrix-based computations on a GPU
646 -- 660 Nicholas Horelik, Andrew R. Siegel, Benoit Forget, Kord Smith. Monte Carlo domain decomposition for robust nuclear reactor analysis
661 -- 680 Leandro A. J. Marzulo, Tiago A. O. Alves, Felipe M. G. França, Vítor Santos Costa. Couillard: Parallel programming via coarse-grained Data-flow Compilation
681 Philip C. Roth, Yong Chen. Guest Editors' introduction to the special issue on "DISCS-2013"
682 -- 696 Jesse Weaver, Vito Giovanni Castellana, Alessandro Morari, Antonino Tumeo, Sumit Purohit, Alan Chappell, David Haglin, Oreste Villa, Sutanay Choudhury, Karen Schuchardt, John Feo. Toward a data scalable solution for facilitating discovery of science resources
697 -- 709 Jiangling Yin, Junyao Zhang, Jun Wang, Wu-chun Feng. SDAFT: A novel scalable data access framework for parallel BLAST
710 -- 721 Yong Li, Dan Feng, Zhan Shi. Heterogeneous-aware cache partitioning: Improving the fairness of shared storage cache
722 -- 737 Joong-Yeon Cho, Hyun-Wook Jin, Min Lee, Karsten Schwan. Dynamic core affinity for high-performance file upload on Hadoop Distributed File System
738 -- 753 P. Coetzee, Matthew Leeke, Stephen A. Jarvis. Towards unified secure on- and off-line analytics at scale
754 -- 767 Dominique Lasalle, George Karypis. MPI for Big Data: New tricks for an old dog
768 -- 785 Lan Vu, Gita Alaghband. Novel parallel method for association rule mining on multi-core shared memory systems

Volume 40, Issue 1

1 -- 16 Leonid Yavits, Amir Morad, Ran Ginosar. The effect of communication and synchronization on Amdahl's law in multicore systems
17 -- 31 Lois Curfman McInnes, Barry Smith 0002, Hong Zhang, Richard Tran Mills. Hierarchical Krylov and nested Krylov methods for extreme-scale computing