- Business meets supercomputing: keynote talk. Bob Blainey. 1-2
- Abstractions to separate concerns in semi-regular grids. Andrew Stone, Michelle Mills Strout. 3-12
- A stencil compiler for short-vector SIMD architectures. Thomas Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan. 13-24
- Exploiting domain knowledge to optimize parallel computational mechanics codes. Chenyang Liu, Muhammad Hasan Jamal, Milind Kulkarni, Arun Prakash, Vijay S. Pai. 25-36
- TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems. Jose-Maria Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis. 37-46
- Scaling data race detection for partitioned global address space programs. Chang-Seo Park, Koushik Sen, Costin Iancu. 47-58
- Elastic and scalable tracing and accurate replay of non-deterministic events. Xing Wu, Frank Mueller. 59-68
- A new approach for performance analysis of OpenMP programs. Xu Liu, John M. Mellor-Crummey, Michael W. Fagan. 69-80
- Conservative row activation to improve memory power efficiency. Kun Fang, Zhichun Zhu. 81-90
- Active disk meets flash: a case for intelligent SSDs. Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, Gregory R. Ganger. 91-102
- Design of a large-scale storage-class RRAM system. Myoungsoo Jung, John Shalf, Mahmut T. Kandemir. 103-114
- Memorage: emerging persistent RAM based malleable main memory and storage architecture. Ju-Young Jung, Sangyeun Cho. 115-126
- Function, latency, bandwidth, power: towards a better computer. Steven L. Teig. 127-128
- Improving communication in PGAS environments: static and dynamic coalescing in UPC. Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell. 129-138
- Bandwidth-optimal all-to-all exchanges in fat tree networks. Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler. 139-148
- An automatic input-sensitive approach for heterogeneous task partitioning. Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer. 149-160
- LibWater: heterogeneous distributed computing made easy. Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer. 161-172
- Exploring hardware overprovisioning in power-constrained, high performance computing. Tapasya Patki, David K. Lowenthal, Barry Rountree, Martin Schulz, Bronis R. de Supinski. 173-182
- The Power 775 architecture at scale. Ramakrishnan Rajamony, Mark W. Stephenson, William Evan Speight. 183-192
- Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement. Ruisheng Wang, Lizhong Chen, Timothy Mark Pinkston. 193-202
- Evaluating on-die interconnects for a 4 TB/s router. Keith D. Underwood, Eric Borch, John Sizer, Timothy Stremcha, Michael Strom. 203-212
- Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms. Matthew Badin, Paolo D'Alberto, Lubomir Bic, Michael B. Dillencourt, Alexandru Nicolau. 213-222
- Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication. Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra. 223-232
- High quality real-time image-to-mesh conversion for finite element simulations. Panagiotis A. Foteinos, Nikos Chrisochoides. 233-242
- Tuning the continual flow pipeline architecture. Komal Jothi, Haitham Akkary. 243-252
- Towards more efficient execution: a decoupled access-execute approach. Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras. 253-262
- Quantifying performance bottleneck cost through differential analysis. Souad Koliai, Zakaria Bendifallah, Mathieu Tribalat, Cédric Valensi, Jean-Thomas Acquaviva, William Jalby. 263-272
- Efficient sparse matrix-vector multiplication on x86-based many-core processors. Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey. 273-282
- Expressing graph algorithms using generalized active messages. Nicholas Gerard Edmonds, Jeremiah Willcock, Andrew Lumsdaine. 283-292
- HykSort: a new variant of hypercube quicksort on distributed memory architectures. Hari Sundar, Dhairya Malhotra, George Biros. 293-302
- Diagnosis and optimization of application prefetching performance. Gabriel Marin, Collin McCurdy, Jeffrey S. Vetter. 303-312
- Address-aware fences. Changhui Lin, Vijay Nagarajan, Rajiv Gupta. 313-324
- Prefetching and cache management using task lifetimes. Vassilis Papaefstathiou, Manolis Katevenis, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos. 325-334
- The role of computer designers in reverse-engineering the brain. James E. Smith. 335-336
- Holistic run-time parallelism management for time and energy efficiency. Srinath Sridharan, Gagan Gupta, Gurindar S. Sohi. 337-348
- G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems. R. Vasudevan, Sathish S. Vadhiyar, Laxmikant V. Kalé. 349-358
- Implementing OmpSs support for regions of data in architectures with multiple address spaces. Javier Bueno, Xavier Martorell, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta. 359-368
- Automatically adapting programs for mixed-precision floating-point computation. Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre. 369-378
- CMP off-chip bandwidth scheduling guided by instruction criticality. Pablo Prieto, Valentin Puente, José-Ángel Gregorio. 379-388
- Massively parallel loading. Wolfgang Frings, Dong H. Ahn, Matthew P. LeGendre, Todd Gamblin, Bronis R. de Supinski, Felix Wolf. 389-398
- MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda. 399-408
- Efficient scheduling of recursive control flow on GPUs. Xin Huo, Sriram Krishnamoorthy, Gagan Agrawal. 409-420
- SemCache: semantics-aware caching for efficient GPU offloading. Nabeel AlSaber, Milind Kulkarni. 421-432
- Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Lisa R. Hsu, Huiyang Zhou. 433-442
- Scaling large-data computations on multi-GPU accelerators. Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann. 443-454
- Hybrid approach for data-flow analysis of MPI programs. Sriram Aananthakrishnan, Greg Bronevetsky, Ganesh Gopalakrishnan. 455-456
- Improving performance of all-to-all communication through loop scheduling in PGAS environments. Michail Alvanos, Gabriel Tanase, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell. 457-458
- CUPL: a compile-time uncoalesced memory access pattern locator for CUDA. Madhur Amilkanthwar, Shankar Balachandran. 459-460
- Imbalance optimization in scientific workflows. Weiwei Chen, Ewa Deelman, Rizos Sakellariou. 461-462
- FASTER run-time reconfiguration management. Catalin Bogdan Ciobanu, Dionisios N. Pnevmatikatos, Kyprianos D. Papadimitriou, Georgi Nedeltchev Gaydadjiev. 463-464
- MAD7: a memory architecture simulator targeted at design space exploration. Hadrien A. Clarke, Antoine Trouvé, Kazuaki Murakami. 465-466
- A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs. Truong Vinh Truong Duy, Taisuke Ozaki. 467-468
- A massively parallel domain decomposition method for large-scale DFT electronic structure calculations. Truong Vinh Truong Duy, Taisuke Ozaki. 469-470
- Multi-layered unstructured mesh generation. Panagiotis A. Foteinos, Daming Feng, Andrey N. Chernikov, Nikos Chrisochoides. 471-472
- Network-on-chip for a partially reconfigurable FPGA system. Justin A. Hogan, Raymond J. Weber, Brock J. LaMeres, Todd Kaiser. 473-474
- Exploiting data parallelism in the yConvex hypergraph algorithm for image representation using GPGPUs. Saurabh Jha, Tejaswi Agarwal, B. Rajesh Kanna. 475-476
- The ARMv8 simulator. Tao Jiang, Lele Zhang, Rui Hou, Yi Zhang, Qianlong Zhang, Lin Chai, Jing Han, Wuxiong Zhang, Cong Wang, Lixin Zhang. 477-478
- Imogen: a parallel 3D fluid and MHD code for GPUs. Erik Keever, James N. Imamura. 479-480
- SMIO: I/O similarity aware virtual machine management in virtual desktop environments. Min Li, Sushil Mantri, Pin Zhou, Ali Raza Butt. 481-482
- Inspector/executor load balancing algorithms for block-sparse tensor contractions. David Ozog, Sameer Shende, Allen D. Malony, Jeff R. Hammond, James Dinan, Pavan Balaji. 483-484
- Improving performance of OpenSHMEM reference library by portable PE mapping technique. Swaroop Pophale, Tony Curtis, Barbara M. Chapman. 485-486
- Using platform-independent data locality analysis to predict cache performance on abstract hardware platforms. Sonish Shrestha. 487-488
- Towards shared memory consistency models for GPUs. Tyler Sorensen, Ganesh Gopalakrishnan, Vinod Grover. 489-490
- Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches. Alejandro Valero, Julio Sahuquillo, Salvador Petit, José Duato. 491-492
- V-OpenCL: a method to use remote GPGPU. Cong Wang, Tao Jiang, Rui Hou. 493-494
- Power efficiency in a partially reconfigurable multiprocessor system. Raymond J. Weber, Justin A. Hogan, Brock J. LaMeres, Todd Kaiser. 495-496