- An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka. 1-11
- Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures. Abtin Rahimian, Ilya Lashuk, Shravan K. Veerapaneni, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul S. Sampath, Aashay Shringarpure, Jeffrey Vetter, Richard W. Vuduc, Denis Zorin, George Biros. 1-11
- Exploiting 162-Nanosecond End-to-End Communication Latency on Anton. Ron O. Dror, J. P. Grossman, Kenneth M. Mackenzie, Brian Towles, Edmond Chow, John K. Salmon, Cliff Young, Joseph A. Bank, Brannon Batson, Martin M. Deneroff, Jeffrey Kuskin, Richard H. Larson, Mark A. Moraes, David E. Shaw. 1-12
- A Parallel Implementation of Electron-Phonon Scattering in Nanoelectronic Devices up to 95k Cores. Mathieu Luisier. 1-11
- A Multi-Scale Heart Simulation on Massively Parallel Computers. Akira Hosoi, Takumi Washio, Jun-ichi Okada, Yoshimasa Kadooka, Kengo Nakajima, Toshiaki Hisada. 1-11
- Scaling Hierarchical N-body Simulations on GPU Clusters. Pritish Jetley, Lukasz Wesolowski, Filippo Gioachin, Laxmikant V. Kalé, Thomas R. Quinn. 1-11
- Circuit-Switched Memory Access in Photonic Interconnection Networks for High-Performance Embedded Computing. Gilbert Hendry, Eric Robinson, Vitaliy Gleyzer, Johnnie Chan, Luca P. Carloni, Nadya Travinin Bliss, Keren Bergman. 1-12
- Toward First Principles Electronic Structure Simulations of Excited States and Strong Correlations in Nano- and Materials Science. Anton Kozhevnikov, Adolfo G. Eguiluz, Thomas C. Schulthess. 1-10
- Power-Aware Consolidation of Scientific Workflows in Virtualized Environments. Qian Zhu, Jiedan Zhu, Gagan Agrawal. 1-12
- Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK. Mark Giampapa, Thomas Gooding, Todd Inglett, Robert W. Wisniewski. 1-10
- Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory. Roger A. Pearce, Maya Gokhale, Nancy M. Amato. 1-11
- IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination. Xuechen Zhang, Kei Davis, Song Jiang. 1-11
- 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs. Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, Pradeep Dubey. 1-13
- Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture. Mohammad Jowkar, Raúl de la Cruz, José María Cela. 1-11
- Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations. Scott S. Hampton, Sadaf R. Alam, Paul S. Crozier, Pratul K. Agarwal. 1-11
- JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations. Xiaodan Wang, Eric A. Perlman, Randal C. Burns, Tanu Malik, Tamas Budavari, Charles Meneveau, Alexander S. Szalay. 1-11
- Parallel Fast Gauss Transform. Rahul S. Sampath, Hari Sundar, Shravan K. Veerapaneni. 1-10
- Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework. Louis-Noël Pouchet, Uday Bondhugula, Cédric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan. 1-11
- The 48-core SCC Processor: the Programmer's View. Timothy G. Mattson, Michael Riepen, Thomas Lehnig, Paul Brett, Werner Haas, Patrick Kennedy, Jason Howard, Sriram R. Vangal, Nitin Borkar, Greg Ruhl, Saurabh Dighe. 1-11
- Characterizing the Influence of System Noise on Large-Scale Applications by Simulation. Torsten Hoefler, Timo Schneider, Andrew Lumsdaine. 1-11
- Managing Variability in the IO Performance of Petascale Storage Systems. Jay F. Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf. 1-12
- 190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs. Tsuyoshi Hamada, Keigo Nitadori. 1-9
- Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses. Andreas Sandberg, David Eklov, Erik Hagersten. 1-11
- Direct Numerical Simulation of Particulate Flows on 294912 Processor Cores. Jan Götz, Klaus Iglberger, Markus Stürmer, Ulrich Rüde. 1-11
- Automatic Run-time Parallelization and Transformation of I/O. Thorvald Natvig, Anne C. Elster, Jan Christian Meyer. 1-10
- Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System. Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski. 1-11
- Extreme-Scale AMR. Carsten Burstedde, Omar Ghattas, Michael Gurnis, Tobin Isaac, Georg Stadler, Tim Warburton, Lucas C. Wilcox. 1-12
- A Block-Oriented Language and Runtime System for Tensor Algebra with Very Large Arrays. Beverly A. Sanders, Rodney J. Bartlett, Erik Deumens, Victor Lotrich, Mark Ponton. 1-11
- Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. Abdullah Gharaibeh, Matei Ripeanu. 1-12
- CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors. Asit K. Mishra, Shekhar Srikantaiah, Mahmut T. Kandemir, Chita R. Das. 1-12
- Scalable Earthquake Simulation on Petascale Supercomputers. Yifeng Cui, Kim B. Olsen, Thomas Jordan, Kwangyoon Lee, Jun Zhou, Patrick Small, Daniel Roten, Geoffrey Ely, Dhabaleswar K. Panda, Amit Chourasia, John Levesque, Steven M. Day, Philip Maechling. 1-20
- An Adaptive Framework for Simulation and Online Remote Visualization of Critical Climate Applications in Resource-constrained Environments. Preeti Malakar, Vijay Natarajan, Sathish S. Vadhiyar. 1-11
- A Scalable and Distributed Dynamic Formal Verifier for MPI Programs. Anh Vo, Sriram Aananthakrishnan, Ganesh Gopalakrishnan, Bronis R. de Supinski, Martin Schulz, Greg Bronevetsky. 1-10
- Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid. Guy E. Blelloch, Ioannis Koutis, Gary L. Miller, Kanat Tangwongsan. 1-12
- Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing. Adrian M. Caulfield, Joel Coburn, Todor Mollov, Arup De, Ameen Akel, Jiahua He, Arun Jagatheesan, Rajesh K. Gupta, Allan Snavely, Steven Swanson. 1-11
- OpenMPC: Extended OpenMP Programming and Tuning for GPUs. Seyong Lee, Rudolf Eigenmann. 1-11
- Multiscale Simulation of Cardiovascular flows on the IBM Bluegene/P: Full Heart-Circulation System at Red-Blood Cell Resolution. Amanda Peters, Simone Melchionna, Efthimios Kaxiras, Jonas Lätt, Joy K. Sircar, Massimo Bernaschi, Mauro Bisson, Sauro Succi. 1-10
- Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers. Jun Doi, Yasushi Negishi. 1-9
- Data Sharing Options for Scientific Workflows on Amazon EC2. Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, G. Bruce Berriman, Benjamin P. Berman, Philip Maechling. 1-9
- Elastic Cloud Caches for Accelerating Service-Oriented Computations. David Chiu, Apeksha Shetty, Gagan Agrawal. 1-11
- Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories. Jae-Seung Yeom, Dimitrios S. Nikolopoulos. 1-11
- A Flexible Reservation Algorithm for Advance Network Provisioning. Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim. 1-11
- DASH: a Recipe for a Flash-based Data Intensive Supercomputer. Jiahua He, Arun Jagatheesan, Sandeep Gupta, Jeffrey Bennett, Allan Snavely. 1-11
- Accelerating I/O Forwarding in IBM Blue Gene/P Systems. Venkatram Vishwanath, Mark Hereld, Kamil Iskra, Dries Kimpe, Vitali Morozov, Michael E. Papka, Robert B. Ross, Kazutomo Yoshii. 1-10
- Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method. Aparna Chandramowlishwaran, Kamesh Madduri, Richard W. Vuduc. 1-12
- The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches. David Tarjan, Kevin Skadron. 1-10
- Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles. Nathan R. Tallent, Laksono Adhianto, John M. Mellor-Crummey. 1-11
- PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications. Martin Burtscher, Byoung-Do Kim, Jeffrey R. Diamond, John D. McCalpin, Lars Koesterke, James C. Browne. 1-11
- Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics. Ronald Babich, Michael A. Clark, Balint Joo. 1-11
- Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems. Fengguang Song, Hatem Ltaief, Bilel Hadri, Jack Dongarra. 1-11
- Fast PGAS Implementation of Distributed Graph Algorithms. Guojing Cong, George Almasi, Vijay A. Saraswat. 1-11
- Scalable Graph Exploration on Multicore Processors. Virat Agarwal, Fabrizio Petrini, Davide Pasetto, David A. Bader. 1-11
- Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support. Xiangyu Dong, Yuan Xie, Naveen Muralimanohar, Norman P. Jouppi. 1-11
- Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures. Min Li, Sudharshan S. Vazhkudai, Ali Raza Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, Galen M. Shipman. 1-12
- FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking. Zhezhe Chen, Qi Gao, Wenbin Zhang, Feng Qin. 1-11
- vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload. Ardalan Kangarlou, Sahan Gamage, Ramana Rao Kompella, Dongyan Xu. 1-11
- On-Chip Network Evaluation Framework. Hanjoon Kim, Seulki Heo, Junghoon Lee, Jaehyuk Huh, John Kim. 10