- Many-core GPU computing with NVIDIA CUDA. Mark Harris. 1
- Challenges on the road to exascale computing. Tilak Agerwala. 2
- Petaflop/s, seriously. David E. Keyes. 3
- Implementing Wilson-Dirac operator on the Cell Broadband Engine. Khaled Z. Ibrahim, François Bodin. 4-14
- Biomedical image analysis on a cooperative cluster of GPUs and multicores. Timothy D. R. Hartley, Ümit V. Çatalyürek, Antonio Ruiz, Francisco D. Igual, Rafael Mayo, Manuel Ujaldon. 15-25
- Data mining on the Cell Broadband Engine. Gregory Buehrer, Srinivasan Parthasarathy, Matthew Goyder. 26-35
- Accurate memory signatures and synthetic address traces for HPC applications. Jonathan Weinberg, Allan Snavely. 36-45
- Preserving time in large-scale communication traces. Prasun Ratn, Frank Mueller, Bronis R. de Supinski, Martin Schulz. 46-55
- A freespace crossbar for multi-core processors. Michel N. Victor, Aris K. Silzars, Edward S. Davidson. 56-62
- An approach for adaptive DRAM temperature and power management. Song Liu, Seda Ogrenci Memik, Yu Zhang, Gokhan Memik. 63-72
- The shared-thread multiprocessor. Jeffery A. Brown, Dean M. Tullsen. 73-82
- Advanced collective communication in Aspen. Qasim Ali, Vijay S. Pai, Samuel P. Midkiff. 83-93
- The deep computing messaging framework: generalized scalable message passing on the Blue Gene/P supercomputer. Sameer Kumar, Gábor Dózsa, Gheorghe Almasi, Philip Heidelberger, Dong Chen, Mark Giampapa, Michael Blocksome, Ahmad Faraj, Jeff Parker, Joe Ratterman, Brian E. Smith, Charles Archer. 94-103
- A projection-based optimization framework for abstractions with application to the unstructured mesh domain. Brian S. White, Sally A. McKee, Daniel J. Quinlan. 104-113
- CprFS: a user-level file system to support consistent file states for checkpoint and restart. Ruini Xue, Wenguang Chen, Weimin Zheng. 114-123
- Timely offloading of result-data in HPC centers. Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai. 124-133
- Shifted declustering: a placement-ideal layout scheme for multi-way replication storage architecture. Huijun Zhu, Peng Gu, Jun Wang. 134-144
- Can software reliability outperform hardware reliability on high performance interconnects?: a case study with MPI over InfiniBand. Matthew J. Koop, Rahul Kumar, Dhabaleswar K. Panda. 145-154
- Soft error vulnerability of iterative linear algebra methods. Greg Bronevetsky, Bronis R. de Supinski. 155-164
- Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L. Edi Shmueli, George Almási, José R. Brunheroto, José G. Castaños, Gábor Dózsa, Sameer Kumar, Derek Lieber. 165-174
- Power-aware dynamic placement of HPC applications. Akshat Verma, Puneet Ahuja, Anindya Neogi. 175-184
- Autonomous learning for efficient resource utilization of dynamic VM migration. Hyung Won Choi, Hukeun Kwak, Andrew Sohn, Kyusik Chung. 185-194
- Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems. Seyong Lee, Rudolf Eigenmann. 195-204
- Fast scan algorithms on graphics processors. Yuri Dotsenko, Naga K. Govindaraju, Peter-Pike J. Sloan, Charles Boyd, John Manferdelli. 205-213
- Three-dimensional Delaunay refinement for multi-core processors. Andrey N. Chernikov, Nikos Chrisochoides. 214-224
- A compiler framework for optimization of affine loop nests for GPGPUs. Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan. 225-234
- Rotating register allocation with multiple rotating branches. Suhyun Kim, Soo-Mook Moon. 235-244
- Automatic SIMD vectorization of chains of recurrences. Yixin Shou, Robert A. van Engelen. 245-255
- Optimizing irregular shared-memory applications for clusters. Seung-Jai Min, Rudolf Eigenmann. 256-265
- Performance portable optimizations for loops containing communication operations. Costin Iancu, Wei Chen, Katherine A. Yelick. 266-276
- Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. Jun Shirako, David M. Peixotto, Vivek Sarkar, William N. Scherer III. 277-288
- Orchestrating data transfer for the Cell/B.E. processor. Tong Chen, Haibo Lin, Tao Zhang. 289-298
- CUBA: an architecture for efficient CPU/co-processor data communication. Isaac Gelado, John H. Kelm, Shane Ryoo, Steven S. Lumetta, Nacho Navarro, Wen-mei W. Hwu. 299-308
- Efficient computation of sum-products on GPUs through software-managed cache. Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens. 309-318
- Exploiting idle register classes for fast spill destination. Fang Lu, Lei Wang, Xiaobing Feng, Zhiyuan Li, Zhaoqing Zhang. 319-326
- Analysis of dynamic power management on multi-core processors. W. Lloyd Bircher, Lizy K. John. 327-338
- Focused prefetching: performance oriented prefetching based on commit stalls. R. Manikantan, R. Govindarajan. 339-348
- Automatic analysis of speedup of MPI applications. Marc Casas, Rosa M. Badia, Jesús Labarta. 349-358
- Analyzing memory access intensity in parallel programs on multicore. Lixia Liu, Zhiyuan Li, Ahmed H. Sameh. 359-367
- A regression-based approach to scalability prediction. Bradley J. Barnes, Barry Rountree, David K. Lowenthal, Jaxk Reeves, Bronis R. de Supinski, Martin Schulz. 368-377