- Throughput computing. William J. Dally. 2
- The next-generation supercomputer project and a plan for the advanced institute for computational science. Kimihiko Hirao. 3
- Overlapping communication and computation by using a hybrid MPI/SMPSs approach. Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero. 5-16
- Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda. 17-25
- Optimal bucket algorithms for large MPI collectives on torus interconnects. Nikhil Jain, Yogish Sabharwal. 27-36
- The auction: optimizing banks usage in Non-Uniform Cache Architectures. Javier Lira, Carlos Molina, Antonio González. 37-47
- Cache oblivious parallelograms in iterative stencil computations. Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel. 49-59
- Making nested parallel transactions practical using lightweight hardware support. Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun. 61-71
- Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. Atabak Mahram, Martin C. Herbordt. 73-82
- ParaLearn: a massively parallel, scalable system for learning interaction networks on FPGAs. Narges Bani Asadi, Christopher W. Fletcher, Greg Gibeling, John Wawrzynek, Wing H. Wong, Garry P. Nolan. 83-94
- High-throughput Bayesian network learning using heterogeneous multicore computers. Michael D. Linderman, Robert Bruggner, Vivek Athalye, Teresa H. Y. Meng, Narges Bani Asadi, Garry P. Nolan. 95-104
- Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck. 105-114
- Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen. 115-126
- An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam. 127-136
- Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal. 137-146
- Decomposable and responsive power models for multicore processors using performance counters. Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé. 147-158
- Enigma: architectural and operating system support for reducing the impact of address translation. Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin. 159-168
- Timing local streams: improving timeliness in data prefetching. Huaiyu Zhu, Yong Chen, Xian-He Sun. 169-178
- SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev. 179-188
- An approach to resource-aware co-scheduling for CMPs. Major Bhadauria, Sally A. McKee. 189-199
- A query language for understanding component interactions in production systems. Adam J. Oliner, Alex Aiken. 201-210
- Adaptive multi-level cache allocation in distributed storage architectures. Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick. 211-221
- InterferenceRemoval: removing interference of disk access for MPI programs through data replication. Xuechen Zhang, Song Jiang. 223-232
- Indemics: an interactive data intensive framework for high performance epidemic simulation. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe. 233-242
- Clustering performance data efficiently at massive scales. Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed. 243-252
- Speeding up Nek5000 with autotuning and specialization. Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland. 253-262
- Handling task dependencies under strided and aliased references. Josep M. Pérez, Rosa M. Badia, Jesús Labarta. 263-274
- How to unleash array optimizations on code using recursive data structures. Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff. 275-284
- A compiler-automated array compression scheme for optimizing memory intensive programs. Lixia Liu, Zhiyuan Li. 285-294
- Static reuse distances for locality-based optimizations in MATLAB. Arun Chauhan, Chun-Yu Shei. 295-304
- An empirically tuned 2D and 3D FFT library on CUDA GPU. Liang Gu, Xiaoming Li, Jakob Siegel. 305-314
- Large-scale FFT on GPU clusters. Yifeng Chen, Xiang Cui, Hong Mei. 315-324
- FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen. 325-336
- Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic. 337-348