Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010

Taisuke Boku, Hiroshi Nakashima, Avi Mendelson, editors, Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010. ACM, 2010.

Conference: ics

Throughput computing. William J. Dally. 2

The next-generation supercomputer project and a plan for the advanced institute for computational science. Kimihiko Hirao. 3

Overlapping communication and computation by using a hybrid MPI/SMPSs approach. Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero. 5-16

Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda. 17-25

Optimal bucket algorithms for large MPI collectives on torus interconnects. Nikhil Jain, Yogish Sabharwal. 27-36

The auction: optimizing banks usage in Non-Uniform Cache Architectures. Javier Lira, Carlos Molina, Antonio González. 37-47

Cache oblivious parallelograms in iterative stencil computations. Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel. 49-59

Making nested parallel transactions practical using lightweight hardware support. Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun. 61-71

Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. Atabak Mahram, Martin C. Herbordt. 73-82

ParaLearn: a massively parallel, scalable system for learning interaction networks on FPGAs. Narges Bani Asadi, Christopher W. Fletcher, Greg Gibeling, John Wawrzynek, Wing H. Wong, Garry P. Nolan. 83-94

High-throughput Bayesian network learning using heterogeneous multicore computers. Michael D. Linderman, Robert Bruggner, Vivek Athalye, Teresa H. Y. Meng, Narges Bani Asadi, Garry P. Nolan. 95-104

Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck. 105-114

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen. 115-126

An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam. 127-136

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal. 137-146

Decomposable and responsive power models for multicore processors using performance counters. Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé. 147-158

Enigma: architectural and operating system support for reducing the impact of address translation. Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin. 159-168

Timing local streams: improving timeliness in data prefetching. Huaiyu Zhu, Yong Chen, Xian-He Sun. 169-178

SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev. 179-188

An approach to resource-aware co-scheduling for CMPs. Major Bhadauria, Sally A. McKee. 189-199

A query language for understanding component interactions in production systems. Adam J. Oliner, Alex Aiken. 201-210

Adaptive multi-level cache allocation in distributed storage architectures. Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick. 211-221

InterferenceRemoval: removing interference of disk access for MPI programs through data replication. Xuechen Zhang, Song Jiang. 223-232

Indemics: an interactive data intensive framework for high performance epidemic simulation. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe. 233-242

Clustering performance data efficiently at massive scales. Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed. 243-252

Speeding up Nek5000 with autotuning and specialization. Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland. 253-262

Handling task dependencies under strided and aliased references. Josep M. Pérez, Rosa M. Badia, Jesús Labarta. 263-274

How to unleash array optimizations on code using recursive data structures. Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff. 275-284

A compiler-automated array compression scheme for optimizing memory intensive programs. Lixia Liu, Zhiyuan Li. 285-294

Static reuse distances for locality-based optimizations in MATLAB. Arun Chauhan, Chun-Yu Shei. 295-304

An empirically tuned 2D and 3D FFT library on CUDA GPU. Liang Gu, Xiaoming Li, Jakob Siegel. 305-314

Large-scale FFT on GPU clusters. Yifeng Chen, Xiang Cui, Hong Mei. 315-324

FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen. 325-336

Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic. 337-348