Proceedings of the 23rd international conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009

Michael Gschwind, Alexandru Nicolau, Valentina Salapura, José Moreira, editors, Proceedings of the 23rd international conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009. ACM, 2009.

Conference: ics

A European perspective on supercomputing. Mateo Valero. 1

The Roadrunner project and the importance of energy efficiency on the road to exascale computing. Don G. Grice. 2

Computing outside the box. Ian T. Foster. 3

Implementation of a wide-angle lens distortion correction algorithm on the Cell Broadband Engine. Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas. 4-13

High-performance regular expression scanning on the Cell/B.E. processor. Daniele Paolo Scarpazza, Gregory F. Russell. 14-25

Computer generation of fast Fourier transforms for the Cell Broadband Engine. Srinivas Chellappa, Franz Franchetti, Markus Püschel. 26-35

DBDB: optimizing DMA transfer for the Cell BE architecture. Tao Liu, Haibo Lin, Tong Chen, Kevin O'Brien, Ling Shao. 36-45

Zero-content augmented caches. Julien Dusser, Thomas Piquet, André Seznec. 46-55

Dynamic cache clustering for chip multiprocessors. Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem. 56-67

Less reused filter: improving L2 cache performance via filtering less reused lines. Lingxiang Xiang, Tianzhou Chen, Qingsong Shi, Wei Hu. 68-79

Divide-and-conquer: a bubble replacement for low level caches. Chuanjun Zhang, Bing Xue. 80-89

OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations. Hiroshi Nakashima, Yohei Miyake, Hideyuki Usui, Yoshiharu Omura. 90-99

Pattern-based sparse matrix representation for memory-efficient SMVM kernels. Mehmet Belgin, Godmar Back, Calvin J. Ribbens. 100-109

Dynamic topology aware load balancing algorithms for molecular dynamics applications. Abhinav Bhatele, Laxmikant V. Kalé, Sameer Kumar. 110-116

Fast memory snapshot for concurrent programming without synchronization. JaeWoong Chung, Woongki Baek, Christos Kozyrakis. 117-125

QuakeTM: parallelizing a complex sequential application using transactional memory. Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé, Tim Harris, Mateo Valero. 126-135

Refereeing conflicts in hardware transactional memory. Arrvindh Shriraman, Sandhya Dwarkadas. 136-146

Parametric multi-level tiling of imperfectly nested loops. Albert Hartono, Muthu Manikandan Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, J. Ramanujam, P. Sadayappan. 147-157

Dynamic parallelization of single-threaded binary programs using speculative slicing. Cheng Wang, Youfeng Wu, Edson Borin, Shiliang Hu, Wei Liu, Dave Sager, Tin-Fook Ngai, Jesse Fang. 158-168

Synchronization optimizations for efficient execution on multi-cores. Alexandru Nicolau, Guangqiang Li, Alexander V. Veidenbaum, Arun Kejariwal. 169-180

Chunking parallel loops in the presence of synchronization. Jun Shirako, Jisheng M. Zhao, V. Krishna Nandivada, Vivek Sarkar. 181-192

Efficient high performance collective communication for the Cell blade. Qasim Ali, Samuel P. Midkiff, Vijay S. Pai. 193-203

Practice of parallelizing network applications on multi-core architectures. Junchang Wang, Haipeng Cheng, Bei Hua, Xinan Tang. 204-213

Towards 100 Gbit/s Ethernet: multicore-based parallel communication protocol design. Stavros Passas, Kostas Magoutis, Angelos Bilas. 214-224

Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization. Jiuxing Liu, Bülent Abali. 225-234

Fast and scalable list ranking on the GPU. M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan. 235-243

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. Sundaresan Venkatasubramanian, Richard W. Vuduc. 244-255

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. Jiayuan Meng, Kevin Skadron. 256-265

Creating artificial global history to improve branch prediction accuracy. Leo Porter, Dean M. Tullsen. 266-275

Exploring pattern-aware routing in generalized fat tree networks. Germán Rodríguez, Ramón Beivide, Cyriel Minkenberg, Jesús Labarta, Mateo Valero. 276-285

Understanding the interconnection network of SpiNNaker. Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, Steve Furber. 286-295

A graph based approach for MPI deadlock detection. Tobias Hilbrich, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller. 296-305

Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols. Matthew Small, Xin Yuan. 306-315

MPI-aware compiler optimizations for improving communication-computation overlap. Anthony Danalis, Lori L. Pollock, D. Martin Swany, John Cavazos. 316-325

Evaluating high performance communication: a power perspective. Jiuxing Liu, Dan E. Poff, Bülent Abali. 326-337

FTL design exploration in reconfigurable high-performance SSD for server applications. Ji-Yong Shin, Zenglin Xia, Ning-Yi Xu, Rui Gao, Xiongfei Cai, Seungryoul Maeng, Feng-hsiung Hsu. 338-349

/scratch as a cache: rethinking HPC center scratch storage. Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai. 350-359

P-Code: a new RAID-6 code with optimal properties. Chao Jin, Hong Jiang, Dan Feng, Lei Tian. 360-369

R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems. Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongsheng Wang. 370-379

Single-particle 3D reconstruction from cryo-electron microscopy images on GPU. Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng. 380-389

How GPUs can outperform ASICs for fast LDPC decoding. Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva, Leonel Sousa. 390-399

A translation system for enabling data mining applications on GPUs. Wenjing Ma, Gagan Agrawal. 400-409

Combining thread level speculation, helper threads, and runahead execution. Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra. 410-420

Limited early value communication to improve performance of transactional memory. Salil Mohan Pant, Gregory T. Byrd. 421-429

EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V. S. Anil Kumar, Madhav V. Marathe. 430-439

Using many-core hardware to correlate radio astronomy signals. Rob V. van Nieuwpoort, John W. Romein. 440-449

A parallel Levenberg-Marquardt algorithm. Jun Cao, Krista A. Novstrup, Ayush Goyal, Samuel P. Midkiff, James M. Caruthers. 450-459

Adagio: making DVS practical for complex HPC applications. Barry Rountree, David K. Lowenthal, Bronis R. de Supinski, Martin Schulz, Vincent W. Freeh, Tyler K. Bletsch. 460-469

A comprehensive power-performance model for NoCs with multi-flit channel buffers. Mohammad Arjomand, Hamid Sarbazi-Azad. 470-478

Rate-based QoS techniques for cache/memory in CMP platforms. Andrew Herdrich, Ramesh Illikkal, Ravi R. Iyer, Donald Newell, Vineet Chadha, Jaideep Moses. 479-488

MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations. Ahmad Faraj, Sameer Kumar, Brian Smith, Amith R. Mamidala, John A. Gunnels, Philip Heidelberger. 489-490

TransMetric: architecture independent workload characterization for transactional memory benchmarks. James Poe, Clay Hughes, Tao Li. 491-492

Cancellation of loads that return zero using zero-value caches. Mafijul Md Islam, Sally A. McKee, Per Stenström. 493-494

Auto-vectorization through code generation for stream processing applications. Huayong Wang, Henrique Andrade, Bugra Gedik, Kun-Lung Wu. 495-496

Subdomain communication to increase scalability in large-scale scientific applications. Aleksandr Ovcharenko, Onkar Sahni, Christopher D. Carothers, Kenneth E. Jansen, Mark S. Shephard. 497-498

Access map pattern matching for data cache prefetch. Yasuo Ishii, Mary Inaba, Kei Hiraki. 499-500

Prediction-based power estimation and scheduling for CMPs. Karan Singh, Major Bhadauria, Sally A. McKee. 501-502

Design of a novel SIMD architecture by fusing operations and registers. Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou. 503-504

Thrifty interconnection network for HPC systems. Jian Li, Lixin Zhang, Charles Lefurgy, Richard Treumann, Wolfgang E. Denzel. 505-506

Performance modeling for DFT algorithms in FFTW. Liang Gu, Xiaoming Li. 507-508

PARSEC: hardware profiling of emerging workloads for CMP design. Major Bhadauria, Vincent M. Weaver, Sally A. McKee. 509-510

Approximate kernel matrix computation on GPUs for large scale learning applications. Mohamed E. Hussein, Wael Abd-Almageed. 511-512

Dynamic task set partitioning based on balancing memory requirements to reduce power consumption. Diana Bautista, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato. 513-514

High-performance CUDA kernel execution on FPGAs. Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu. 515-516

Load balancing using work-stealing for pipeline parallelism in emerging applications. Angeles G. Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval. 517-518

Prefetch optimizations on large-scale applications via parameter value prediction. Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, Chiaheng Tu. 519-520

Designing multi-socket systems using silicon photonics. Scott Beamer, Krste Asanovic, Christopher Batten, Ajay Joshi, Vladimir Stojanovic. 521-522

An infrastructure for scalable and portable parallel programs for computational chemistry. Victor Lotrich, Norbert Flocke, Mark Ponton, Beverly A. Sanders, Erik Deumens, Rodney J. Bartlett, Ajith Perera. 523-524
