- A European perspective on supercomputing. Mateo Valero. 1
- The Roadrunner project and the importance of energy efficiency on the road to exascale computing. Don G. Grice. 2
- Computing outside the box. Ian T. Foster. 3
- Implementation of a wide-angle lens distortion correction algorithm on the Cell Broadband Engine. Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas. 4-13
- High-performance regular expression scanning on the Cell/B.E. processor. Daniele Paolo Scarpazza, Gregory F. Russell. 14-25
- Computer generation of fast Fourier transforms for the Cell Broadband Engine. Srinivas Chellappa, Franz Franchetti, Markus Püschel. 26-35
- DBDB: optimizing DMATransfer for the Cell BE architecture. Tao Liu, Haibo Lin, Tong Chen, Kevin O'Brien, Ling Shao. 36-45
- Zero-content augmented caches. Julien Dusser, Thomas Piquet, André Seznec. 46-55
- Dynamic cache clustering for chip multiprocessors. Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem. 56-67
- Less reused filter: improving L2 cache performance via filtering less reused lines. Lingxiang Xiang, Tianzhou Chen, Qingsong Shi, Wei Hu. 68-79
- Divide-and-conquer: a bubble replacement for low level caches. Chuanjun Zhang, Bing Xue. 80-89
- OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations. Hiroshi Nakashima, Yohei Miyake, Hideyuki Usui, Yoshiharu Omura. 90-99
- Pattern-based sparse matrix representation for memory-efficient SMVM kernels. Mehmet Belgin, Godmar Back, Calvin J. Ribbens. 100-109
- Dynamic topology aware load balancing algorithms for molecular dynamics applications. Abhinav Bhatele, Laxmikant V. Kalé, Sameer Kumar. 110-116
- Fast memory snapshot for concurrent programming without synchronization. JaeWoong Chung, Woongki Baek, Christos Kozyrakis. 117-125
- QuakeTM: parallelizing a complex sequential application using transactional memory. Vladimir Gajinov, Ferad Zyulkyarov, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé, Tim Harris, Mateo Valero. 126-135
- Refereeing conflicts in hardware transactional memory. Arrvindh Shriraman, Sandhya Dwarkadas. 136-146
- Parametric multi-level tiling of imperfectly nested loops. Albert Hartono, Muthu Manikandan Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, J. Ramanujam, P. Sadayappan. 147-157
- Dynamic parallelization of single-threaded binary programs using speculative slicing. Cheng Wang, Youfeng Wu, Edson Borin, Shiliang Hu, Wei Liu, Dave Sager, Tin-Fook Ngai, Jesse Fang. 158-168
- Synchronization optimizations for efficient execution on multi-cores. Alexandru Nicolau, Guangqiang Li, Alexander V. Veidenbaum, Arun Kejariwal. 169-180
- Chunking parallel loops in the presence of synchronization. Jun Shirako, Jisheng M. Zhao, V. Krishna Nandivada, Vivek Sarkar. 181-192
- Efficient high performance collective communication for the Cell blade. Qasim Ali, Samuel P. Midkiff, Vijay S. Pai. 193-203
- Practice of parallelizing network applications on multi-core architectures. Junchang Wang, Haipeng Cheng, Bei Hua, Xinan Tang. 204-213
- Towards 100 Gbit/s Ethernet: multicore-based parallel communication protocol design. Stavros Passas, Kostas Magoutis, Angelos Bilas. 214-224
- Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization. Jiuxing Liu, Bülent Abali. 225-234
- Fast and scalable list ranking on the GPU. M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan. 235-243
- Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. Sundaresan Venkatasubramanian, Richard W. Vuduc. 244-255
- Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. Jiayuan Meng, Kevin Skadron. 256-265
- Creating artificial global history to improve branch prediction accuracy. Leo Porter, Dean M. Tullsen. 266-275
- Exploring pattern-aware routing in generalized fat tree networks. Germán Rodríguez, Ramón Beivide, Cyriel Minkenberg, Jesús Labarta, Mateo Valero. 276-285
- Understanding the interconnection network of SpiNNaker. Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, Steve Furber. 286-295
- A graph based approach for MPI deadlock detection. Tobias Hilbrich, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller. 296-305
- Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols. Matthew Small, Xin Yuan. 306-315
- MPI-aware compiler optimizations for improving communication-computation overlap. Anthony Danalis, Lori L. Pollock, D. Martin Swany, John Cavazos. 316-325
- Evaluating high performance communication: a power perspective. Jiuxing Liu, Dan E. Poff, Bülent Abali. 326-337
- FTL design exploration in reconfigurable high-performance SSD for server applications. Ji-Yong Shin, Zenglin Xia, Ning-Yi Xu, Rui Gao, Xiongfei Cai, Seungryoul Maeng, Feng-hsiung Hsu. 338-349
- /scratch as a cache: rethinking HPC center scratch storage. Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai. 350-359
- P-Code: a new RAID-6 code with optimal properties. Chao Jin, Hong Jiang, Dan Feng, Lei Tian. 360-369
- R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems. Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongsheng Wang. 370-379
- Single-particle 3D reconstruction from cryo-electron microscopy images on GPU. Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng. 380-389
- How GPUs can outperform ASICs for fast LDPC decoding. Gabriel Falcão Paiva Fernandes, Vítor Manuel Mendes da Silva, Leonel Sousa. 390-399
- A translation system for enabling data mining applications on GPUs. Wenjing Ma, Gagan Agrawal. 400-409
- Combining thread level speculation helper threads and runahead execution. Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra. 410-420
- Limited early value communication to improve performance of transactional memory. Salil Mohan Pant, Gregory T. Byrd. 421-429
- EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V. S. Anil Kumar, Madhav V. Marathe. 430-439
- Using many-core hardware to correlate radio astronomy signals. Rob V. van Nieuwpoort, John W. Romein. 440-449
- A parallel Levenberg-Marquardt algorithm. Jun Cao, Krista A. Novstrup, Ayush Goyal, Samuel P. Midkiff, James M. Caruthers. 450-459
- Adagio: making DVS practical for complex HPC applications. Barry Rountree, David K. Lowenthal, Bronis R. de Supinski, Martin Schulz, Vincent W. Freeh, Tyler K. Bletsch. 460-469
- A comprehensive power-performance model for NoCs with multi-flit channel buffers. Mohammad Arjomand, Hamid Sarbazi-Azad. 470-478
- Rate-based QoS techniques for cache/memory in CMP platforms. Andrew Herdrich, Ramesh Illikkal, Ravi R. Iyer, Donald Newell, Vineet Chadha, Jaideep Moses. 479-488
- MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations. Ahmad Faraj, Sameer Kumar, Brian Smith, Amith R. Mamidala, John A. Gunnels, Philip Heidelberger. 489-490
- TransMetric: architecture independent workload characterization for transactional memory benchmarks. James Poe, Clay Hughes, Tao Li. 491-492
- Cancellation of loads that return zero using zero-value caches. Mafijul Md Islam, Sally A. McKee, Per Stenström. 493-494
- Auto-vectorization through code generation for stream processing applications. Huayong Wang, Henrique Andrade, Bugra Gedik, Kun-Lung Wu. 495-496
- Subdomain communication to increase scalability in large-scale scientific applications. Aleksandr Ovcharenko, Onkar Sahni, Christopher D. Carothers, Kenneth E. Jansen, Mark S. Shephard. 497-498
- Access map pattern matching for data cache prefetch. Yasuo Ishii, Mary Inaba, Kei Hiraki. 499-500
- Prediction-based power estimation and scheduling for CMPs. Karan Singh, Major Bhadauria, Sally A. McKee. 501-502
- Design of a novel SIMD architecture by fusing operations and registers. Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou. 503-504
- Thrifty interconnection network for HPC systems. Jian Li, Lixin Zhang, Charles Lefurgy, Richard Treumann, Wolfgang E. Denzel. 505-506
- Performance modeling for DFT algorithms in FFTW. Liang Gu, Xiaoming Li. 507-508
- PARSEC: hardware profiling of emerging workloads for CMP design. Major Bhadauria, Vincent M. Weaver, Sally A. McKee. 509-510
- Approximate kernel matrix computation on GPUs for large scale learning applications. Mohamed E. Hussein, Wael Abd-Almageed. 511-512
- Dynamic task set partitioning based on balancing memory requirements to reduce power consumption. Diana Bautista, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato. 513-514
- High-performance CUDA kernel execution on FPGAs. Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu. 515-516
- Load balancing using work-stealing for pipeline parallelism in emerging applications. Angeles G. Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval. 517-518
- Prefetch optimizations on large-scale applications via parameter value prediction. Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, Chiaheng Tu. 519-520
- Designing multi-socket systems using silicon photonics. Scott Beamer, Krste Asanovic, Christopher Batten, Ajay Joshi, Vladimir Stojanovic. 521-522
- An infrastructure for scalable and portable parallel programs for computational chemistry. Victor Lotrich, Norbert Flocke, Mark Ponton, Beverly A. Sanders, Erik Deumens, Rodney J. Bartlett, Ajith Perera. 523-524