- Exascale computing: the challenges and opportunities in the next decade. Tilak Agerwala. 1-2
- Structure-driven optimizations for amorphous data-parallel programs. Mario Méndez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher, Keshav Pingali. 3-14
- GAMBIT: effective unit testing for concurrency libraries. Katherine E. Coons, Sebastian Burckhardt, Madanlal Musuvathi. 15-24
- Featherweight X10: a core calculus for async-finish parallelism. Jonathan K. Lee, Jens Palsberg. 25-36
- Compiler aided selective lock assignment for improving the performance of software transactional memory. Sandya Mannarswamy, Dhruva R. Chakrabarti, Kaushik Rajan, Sujoy Saraswati. 37-46
- Is transactional programming actually easier? Christopher J. Rossbach, Owen S. Hofmann, Emmett Witchel. 47-56
- Debugging programs that use atomic blocks and transactional memory. Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adrián Cristal, Mateo Valero. 57-66
- NOrec: streamlining STM by abolishing ownership records. Luke Dalessandro, Michael F. Spear, Michael L. Scott. 67-78
- Scheduling support for transactional memory contention management. Walther Maldonado, Patrick Marlier, Pascal Felber, Adi Suissa, Danny Hendler, Alexandra Fedorova, Julia L. Lawall, Gilles Muller. 79-90
- Leveraging parallel nesting in transactional memory. João Barreto, Aleksandar Dragojevic, Paulo Ferreira, Rachid Guerraoui, Michal Kapalka. 91-100
- Extreme scale computing: challenges and opportunities. Josep Torrellas, Bill Gropp, Jaime Moreno, Kunle Olukotun, Vivek Sarkar. 101-102
- Is hardware innovation over? Professor Arvind. 103-104
- An adaptive performance modeling tool for GPU architectures. Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu. 105-114
- Model-driven autotuning of sparse matrix-vector multiply on GPUs. Jee W. Choi, Amik Singh, Richard W. Vuduc. 115-126
- Fast tridiagonal solvers on the GPU. Yao Zhang, Jonathan Cohen, John D. Owens. 127-136
- CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences. Edans Flavius de Oliveira Sandes, Alba Cristina Magalhaes Alves de Melo. 137-146
- Load balancing on speed. Steven Hofmeyr, Costin Iancu, Filip Blagojevic. 147-158
- Scalable communication protocols for dynamic sparse data exchange. Torsten Hoefler, Christian Siebert, Andrew Lumsdaine. 159-168
- The LOFAR correlator: implementation and performance analysis. John W. Romein, P. Chris Broekema, Jan-David Mol, Rob van Nieuwpoort. 169-178
- Lazy binary-splitting: a run-time adaptive work-stealing scheduler. Alexandros Tzannes, George C. Caragea, Rajeev Barua, Uzi Vishkin. 179-190
- Thread to strand binding of parallel network applications in massive multi-threaded systems. Petar Radojkovic, Vladimir Cakarevic, Javier Verdú, Alex Pajuelo, Francisco J. Cazorla, Mario Nemirovsky, Mateo Valero. 191-202
- Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? Eddy Z. Zhang, Yunlian Jiang, Xipeng Shen. 203-212
- Improving parallelism and locality with asynchronous algorithms. Lixia Liu, Zhiyuan Li. 213-222
- Scaling LAPACK panel operations using parallel cache assignment. Anthony M. Castaldo, R. Clint Whaley. 223-232
- Composable thread coloring. Dean F. Sutherland, William L. Scherlis. 233-244
- Helper locks for fork-join parallel programming. Kunal Agrawal, Charles E. Leiserson, Jim Sukha. 245-256
- A practical concurrent binary search tree. Nathan Grasso Bronson, Jared Casper, Hassan Chafi, Kunle Olukotun. 257-268
- Analyzing lock contention in multithreaded applications. Nathan R. Tallent, John M. Mellor-Crummey, Allan Porterfield. 269-280
- Using data structure knowledge for efficient lock generation and strong atomicity. Gautam Upadhyaya, Samuel P. Midkiff, Vijay S. Pai. 281-292
- Modeling advanced collective communication algorithms on cell-based systems. Qasim Ali, Samuel P. Midkiff, Vijay S. Pai. 293-304
- PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. Jidong Zhai, Wenguang Chen, Weimin Zheng. 305-314
- Input-driven dynamic execution prediction of streaming applications. Farhana Aleen, Monirul Sharif, Santosh Pande. 315-324
- Towards scalable and transparent parallelization of multiplayer games using transactional memory support. Daniel Lupei, Bogdan Simion, Don Pinto, Matthew Misler, Mihai Burcea, William Krick, Cristiana Amza. 325-326
- KRASH: reproducible CPU load generation on many cores machines. Swann Perarnau, Guillaume Huard. 327-328
- Intra-application shared cache partitioning for multithreaded applications. Sai Prashanth Muralidhara, Mahmut T. Kandemir, Padma Raghavan. 329-330
- Symbolic prefetching in transactional distributed shared memory. Alokika Dash, Brian Demsky. 331-332
- New abstractions for effective performance analysis of STM programs. Dhruva R. Chakrabarti. 333-334
- Continuous speculative program parallelization in software. Chao Zhang, Chen Ding, Xiaoming Gu, Kirk Kelsey, Tongxin Bai, Xiaobing Feng. 335-336
- Effective communication and computation overlap with hybrid MPI/SMPSs. Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero. 337-338
- Supporting lock-free composition of concurrent data objects. Daniel Cederman, Philippas Tsigas. 339-340
- SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems. Yi Guo, Yisheng Zhao, Vincent Cavé, Vivek Sarkar. 341-342
- An optimizing compiler for GPGPU programs with input-data sharing. Yi Yang, Ping Xiang, Jingfei Kong, Huiyang Zhou. 343-344
- Applying the concurrent collections programming model to asynchronous parallel dense linear algebra. Aparna Chandramowlishwaran, Kathleen Knobe, Richard W. Vuduc. 345-346
- Application heartbeats for software performance and health. Henry Hoffmann, Jonathan Eastep, Marco D. Santambrogio, Jason E. Miller, Anant Agarwal. 347-348
- Modeling transactional memory workload performance. Donald E. Porter, Emmett Witchel. 349-350
- The pilot library for novice MPI programmers. John D. Carter, William B. Gardner, Gary Grewal. 351-352
- Data transformations enabling loop vectorization on multithreaded data parallel architectures. Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrigo Dominguez, David R. Kaeli. 353-354
- A distributed placement service for graph-structured and tree-structured data. Gregory Buehrer, Srinivasan Parthasarathy, Shirish Tatikonda. 355-356
- A symbolic verifier for CUDA programs. Guodong Li, Ganesh Gopalakrishnan, Robert M. Kirby, Dan Quinlan. 357-358