- Scalable framework for mapping streaming applications onto multi-GPU systems. Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong, Rick Siow Mong Goh. 1-10
- A performance analysis framework for identifying potential benefits in GPGPU applications. Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard W. Vuduc. 11-22
- Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors. Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye, Wen-mei W. Hwu. 23-34
- Communication avoiding successive band reduction. Grey Ballard, James Demmel, Nicholas Knight. 35-44
- Faster topology-aware collective algorithms through non-minimal communication. Paul Sack, William Gropp. 45-54
- Efficient SIMD code generation for irregular kernels. Seonggun Kim, Hwansoo Han. 55-64
- Extending a C-like language for portable SIMD programming. Roland Leißa, Sebastian Hack, Ingo Wald. 65-74
- A hybrid approach of OpenMP for clusters. Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel P. Midkiff. 75-84
- DOJ: dynamically parallelizing object-oriented programs. Yong Hun Eom, Stephen Yang, James Christopher Jenista, Brian Demsky. 85-96
- S: a scripting language for high-performance RESTful web services. Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder. 97-106
- A GPU implementation of inclusion-based points-to analysis. Mario Méndez-Lojo, Martin Burtscher, Keshav Pingali. 107-116
- Scalable GPU graph traversal. Duane Merrill, Michael Garland, Andrew S. Grimshaw. 117-128
- GPU-based NFA implementation for memory efficient high speed regular expression matching. Yuan Zu, Ming Yang, Zhonghu Xu, Lin Wang, Xin Tian, Kunyang Peng, Qunfeng Dong. 129-140
- A methodology for creating fast wait-free data structures. Alex Kogan, Erez Petrank. 141-150
- Concurrent tries with efficient non-blocking snapshots. Aleksandar Prokopec, Nathan Grasso Bronson, Phil Bagwell, Martin Odersky. 151-160
- A speculation-friendly binary search tree. Tyler Crain, Vincent Gramoli, Michel Raynal. 161-170
- PARRAY: a unifying array representation for heterogeneous parallelism. Yifeng Chen, Xiang Cui, Hong Mei. 171-180
- Internally deterministic parallel algorithms can be fast. Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Julian Shun. 181-192
- Deterministic parallel random-number generation for dynamic-multithreading platforms. Charles E. Leiserson, Tao B. Schardl, Jim Sukha. 193-204
- Scalable parallel minimum spanning forest computation. Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, Stéphane Bressan. 205-214
- GKLEE: concolic verification and test generation for GPUs. Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, Sreeranga P. Rajan. 215-224
- Algorithm-based fault tolerance for dense matrix factorizations. Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Hérault, Jack Dongarra. 225-234
- Efficient deadlock avoidance for streaming computation with filtering. Jeremy D. Buhler, Kunal Agrawal, Peng Li, Roger D. Chamberlain. 235-246
- Lock cohorting: a general technique for designing NUMA locks. David Dice, Virendra J. Marathe, Nir Shavit. 247-256
- Revisiting the combining synchronization technique. Panagiota Fatourou, Nikolaos D. Kallimanis. 257-266
- A work-stealing scheduler for X10's task parallelism with suspension. Olivier Tardieu, Haichuan Wang, Haibo Lin. 267-276
- Automatic communication optimizations through memory reuse strategies. Muthu Manikandan Baskaran, Nicolas Vasilache, Benoît Meister, Richard Lethin. 277-278
- FlexBFS: a parallelism-aware implementation of breadth-first search on GPU. Gu Liu, Hong An, Wenting Han, Xiaoqiang Li, Tao Sun, Wei Zhou, Xuechao Wei, Xulong Tang. 279-280
- Programming parallel embedded and consumer applications in OpenMP superscalar. Michael Andersch, Chi Ching Chi, Ben H. H. Juurlink. 281-282
- An overview of Medusa: simplified graph processing on GPUs. Jianlong Zhong, Bingsheng He. 283-284
- Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA. Christophe Alias, Alain Darte, Alexandru Plesco. 285-286
- Using GPU's to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems. Jian Tao, Marek Blazewicz, Steven R. Brandt. 287-288
- Mechanizing the expert dense linear algebra developer. Bryan Marker, Andy Terrel, Jack Poulson, Don S. Batory, Robert A. van de Geijn. 289-290
- The boat hull model: adapting the roofline model to enable performance prediction for parallel computing. Cedric Nugteren, Henk Corporaal. 291-292
- Speculative parallelization on GPGPUs. Min Feng, Rajiv Gupta, Laxmi N. Bhuyan. 293-294
- Adapting the polyhedral model as a framework for efficient speculative parallelization. Alexandra Jimborean, Philippe Clauss, Benoît Pradelle, Luis Mastrangelo, Vincent Loechner. 295-296
- An overview of CMPI: network performance aware MPI in the cloud. Yifan Gong, Bingsheng He, Jianlong Zhong. 297-298
- OpenCL as a unified programming model for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee. 299-300
- BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism. George Tzenakis, Angelos Papatriantafyllou, John Kesapides, Polyvios Pratikakis, Hans Vandierendonck, Dimitrios S. Nikolopoulos. 301-302
- Portable parallel performance from sequential, productive, embedded domain-specific languages. Shoaib Kamil, Derrick Coetzee, Scott Beamer, Henry Cook, Ekaterina Gonina, Jonathan Harper, Jeffrey Morlan, Armando Fox. 303-304
- Communication-centric optimizations by dynamically detecting collective operations. Torsten Hoefler, Timo Schneider. 305-306
- LHlf: lock-free linear hashing (poster paper). Donghui Zhang, Per-Åke Larson. 307-308
- Wait-free linked-lists. Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank. 309-310
- Scalable parallel debugging with statistical assertions. Minh Ngoc Dinh, David Abramson, Chao Jin, Andrew Gontarek, Bob Moench, Luiz De Rose. 311-312
- Verification of software barriers. Alexander Malkis, Anindya Banerjee. 313-314
- Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar. 315-316
- Synchronization views for event-loop actors. Joeri De Koster, Stefan Marr, Theo D'Hondt. 317-318
- CPHASH: a cache-partitioned hash table. Zviad Metreveli, Nickolai Zeldovich, M. Frans Kaashoek. 319-320
- RACECAR: a heuristic for automatic function specialization on multi-core heterogeneous systems. John Robert Wernsing, Greg Stitt. 321-322
- A lock-free, array-based priority queue. Yujie Liu, Michael F. Spear. 323-324
- An infrastructure for dynamic optimization of parallel programs. Albert Noll, Thomas R. Gross. 325-326
- Automatic datatype generation and optimization. Fredrik Kjolstad, Torsten Hoefler, Marc Snir. 327-328
- NDetermin: inferring nondeterministic sequential specifications for parallelism correctness. Jacob Burnim, Tayfun Elmas, George C. Necula, Koushik Sen. 329-330
- Concurrent breakpoints. Chang-Seo Park, Koushik Sen. 331-332
- Establishing a Miniapp as a programmability proxy. Andrew Stone, John Dennis, Michelle Strout. 333-334
- OpenMP-style parallelism in data-centered multicore computing with R. Lei Jiang, Pragneshkumar B. Patel, George Ostrouchov, Ferdinand Jamitzky. 335-336
- Performance analysis of parallel constraint-based local search. Yves Caniou, Daniel Diaz, Florian Richoux, Philippe Codognet, Salvador Abreu. 337-338