- Scalability-Centric HPC System Design. Yutong Lu. 3
- Cost-Optimal Execution of Boolean Query Trees with Shared Streams. Henri Casanova, Lipyeow Lim, Yves Robert, Frédéric Vivien, Dounia Zaidouni. 7-16
- It's About Time: On Optimal Virtual Network Embeddings under Temporal Flexibilities. Matthias Rost, Stefan Schmid, Anja Feldmann. 17-26
- Exploiting Geometric Partitioning in Task Mapping for Parallel Computers. Mehmet Deveci, Sivasankaran Rajamanickam, Vitus J. Leung, Kevin Pedretti, Stephen L. Olivier, David P. Bunde, Ümit V. Çatalyürek, Karen D. Devine. 27-36
- Communication-Efficient Distributed Variance Monitoring and Outlier Detection for Multivariate Time Series. Moshe Gabel, Assaf Schuster, Daniel Keren. 37-47
- MobiStreams: A Reliable Distributed Stream Processing System for Mobile Devices. Huayong Wang, Li-Shiuan Peh. 51-60
- MapReuse: Reusing Computation in an In-Memory MapReduce System. Devesh Tiwari, Yan Solihin. 61-71
- PAGE: A Framework for Easy PArallelization of GEnomic Applications. Mucahid Kutlu, Gagan Agrawal. 72-81
- Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime. Marcelo Veiga Neves, César A. F. De Rose, Kostas Katrinis, Hubertus Franke. 82-90
- A Case for a Flexible Scalar Unit in SIMT Architecture. Yi Yang, Ping Xiang, Michael Mantor, Norman Rubin, Lisa R. Hsu, Qunfeng Dong, Huiyang Zhou. 93-102
- Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs. Ayse Yilmazer, Zhongliang Chen, David R. Kaeli. 103-112
- Power and Performance Characterization and Modeling of GPU-Accelerated Systems. Yuki Abe, Hiroshi Sasaki, Shinpei Kato, Koji Inoue, Masato Edahiro, Martin Peres. 113-122
- Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU. Ivan Grasso, Petar Radojkovic, Nikola Rajovic, Isaac Gelado, Alex Ramírez. 123-132
- Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds. Bogdan Nicolae, Pierre Riteau, Kate Keahey. 135-144
- Scibox: Online Sharing of Scientific Data via the Cloud. Jian Huang, Xuechen Zhang, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Stéphane Ethier, Scott Klasky. 145-154
- CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination. Matthieu Dorier, Gabriel Antoniu, Robert B. Ross, Dries Kimpe, Shadi Ibrahim. 155-164
- Active Measurement of the Impact of Network Switch Utilization on Application Performance. Marc Casas, Greg Bronevetsky. 165-174
- Multi-resource Real-Time Reader/Writer Locks for Multiprocessors. Bryan C. Ward, James H. Anderson. 177-186
- Remote Invalidation: Optimizing the Critical Path of Memory Transactions. Ahmed Hassan, Roberto Palmieri, Binoy Ravindran. 187-197
- Revisiting Asynchronous Linear Solvers: Provable Convergence Rate through Randomization. Haim Avron, Alex Druinsky, A. Gupta. 198-207
- Accelerating MPI Collective Communications through Hierarchical Algorithms Without Sacrificing Inter-Node Communication Flexibility. Benjamin S. Parsons, Vijay S. Pai. 208-218
- Enabling In-Situ Data Analysis for Large Protein-Folding Trajectory Datasets. Boyu Zhang, Trilce Estrada, Pietro Cicotti, Michela Taufer. 221-230
- Overcoming the Limitations Posed by TCR-beta Repertoire Modeling through a GPU-Based In-Silico DNA Recombination Algorithm. Gregory M. Striemer, Harsha Krovi, Ali Akoglu, Benjamin Vincent, Ben Hopson, Jeffrey Frelinger, Adam Buntzman. 231-240
- Parallel Mutual Information Based Construction of Whole-Genome Networks on the Intel (R) Xeon Phi (TM) Coprocessor. Sanchit Misra, Kiran Pamnany, Srinivas Aluru. 241-250
- cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU. Jing Zhang, Hao Wang, Heshan Lin, Wu-chun Feng. 251-260
- Skywalk: A Topology for HPC Networks with Low-Delay Switches. Ikki Fujiwara, Michihiro Koibuchi, Hiroki Matsutani, Henri Casanova. 263-272
- LFTI: A New Performance Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems. Xin Yuan, Santosh Mahapatra, Michael Lang, Scott Pakin. 273-282
- An Improved Router Design for Reliable On-Chip Networks. Pavan Poluri, Ahmed Louri. 283-292
- Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems. Jieming Yin, Pingqiang Zhou, Sachin S. Sapatnekar, Antonia Zhai. 293-303
- Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters. Dazhao Cheng, Changjun Jiang, Xiaobo Zhou. 307-316
- Online Server and Workload Management for Joint Optimization of Electricity Cost and Carbon Footprint Across Data Centers. Zahra Abbasi, Madhurima Pore, Sandeep K. S. Gupta. 317-326
- Cost-Efficient and Resilient Job Life-Cycle Management on Hybrid Clouds. Hsuan-Yi Chu, Yogesh Simmhan. 327-336
- A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters. Giuseppe Coviello, Srihari Cadambi, Srimat T. Chakradhar. 337-346
- Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. Andrew A. Davidson, Sean Baxter, Michael Garland, John D. Owens. 349-359
- Efficient Multi-GPU Computation of All-Pairs Shortest Paths. Hristo Djidjev, Sunil Thulasidasan, Guillaume Chapuis, Rumen Andonov, Dominique Lavenier. 360-369
- An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data. Weifeng Liu, Brian Vinter. 370-381
- Improving the Performance of CA-GMRES on Multicores with Multiple GPUs. Ichitaro Yamazaki, Hartwig Anzt, Stanimire Tomov, Mark Hoemmen, Jack J. Dongarra. 382-391
- How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis. Yong Guo, Marcin Biczak, Ana Lucia Varbanescu, Alexandru Iosup, Claudio Martella, Theodore L. Willke. 395-404
- Complex Network Analysis Using Parallel Approximate Motif Counting. George M. Slota, Kamesh Madduri. 405-414
- Finding Motifs in Biological Sequences Using the Micron Automata Processor. Indranil Roy, Srinivas Aluru. 415-424
- Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines. Fabio Checconi, Fabrizio Petrini. 425-434
- TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels. Jen-Cheng Huang, Lifeng Nai, Hyesoon Kim, Hsien-Hsin S. Lee. 437-446
- Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks. Jee Choi, Marat Dukhan, Xing Liu, Richard W. Vuduc. 447-457
- Characterization of Impact of Transient Faults and Detection of Data Corruption Errors in Large-Scale N-Body Programs Using Graphics Processing Units. Keun Soo Yim. 458-467
- Analytically Modeling Application Execution for Software-Hardware Co-design. Jichi Guo, Jiayuan Meng, Qing Yi, Vitali A. Morozov, Kalyan Kumaran. 468-477
- Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing. Seyong Lee, Dong Li, Jeffrey S. Vetter. 481-490
- Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment. Azzam Haidar, Chongxiao Cao, Asim YarKhan, Piotr Luszczek, Stanimire Tomov, Khairul Kabir, Jack Dongarra. 491-500
- Nitro: A Framework for Adaptive Code Variant Tuning. Saurav Muralidharan, Manu Shantharam, Mary W. Hall, Michael Garland, Bryan C. Catanzaro. 501-512
- Reading the Tea-Leaves: How Architecture Has Evolved at the High End. Peter M. Kogge. 515
- New Effective Multithreaded Matching Algorithms. Fredrik Manne, Mahantesh Halappanavar. 519-528
- A Medium-Grain Method for Fast 2D Bipartitioning of Sparse Matrices. Daniel M. Pelt, Rob H. Bisseling. 529-539
- Bipartite Matching Heuristics with Quality Guarantees on Shared Memory Parallel Computers. Fanny Dufossé, Kamer Kaya, Bora Uçar. 540-549
- BFS and Coloring-Based Parallel Algorithms for Strongly Connected Components and Related Problems. George M. Slota, Sivasankaran Rajamanickam, Kamesh Madduri. 550-559
- Large-Scale Hydrodynamic Brownian Simulations on Multicore and Manycore Architectures. Xing Liu, Edmond Chow. 563-572
- Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms. Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Nancy M. Amato, Lawrence Rauchwerger. 573-582
- Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems. James E. McClure, Hao Wang, Jan F. Prins, Cass T. Miller, Wu-chun Feng. 583-592
- A Spatio-temporal Coupling Method to Reduce the Time-to-Solution of Cardiovascular Simulations. Amanda Peters Randles, Efthimios Kaxiras. 593-602
- Mitigating the Mismatch between the Coherence Protocol and Conflict Detection in Hardware Transactional Memory. Lihang Zhao, Lizhong Chen, Jeffrey T. Draper. 605-614
- Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell. Bhavishya Goel, J. Rubén Titos Gil, Anurag Negi, Sally A. McKee, Per Stenström. 615-624
- Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures. Madhavan Manivannan, Per Stenström. 625-636
- High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters. Akshay Venkatesh, Sreeram Potluri, Raghunath Rajachandrasekar, Miao Luo, Khaled Hamidouche, Dhabaleswar K. Panda. 637-646
- HPMMAP: Lightweight Memory Management for Commodity Operating Systems. Brian Kocoloski, John R. Lange. 649-658
- Victim Selection and Distributed Work Stealing Performance: A Case Study. Swann Perarnau, Mitsuhisa Sato. 659-668
- Power-Efficient Multiple Producer-Consumer. Ramy Medhat, Borzoo Bonakdarpour, Sebastian Fischmeister. 669-678
- Efficient Data Race Detection for C/C++ Programs Using Dynamic Granularity. Young Wn Song, Yann-Hang Lee. 679-688
- Improved Time Bounds for Linearizable Implementations of Abstract Data Types. Jiaqi Wang, Edward Talmage, Hyunyoung Lee, Jennifer L. Welch. 691-701
- DEX: Self-Healing Expanders. Gopal Pandurangan, Peter Robinson, Amitabh Trehan. 702-711
- Fair Maximal Independent Sets. Jeremy T. Fineman, Calvin C. Newport, Micah Sherr, Tonghe Wang. 712-721
- Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer. Chuanfu Xu, Lilun Zhang, Xiaogang Deng, Jianbin Fang, Guangxue Wang, Wei Cao, Yonggang Che, Yongxian Wang, Wei Liu. 725-734
- Shedding Light on Lithium/Air Batteries Using Millions of Threads on the BG/Q Supercomputer. Valery Weber, Costas Bekas, Teodoro Laino, Alessandro Curioni, Adam Bertsch, Scott Futral. 735-744
- Enabling and Scaling a Global Shallow-Water Atmospheric Model on Tianhe-2. Wei Xue, Chao Yang, Haohuan Fu, Xinliang Wang, Yangtong Xu, Lin Gan, Yutong Lu, Xiaoqian Zhu. 745-754
- Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters. Jae-Seung Yeom, Abhinav Bhatele, Keith R. Bisset, Eric J. Bohm, Abhishek Gupta, Laxmikant V. Kalé, Madhav V. Marathe, Dimitrios S. Nikolopoulos, Martin Schulz, Lukasz Wesolowski. 755-764
- POD: Performance Oriented I/O Deduplication for Primary Storage Systems in the Cloud. Bo Mao, Hong Jiang, Suzhen Wu, Lei Tian. 767-776
- Pipelined Compaction for the LSM-Tree. Zigang Zhang, Yinliang Yue, Bingsheng He, Jin Xiong, Mingyu Chen, Lixin Zhang, Ninghui Sun. 777-786
- EDM: An Endurance-Aware Data Migration Scheme for Load Balancing in SSD Storage Clusters. Jiaxin Ou, Jiwu Shu, Youyou Lu, Letian Yi, Wei Wang. 787-796
- Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. Yandong Wang, Robin Goldstone, Weikuan Yu, Teng Wang. 799-808
- MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures. Yang You, Shuaiwen Leon Song, Haohuan Fu, Andres Marquez, Maryam Mehri Dehnavi, Kevin J. Barker, Kirk W. Cameron, Amanda Peters Randles, Guangwen Yang. 809-818
- BigKernel - High Performance CPU-GPU Communication Pipelining for Big Data-Style Applications. Reza Mokhtari, Michael Stumm. 819-828
- DataMPI: Extending MPI to Hadoop-Like Big Data Computing. Xiaoyi Lu, Fan Liang, Bing Wang, Li Zha, Zhiwei Xu. 829-838
- An Efficient Method for Stream Semantics over RDMA. Patrick MacArthur, Robert D. Russell. 841-851
- Collaborative Network Configuration in Hybrid Electrical/Optical Data Center Networks. Zhiyang Guo, Yuanyuan Yang. 852-861
- Optimizing Bandwidth Allocation in Flex-Grid Optical Networks with Application to Scheduling. Hadas Shachnai, Ariella Voloshin, Shmuel Zaks. 862-871
- Balancing On-Chip Network Latency in Multi-application Mapping for Chip-Multiprocessors. Di Zhu, Lizhong Chen, Siyu Yue, Timothy Mark Pinkston, Massoud Pedram. 872-881
- Astrophysical Applications of Machine Learning at Scale and under Duress. Joshua Bloom. 885
- Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems. Venkatesan T. Chakaravarthy, Fabio Checconi, Fabrizio Petrini, Yogish Sabharwal. 889-901
- A New Scalable Parallel Algorithm for Fock Matrix Construction. Xing Liu, Aftab Patel, Edmond Chow. 902-914
- ReDHiP: Recalibrating Deep Hierarchy Prediction for Energy Efficiency. Xun Li, Diana Franklin, Ricardo Bianchini, Frederic T. Chong. 915-926
- F2C2-STM: Flux-Based Feedback-Driven Concurrency Control for STMs. Kaushik Ravichandran, Santosh Pande. 927-938
- Identifying Code Phases Using Piece-Wise Linear Regressions. Harald Servat, Germán Llort, Juan Gonzalez, Judit Gimenez, Jesús Labarta. 941-951
- Auto-Tuning Dedispersion for Many-Core Accelerators. Alessio Sclocco, Henri E. Bal, Jason Hessels, Joeri van Leeuwen, Rob van Nieuwpoort. 952-961
- RCMP: Enabling Efficient Recomputation Based Failure Resilience for Big Data Analytics. Florin Dinu, T. S. Eugene Ng. 962-971
- A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU. Tingxing Dong, Veselin Dobrev, Tzanio V. Kolev, Robert N. Rieben, Stanimire Tomov, Jack J. Dongarra. 972-981
- Using Multiple Threads to Accelerate Single Thread Performance. Zehra Sura, Kevin O'Brien, José R. Brunheroto. 985-994
- Active Measurement of Memory Resource Consumption. Marc Casas, Greg Bronevetsky. 995-1004
- Locating Parallelization Potential in Object-Oriented Data Structures. Korbinian Molitorisz, Thomas Karcher, Alexander Biele, Walter F. Tichy. 1005-1015
- An Accelerated Recursive Doubling Algorithm for Block Tridiagonal Systems. Sudip K. Seal. 1019-1028
- Designing LU-QR Hybrid Solvers for Performance and Stability. Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley R. Lowery, Yves Robert, Jack Dongarra. 1029-1038
- Effectively Exploiting Parallel Scale for All Problem Sizes in LU Factorization. Md Rakib Hasan, R. Clint Whaley. 1039-1048
- Anatomy of High-Performance Many-Threaded Matrix Multiplication. Tyler M. Smith, Robert A. van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, Field G. Van Zee. 1049-1059
- Comparative Performance Analysis of Intel (R) Xeon Phi (TM), GPU, and CPU: A Case Study from Microscopy Image Analysis. George Teodoro, Tahsin M. Kurç, Jun Kong, Lee A. D. Cooper, Joel H. Saltz. 1063-1072
- A Framework for Lattice QCD Calculations on GPUs. F. T. Winter, M. A. Clark, R. G. Edwards, B. Joó. 1073-1082
- Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters. Karthikeyan Vaidyanathan, Kiran Pamnany, Dhiraj D. Kalamkar, Alexander Heinecke, Mikhail Smelyanskiy, JongSoo Park, Daehyun Kim, Aniruddha G. Shet, Bharat Kaul, Bálint Joó, Pradeep Dubey. 1083-1092
- Computational Co-design of a Multiscale Plasma Application: A Process and Initial Results. Joshua Payne, Dana A. Knoll, Allen McPherson, William T. Taitano, Luis Chacón, Guangye Chen, Scott Pakin. 1093-1102
- UPC++: A PGAS Extension for C++. Yili Zheng, Amir Kamil, Michael B. Driscoll, Hongzhang Shan, Katherine A. Yelick. 1105-1114
- An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect. Khaled Z. Ibrahim, Paul Hargrove, Costin Iancu, Katherine A. Yelick. 1115-1125
- Scaling Irregular Applications through Data Aggregation and Software Multithreading. Alessandro Morari, Antonino Tumeo, Daniel G. Chavarría-Miranda, Oreste Villa, Mateo Valero. 1126-1135
- Generalizing Run-Time Tiling with the Loop Chain Abstraction. Michelle Mills Strout, Fabio Luporini, Christopher D. Krieger, Carlo Bertolli, Gheorghe-Teodor Bercea, Catherine Olschanowsky, J. Ramanujam, Paul H. J. Kelly. 1136-1145
- s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid. Samuel Williams, Mike Lijewski, Ann S. Almgren, Brian van Straalen, Erin Carson, Nicholas Knight, James Demmel. 1149-1158
- Reconstructing Householder Vectors from Tall-Skinny QR. Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Hong Diep Nguyen, Edgar Solomonik. 1159-1170
- Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints. Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki. 1171-1180
- Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications. Sheng Di, Mohamed-Slim Bouguerra, Leonardo Arturo Bautista Gomez, Franck Cappello. 1181-1190
- Evaluating the Impact of SDC on the GMRES Iterative Solver. James Elliott, Mark Hoemmen, Frank Mueller. 1193-1202
- A Multi-core Parallel Branch-and-Bound Algorithm Using Factorial Number System. Mohand Mezmaz, Rudi Leroy, Nouredine Melab, Daniel Tuyttens. 1203-1212
- Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations. Hasan Metin Aktulga, Aydin Buluç, Samuel Williams, Chao Yang. 1213-1222
- FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery. Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka. 1225-1234
- Designing Bit-Reproducible Portable High-Performance Applications. Andrea Arteaga, Oliver Fuhrer, Torsten Hoefler. 1235-1244
- F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability. Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Song Fu. 1245-1254