- Large-scale visual data analysis. Chris Johnson. 1
- A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers. Michael J. Anderson, David Sheffield, Kurt Keutzer. 2-13
- A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures. Marc Baboulin, Dulceneia Becker, Jack Dongarra. 14-24
- A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction. Azzam Haidar, Hatem Ltaief, Piotr Luszczek, Jack Dongarra. 25-35
- Improving the Performance of Dynamical Simulations Via Multiple Right-Hand Sides. Xing Liu, Edmond Chow, Karthikeyan Vaidyanathan, Mikhail Smelyanskiy. 36-47
- High-Performance Interaction-Based Simulation of Gut Immunopathologies with ENteric Immunity Simulator (ENISI). Keith R. Bisset, Md. Maksudul Alam, Josep Bassaganya-Riera, Adria Carbo, Stephen Eubank, Raquel Hontecillas, Stefan Hoops, Yongguo Mei, Katherine V. Wendelsdorf, Dawen Xie, Jae-Seung Yeom, Madhav V. Marathe. 48-59
- A Parallel Algorithm for Spectrum-based Short Read Error Correction. Ankit Shah, Sriram P. Chockalingam, Srinivas Aluru. 60-70
- Enhancing the Scalability of Consistency-based Progressive Multiple Sequences Alignment Applications. Miquel Orobitg, Fernando Cores, Fernando Guirado, Carsten Kemena, Cédric Notredame, Ana Ripoll. 71-82
- An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization. Zheng Cui, Yun Liang, Kyle Rupnow, Deming Chen. 83-94
- SEL-TM: Selective Eager-Lazy Management for Improved Concurrency in Transactional Memory. Lihang Zhao, Woojin Choi, Jeff Draper. 95-106
- Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth. Jiayuan Meng, Jeremy W. Sheaffer, Kevin Skadron. 107-118
- Dynamic Operands Insertion for VLIW Architecture with a Reduced Bit-width Instruction Set. Jongwon Lee, Jonghee M. Youn, Jihoon Lee, Minwook Ahn, Yunheung Paek. 119-130
- SUV: A Novel Single-Update Version-Management Scheme for Hardware Transactional Memory Systems. Zhichao Yan, Hong Jiang, Dan Feng, Lei Tian, Yujuan Tan. 131-143
- Heterogeneous Task Scheduling for Accelerated OpenMP. Thomas Scogland, Barry Rountree, Wu-chun Feng, Bronis R. de Supinski. 144-155
- A Source-aware Interrupt Scheduling for Modern Parallel I/O Systems. Hongbo Zou, Xian-He Sun, Siyuan Ma, Xi Duan. 156-166
- ExPERT: Pareto-Efficient Task Replication on Grids and a Cloud. Orna Agmon Ben-Yehuda, Assaf Schuster, Artyom Sharov, Mark Silberstein, Alexandru Iosup. 167-178
- Scheduling Closed-Nested Transactions in Distributed Transactional Memory. Junwhan Kim, Binoy Ravindran. 179-188
- Power-aware Manhattan Routing on Chip Multiprocessors. Anne Benoit, Rami G. Melhem, Paul Renaud-Goud, Yves Robert. 189-200
- Efficient Resource Oblivious Algorithms for Multicores with False Sharing. Richard Cole, Vijaya Ramachandran. 201-214
- Competitive Cache Replacement Strategies for Shared Cache Environments. Anil Kumar Katti, Vijaya Ramachandran. 215-226
- A Novel Sorting Algorithm for Many-core Architectures Based on Adaptive Bitonic Sort. Hagen Peters, Ole Schulz-Hildebrandt, Norbert Luttenberger. 227-237
- Optimizing Busy Time on Parallel Machines. George B. Mertzios, Mordechai Shalom, Ariella Voloshin, Prudence W. H. Wong, Shmuel Zaks. 238-248
- WATS: Workload-Aware Task Scheduling in Asymmetric Multi-core Architectures. Quan Chen, Yawen Chen, Zhiyi Huang 0001, Minyi Guo. 249-260
- Parametric Utilization Bounds for Fixed-Priority Multiprocessor Scheduling. Nan Guan, Martin Stigge, Wang Yi 0001, Ge Yu. 261-272
- Minimizing Weighted Mean Completion Time for Malleable Tasks Scheduling. Olivier Beaumont, Nicolas Bonichon, Lionel Eyraud-Dubois, Loris Marchal. 273-284
- Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers. Humayun Arafat, P. Sadayappan, James Dinan, Sriram Krishnamoorthy, Theresa L. Windus. 285-295
- Highly Efficient Performance Portable Tracking of Evolving Surfaces. Wei Yu, Franz Franchetti, James C. Hoe, Tsuhan Chen. 296-307
- Advancing Large Scale Many-Body QMC Simulations on GPU Accelerated Multicore Systems. Andrés Tomás, Chia-Chen Chang, Richard Scalettar, Zhaojun Bai. 308-319
- Reducing Data Movement Costs: Scalable Seismic Imaging on Blue Gene. Michael Perrone, Lurng-Kuo Liu, Ligang Lu, Karen A. Magerlein, Changhoan Kim, Irina Fedulova, Artyom Semenikhin. 320-329
- Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services. Xuechen Zhang, Kei Davis, Song Jiang. 330-341
- SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries. Zhezhe Chen, Xinyu Li, Jau-Yuan Chen, Hua Zhong, Feng Qin. 342-353
- Holistic Debugging of MPI Derived Datatypes. Joachim Protze, Tobias Hilbrich, Andreas Knüpfer, Bronis R. de Supinski, Matthias S. Müller. 354-365
- Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks. Marc Tchiboukdjian, Patrick Carribault, Marc Pérache. 366-377
- Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency. Jatin Chhugani, Nadathur Satish, Changkyu Kim, Jason Sewall, Pradeep Dubey. 378-389
- SAHAD: Subgraph Analysis in Massive Networks Using Hadoop. Zhao Zhao, Guanying Wang, Ali Raza Butt, Maleq Khan, V. S. Anil Kumar, Madhav V. Marathe. 390-401
- Accelerating Nearest Neighbor Search on Manycore Systems. Lawrence Cayton. 402-413
- Optimizing Large-scale Graph Analysis on Multithreaded, Multicore Platforms. Guojing Cong, Konstantin Makarychev. 414-425
- Low-Cost Parallel Algorithms for 2:1 Octree Balance. Tobin Isaac, Carsten Burstedde, Omar Ghattas. 426-437
- A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism. Erlin Yao, Rui Wang, Mingyu Chen, Guangming Tan, Ninghui Sun. 438-448
- High Performance Non-uniform FFT on Modern X86-based Multi-core Systems. Dhiraj D. Kalamkar, Joshua D. Trzasko, Srinivas Sridharan, Mikhail Smelyanskiy, Daehyun Kim, Armando Manduca, Yunhong Shu, Matt A. Bernstein, Bharat Kaul, Pradeep Dubey. 449-460
- NUMA Aware Iterative Stencil Computations on Many-Core Systems. Mohammed Shaheen, Robert Strzodka. 461-473
- Algebraic Block Multi-Color Ordering Method for Parallel Multi-Threaded Sparse Triangular Solver in ICCG Method. Takeshi Iwashita, Hiroshi Nakashima, Yasuhito Takahashi. 474-483
- The Parallel Computation of Morse-Smale Complexes. Attila Gyulassy, Valerio Pascucci, Tom Peterka, Robert B. Ross. 484-495
- Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization. Simplice Donfack, Laura Grigori, William D. Gropp, Vivek Kale. 496-507
- Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling. Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato. 508-519
- Optimization of Parallel Discrete Event Simulator for Multi-core Systems. Deepak Jagtap, Nael B. Abu-Ghazaleh, Dmitry Ponomarev. 520-531
- Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory. Eduardo Henrique Molina da Cruz, Matthias Diener, Philippe Olivier Alexandre Navaux. 532-543
- Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters. Kumiko Maeda, Masana Murase, Munehiro Doi, Hideaki Komatsu, Shigeho Noda, Ryutaro Himeno. 544-556
- Productive Programming of GPU Clusters with OmpSs. Javier Bueno, Judit Planas, Alejandro Duran, Rosa M. Badia, Xavier Martorell, Eduard Ayguadé, Jesús Labarta. 557-568
- Generating Device-specific GPU Code for Local Operators in Medical Imaging. Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, Wieland Eckert. 569-581
- Performance Portability with the Chapel Language. Albert Sidelnik, Saeed Maleki, Bradford L. Chamberlain, María Jesús Garzarán, David A. Padua. 582-594
- Exascale System Software for the Year of the Dragon. Pete Beckman. 595
- Mapping Dense LU Factorization on Multicore Supercomputer Nodes. Jonathan Lifflander, Phil Miller, Ramprasad Venkataraman, Anshu Arya, Laxmikant V. Kalé, Terry Jones. 596-606
- Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems. Jack Dongarra, Mathieu Faverge, Thomas Hérault, Julien Langou, Yves Robert. 607-618
- New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems. Ichitaro Yamazaki, Xiaoye S. Li. 619-630
- ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms. Sivasankaran Rajamanickam, Erik G. Boman, Michael A. Heroux. 631-643
- MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters. Wei Jiang, Gagan Agrawal. 644-655
- Automated and Agile Server Parameter Tuning with Learning and Control. Yanfei Guo, Palden Lama, Xiaobo Zhou. 656-667
- A Self-tuning Failure Detection Scheme for Cloud Computing Service. Naixue Xiong, Athanasios V. Vasilakos, Jie Wu 0001, Y. Richard Yang, Andy Rindos, Yuezhi Zhou, Wen-Zhan Song, Yi Pan. 668-679
- PGAS for Distributed Numerical Python Targeting Multi-core Clusters. Mads Ruben Burgdorff Kristensen, Yili Zheng, Brian Vinter. 680-690
- Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching. Gang Liu, Jih-Kwon Peir, Victor W. Lee. 691-702
- On the Role of NVRAM in Data-intensive Architectures: An Evaluation. Brian Van Essen, Roger A. Pearce, Sasha Ames, Maya Gokhale. 703-714
- iTransformer: Using SSD to Improve Disk Scheduling for High-performance I/O. Xuechen Zhang, Kei Davis, Song Jiang. 715-726
- Switching Optically-Connected Memories in a Large-Scale System. Abhirup Chakraborty, Eugen Schenfeld, Dilma Da Silva. 727-738
- Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication. James Dinan, Pavan Balaji, Jeff R. Hammond, Sriram Krishnamoorthy, Vinod Tipparaju. 739-750
- A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect. Yanhua Sun, Gengbin Zheng, Laxmikant V. Kalé, Terry R. Jones, Ryan Olson. 751-762
- PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer. Sameer Kumar 0001, Amith R. Mamidala, Daniel Faraj, Brian E. Smith, Michael Blocksome, Bob Cernohous, Douglas Miller, Jeff Parker, Joseph Ratterman, Philip Heidelberger, Dong Chen, Burkhard D. Steinmacher-Burow. 763-773
- High-Performance Design of HBase with RDMA over InfiniBand. Jian Huang, Xiangyong Ouyang, Jithin Jose, Md. Wasi-ur-Rahman, Hao Wang, Miao Luo, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda. 774-785
- Virtual Machine Resource Allocation for Service Hosting on Heterogeneous Distributed Platforms. Mark Stillwell, Frédéric Vivien, Henri Casanova. 786-797
- Consistency-aware Partitioning Algorithm in Multi-server Distributed Virtual Environments. Yusen Li, Wentong Cai. 798-807
- Optimal Resource Rental Planning for Elastic Applications in Cloud Market. Han Zhao, Miao Pan, Xinxin Liu, Xiaolin Li 0001, Yuguang Fang. 808-819
- Improved Bounds for Discrete Diffusive Load Balancing. Clemens P. J. Adolphs, Petra Berenbrink. 820-826
- Multi-core Spanning Forest Algorithms using the Disjoint-set Data Structure. Md. Mostofa Ali Patwary, Peder Refsnes, Fredrik Manne. 827-835
- Graph Partitioning for Reconfigurable Topology. Deepak Ajwani, Shoukat Ali, John P. Morrison. 836-847
- Multithreaded Clustering for Multi-level Hypergraph Partitioning. Ümit V. Çatalyürek, Mehmet Deveci, Kamer Kaya, Bora Uçar. 848-859
- Multithreaded Algorithms for Maximum Matching in Bipartite Graphs. Ariful Azad, Mahantesh Halappanavar, Sivasankaran Rajamanickam, Erik G. Boman, Arif M. Khan, Alex Pothen. 860-872
- Multi-level Layout Optimization for Efficient Spatio-temporal Queries on ISABELA-compressed Data. Zhenhuan Gong, Sriram Lakshminarasimhan, John Jenkins, Hemanth Kolla, Stéphane Ethier, Jackie Chen, Robert B. Ross, Scott Klasky, Nagiza F. Samatova. 873-884
- Evaluating Mesh-based P2P Video-on-Demand Systems. Yingwu Zhu. 885-896
- Query Optimization and Execution in a Parallel Analytics DBMS. Todd Eavis, Ahmad Taleb. 897-908
- Dynamic Message Ordering for Topic-Based Publish/Subscribe Systems. Roberto Baldoni, Silvia Bonomi, Marco Platania, Leonardo Querzoni. 909-920
- iHarmonizer: Improving the Disk Efficiency of I/O-intensive Multithreaded Codes. Yizhe Wang, Kei Davis, Yuehai Xu, Song Jiang. 921-932
- Improving Parallel IO Performance of Cell-based AMR Cosmology Applications. Yongen Yu, Douglas H. Rudd, Zhiling Lan, Nickolay Y. Gnedin, Andrey V. Kravtsov, Jingjin Wu. 933-944
- Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications. Dong Li, Jeffrey S. Vetter, Gabriel Marin, Collin McCurdy, Cristian Cira, Zhuo Liu, Weikuan Yu. 945-956
- NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines. Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Fei Meng, Youngjae Kim, Christian Engelmann. 957-968
- Building billion-threads computer and elastic processor. Guo-Jie Li. 969
- HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters. Teng Ma, George Bosilca, Aurelien Bouteiller, Jack Dongarra. 970-982
- BRISA: Combining Efficiency and Reliability in Epidemic Data Dissemination. Miguel Matos, Valerio Schiavoni, Pascal Felber, Rui Oliveira, Etienne Riviere. 983-994
- Locality Principle Revisited: A Probability-Based Quantitative Approach. Saurabh Gupta, Ping Xiang, Yi Yang, Huiyang Zhou. 995-1009
- Evaluating the Impact of TLB Misses on Future HPC Systems. Alessandro Morari, Roberto Gioiosa, Robert W. Wisniewski, Bryan S. Rosenburg, Todd Inglett, Mateo Valero. 1010-1021
- Optimal Algorithms and Approximation Algorithms for Replica Placement with Distance Constraints in Tree Networks. Anne Benoit, Hubert Larchevêque, Paul Renaud-Goud. 1022-1033
- On Nonblocking Multirate Multicast Fat-tree Data Center Networks with Server Redundancy. Zhiyang Guo, Yuanyuan Yang. 1034-1044
- Distributed Transactional Memory for General Networks. Gokarna Sharma, Costas Busch, Srinivasagopalan Srivathsan. 1045-1056
- On λ-Alert Problem. Marek Klonowski, Dominik Pajak. 1057-1067
- Efficient Quality Threshold Clustering for Parallel Architectures. Anthony Danalis, Collin McCurdy, Jeffrey S. Vetter. 1068-1079
- A Highly Parallel Reuse Distance Analysis Algorithm on GPUs. Huimin Cui, Qing Yi, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng 0002. 1080-1092
- Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems. George Teodoro, Tahsin M. Kurç, Tony Pan, Lee A. D. Cooper, Jun Kong, Patrick M. Widener, Joel H. Saltz. 1093-1104
- Radio Astronomy Beam Forming on Many-Core Architectures. Alessio Sclocco, Ana Lucia Varbanescu, Jan-David Mol, Rob van Nieuwpoort. 1105-1116
- Cross-layer Energy and Performance Evaluation of a Nanophotonic Manycore Processor System Using Real Application Workloads. George Kurian, Chen Sun, Chia-Hsin Owen Chen, Jason E. Miller, Jürgen Michel, Lan Wei, Dimitri A. Antoniadis, Li-Shiuan Peh, Lionel C. Kimerling, Vladimir Stojanovic, Anant Agarwal. 1117-1130
- Exploring the Scope of the InfiniBand Congestion Control Mechanism. Ernst Gunnar Gran, Sven-Arne Reinemo, Olav Lysne, Tor Skeie, Eitan Zahavi, Gilad Shainer. 1131-1143
- DCAF - A Directly Connected Arbitration-Free Photonic Crossbar for Energy-Efficient High Performance Computing. Christopher Nitta, Matthew K. Farrens, Venkatesh Akella. 1144-1155
- Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers. Krishna Chaitanya Kandalla, Ulrike Meier Yang, Jeff Keasler, Tzanio V. Kolev, Adam Moody, Hari Subramoni, Karen Tomko, Jérôme Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda. 1156-1167
- Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems. Ana Gainaru, Franck Cappello, William Kramer. 1168-1179
- Meteor Shower: A Reliable Stream Processing System for Commodity Data Centers. Huayong Wang, Li-Shiuan Peh, Emmanouil Koukoumidis, Shao Tao, Mun Choon Chan. 1180-1191
- Hybrid Transactions: Lock Allocation and Assignment for Irrevocability. Jaswanth Sreeram, Santosh Pande. 1192-1203
- Profiling-based Adaptive Contention Management for Software Transactional Memory. Zhengyu He, Xiao Yu, Bo Hong. 1204-1215
- HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications. Amina Guermouche, Thomas Ropars, Marc Snir, Franck Cappello. 1216-1227
- Distributed Demand and Response Algorithm for Optimizing Social-Welfare in Smart Grid. Qifen Dong, Li Yu, Wen-Zhan Song, Lang Tong, ShaoJie Tang. 1228-1239
- Scalable Distributed Consensus to Support MPI Fault Tolerance. Darius Buntinas. 1240-1249
- ScalaBenchGen: Auto-Generation of Communication Benchmarks Traces. Xing Wu, Vivek Deshpande, Frank Mueller. 1250-1260
- A Self-Stabilization Process for Small-World Networks. Sebastian Kniesburges, Andreas Koutsopoulos, Christian Scheideler. 1261-1271
- Self-organizing Particle Systems. Maximilian Drees, Martina Hüllmann, Andreas Koutsopoulos, Christian Scheideler. 1272-1283
- PARDA: A Fast Parallel Reuse Distance Analysis Algorithm. Qingpeng Niu, James Dinan, Qingda Lu, P. Sadayappan. 1284-1294
- A Lower Bound on Proximity Preservation by Space Filling Curves. Pan Xu, Srikanta Tirthapura. 1295-1305
- Modeling and Analyzing Key Performance Factors of Shared Memory MapReduce. Devesh Tiwari, Yan Solihin. 1306-1317
- Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model. Minjang Kim, Pranith Kumar, Hyesoon Kim, Bevin Brett. 1318-1329
- Scalable Critical-Path Based Performance Analysis. David Böhme, Felix Wolf, Bronis R. de Supinski, Martin Schulz, Markus Geimer. 1330-1340
- FractalMRC: Online Cache Miss Rate Curve Prediction on Commodity Systems. Lulu He, Zhibin Yu, Hai Jin. 1341-1351
- Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform. Fan Zhang, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Hasan Abbasi. 1352-1363
- GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems. Tobias Hilbrich, Matthias S. Müller, Bronis R. de Supinski, Martin Schulz, Wolfgang E. Nagel. 1364-1375
- An Efficient Framework for Multi-dimensional Tuning of High Performance Computing Applications. Guojing Cong, Hui-Fang Wen, I-Hsin Chung, David J. Klepacki, Hiroki Murata, Yasushi Negishi. 1376-1387
- An SMT-Selection Metric to Improve Multithreaded Applications' Performance. Justin R. Funston, Kaoutar El Maghraoui, Joefon Jann, Pratap Pattnaik, Alexandra Fedorova. 1388-1399