- Towards green aviation with Python at petascale. Peter E. Vincent, Freddie D. Witherden, Brian C. Vermeire, Jin Seok Park, Arvind Iyer. 1 [doi]
- Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores. Jean-Luc Fattebert, Daniel Osei-Kuffuor, Erik W. Draeger, Tadashi Ogitsu, William D. Krauss. 2 [doi]
- Simulations of below-ground dynamics of fungi: 1.184 pflops attained by automated generation and autotuning of temporal blocking codes. Takayuki Muranushi, Hideyuki Hotta, Junichiro Makino, Seiya Nishizawa, Hirofumi Tomita, Keigo Nitadori, Masaki Iwasawa, Natsuki Hosono, Yutaka Maruyama, Hikaru Inoue, Hisashi Yashiro, Yoshifumi Nakamura. 3 [doi]
- Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. Jian Zhang, Chunbao Zhou, Yangang Wang, Lili Ju, Qiang Du, Xuebin Chi, Dongsheng Xu, Dexun Chen, Yong Liu, Zhao Liu. 4 [doi]
- A highly effective global surface wave numerical simulation with ultra-high resolution. Fang-Li Qiao, Wei Zhao, Xunqiang Yin, Xiaomeng Huang, Xin Liu, Qi Shu, Guansuo Wang, Zhenya Song, Xinfang Li, Haixing Liu, Guangwen Yang, Yeli Yuan. 5 [doi]
- 10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin Gan, Ping Xu, Lanning Wang, Guangwen Yang, Weimin Zheng. 6 [doi]
- The vectorization of the Tersoff multi-body potential: an exercise in performance portability. Markus Höhnerbach, Ahmed E. Ismail, Paolo Bientinesi. 7 [doi]
- Increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency. W. Michael Brown, Andrey Semin, Michael Hebenstreit, Sergey Khvostov, Karthik Raman, Steven J. Plimpton. 8 [doi]
- Enhanced MPSM3 for applications to quantum biological simulations. A. Pozdneev, Valéry Weber, Teodoro Laino, Constantine Bekas, Alessandro Curioni. 9 [doi]
- Development effort estimation in HPC. Sandra Wienke, Julian Miller, Martin Schulz, Matthias S. Müller. 10 [doi]
- MetaMorph: a library framework for interoperable kernels on multi- and many-core clusters. Ahmed E. Helal, Paul Sathre, Wu-chun Feng. 11 [doi]
- TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications. Jun Sawada, Filipp Akopyan, Andrew S. Cassidy, Brian Taba, Michael V. DeBole, Pallab Datta, Rodrigo Alvarez-Icaza, Arnon Amir, John V. Arthur, Alexander Andreopoulos, Rathinakumar Appuswamy, Heinz Baier, Davis Barch, David J. Berg, Carmelo di Nolfo, Steven K. Esser, Myron Flickner, Thomas A. Horvath, Bryan L. Jackson, Jeff Kusnitz, Scott Lekuch, Michael Mastro, Timothy Melano, Paul A. Merolla, Steven E. Millman, Tapan K. Nayak, Norm Pass, Hartmut E. Penner, William P. Risk, Kai Schleupen, Benjamin Shaw, Hayley Wu, Brian Giera, Adam T. Moody, Nathan Mundhenk, Brian Van Essen, Eric X. Wang, David P. Widemann, Qing Wu, William E. Murphy, Jamie K. Infantolino, James A. Ross, Dale R. Shires, Manuel M. Vindiola, Raju Namburu, Dharmendra S. Modha. 12 [doi]
- Scheduling-aware routing for supercomputers. Jens Domke, Torsten Hoefler. 13 [doi]
- Evaluating HPC networks via simulation of parallel workloads. Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kalé. 14 [doi]
- Flexfly: enabling a reconfigurable dragonfly through silicon photonics. Ke Wen, Payman Samadi, Sébastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadroi, Keren Bergman, Jeremiah Wilke. 15 [doi]
- PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers. James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad. 16 [doi]
- Block iterative methods and recycling for improved scalability of linear solvers. Pierre Jolivet, Pierre-Henri Tournier. 17 [doi]
- Scalable non-blocking preconditioned conjugate gradient methods. Paul R. Eller, William Gropp. 18 [doi]
- Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications. Ignacio Laguna, Martin Schulz. 19 [doi]
- Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery. Qingrui Liu, Changhee Jung, Dongyoon Lee, Devesh Tiwari. 20 [doi]
- Understanding error propagation in GPGPU applications. Guanpeng Li, Karthik Pattabiraman, Chen-Yong Cher, Pradip Bose. 21 [doi]
- Simulation and performance analysis of the ECMWF tape library system. Markus Mäsker, Lars Nagel, Tim Süß, André Brinkmann, Lennart Sorth. 22 [doi]
- Real-time synthesis of compression algorithms for scientific data. Martin Burtscher, Hari Mukka, Annie Yang, Farbod Hesaaraki. 23 [doi]
- Performance modeling of in situ rendering. Matthew Larsen, Cyrus Harrison, James Kress, David Pugmire, Jeremy S. Meredith, Hank Childs. 24 [doi]
- HARP: predictive transfer optimization based on historical analysis and real-time probing. Engin Arslan, Kemal Guner, Tevfik Kosar. 25 [doi]
- SERF: efficient scheduling for fast deep neural network serving via judicious parallelism. Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni. 26 [doi]
- Failure detection and propagation in HPC systems. George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra. 27 [doi]
- Improving application resilience to memory errors with lightweight compression. Scott Levy, Kurt B. Ferreira, Patrick G. Bridges. 28 [doi]
- FlipBack: automatic targeted protection against silent data corruption. Xiang Ni, Laxmikant V. Kalé. 29 [doi]
- Graph colouring as a challenge problem for dynamic graph processing on distributed systems. Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya Gokhale, Matei Ripeanu, Roger A. Pearce. 30 [doi]
- An exploration of optimization algorithms for high performance tensor completion. Shaden Smith, JongSoo Park, George Karypis. 31 [doi]
- An efficient and scalable algorithmic method for generating large-scale random graphs. Md. Maksudul Alam, Maleq Khan, Anil Vullikanti, Madhav V. Marathe. 32 [doi]
- Understanding performance interference in next-generation HPC systems. Oscar H. Mondragon, Patrick G. Bridges, Scott Levy, Kurt B. Ferreira, Patrick M. Widener. 33 [doi]
- Reliable and efficient performance monitoring in Linux. Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos. 34 [doi]
- Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs. Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, Satoshi Matsuoka. 35 [doi]
- Enhancing InfiniBand with OpenFlow-style SDN capability. Jason Lee, Zhou Tong, Karthik Achalkar, Xin Yuan, Michael Lang. 36 [doi]
- Designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits. Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda. 37 [doi]
- The Mont-Blanc prototype: an alternative approach for HPC systems. Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino Gomez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesús Labarta, Eduard Ayguadé, Chris Adeniyi-Jones, Said Derradji, Hervé Gloaguen, Piero Lanucara, Nico Sanna, Jean-François Méhaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David Brayford, Daniele Tafani, Volker Weinberg, Dirk Brömmel, René Halver, Jan H. Meinke, Ramón Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, Alex Ramírez. 38 [doi]
- PIPES: a language and compiler for task-based programming on distributed-memory clusters. Martin Kong, Louis-Noël Pouchet, P. Sadayappan, Vivek Sarkar. 39 [doi]
- A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment. Samyam Rajbhandari, Jinsung Kim, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, Robert J. Harrison, P. Sadayappan. 40 [doi]
- Automating wavefront parallelization for sparse matrix computations. Anand Venkat, Mahdi Soltan Mohammadi, JongSoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, Mary W. Hall. 41 [doi]
- Granularity and the cost of error recovery in resilient AMR scientific applications. Anshu Dubey, Hajime Fujita, Daniel T. Graves, Andrew A. Chien, Devesh Tiwari. 42 [doi]
- Extreme scale plasma turbulence simulations on top supercomputers worldwide. William M. Tang, Bei Wang, Stéphane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Timothy J. Williams. 43 [doi]
- A parallel arbitrary-order accurate AMR algorithm for the scalar advection-diffusion equation. Arash Bakhtiari, Dhairya Malhotra, Amir Raoofy, Miriam Mehl, Hans-Joachim Bungartz, George Biros. 44 [doi]
- MUSA: a multi-level simulation approach for next-generation HPC machines. Thomas Grass, César Allande, Adrià Armejach, Alejandro Rico, Eduard Ayguadé, Jesús Labarta, Mateo Valero, Marc Casas, Miquel Moretó. 45 [doi]
- A machine learning framework for performance coverage analysis of proxy applications. Tanzima Zerin Islam, Jayaraman J. Thiagarajan, Abhinav Bhatele, Martin Schulz, Todd Gamblin. 46 [doi]
- Caliper: performance introspection for HPC software stacks. David Böhme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Giménez, Matthew P. LeGendre, Olga Pearce, Martin Schulz. 47 [doi]
- Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems. Narges Shahidi, Mohammad Arjomand, Myoungsoo Jung, Mahmut T. Kandemir, Chita R. Das, Anand Sivasubramaniam. 48 [doi]
- Týr: blob storage meets built-in transactions. Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesús Montes, María S. Pérez. 49 [doi]
- DAOS and friends: a proposal for an exascale storage system. Jay F. Lofstead, Ivo Jimenez, Carlos Maltzahn, Quincey Koziol, John Bent, Eric Barton. 50 [doi]
- Translating OpenMP device constructs to OpenCL using unnecessary data transfer elimination. Junghyun Kim, Yong-jun Lee, Jung-Ho Park, Jaejin Lee. 51 [doi]
- dCUDA: hardware supported overlap of computation and communication. Tobias Gysi, Jeremia Bär, Torsten Hoefler. 52 [doi]
- Daino: a high-level framework for parallel and efficient AMR on GPUs. Mohamed Wahib, Naoya Maruyama, Takayuki Aoki. 53 [doi]
- Optimizing memory efficiency for deep convolutional neural networks on GPUs. Chao Li, Yi Yang, Min Feng, Srimat T. Chakradhar, Huiyang Zhou. 54 [doi]
- Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer. Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman S. Unsal, Simon McIntosh-Smith. 55 [doi]
- A data driven scheduling approach for power management on HPC systems. Sean Wallace, Xu Yang, Venkatram Vishwanath, William E. Allcock, Susan Coghlan, Michael E. Papka, Zhiling Lan. 56 [doi]
- GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing. Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi N. Bhuyan, Zizhong Chen. 57 [doi]
- Merge-based parallel sparse matrix-vector multiplication. Duane Merrill, Michael Garland. 58 [doi]
- Strassen's algorithm reloaded. Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn. 59 [doi]
- Optimal execution of co-analysis for large-scale molecular dynamics simulations. Preeti Malakar, Venkatram Vishwanath, Christopher Knight, Todd S. Munson, Michael E. Papka. 60 [doi]
- Scalemine: scalable parallel frequent subgraph mining in a single large graph. Ehab Abdelhamid, Ibrahim Abdelaziz, Panos Kalnis, Zuhair Khayyat, Fuad Jamour. 61 [doi]
- Efficient Delaunay tessellation through K-D tree decomposition. Dmitriy Morozov, Tom Peterka. 62 [doi]
- A PCIe congestion-aware performance model for densely populated accelerator servers. Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler. 63 [doi]
- Watch out for the bully!: job interference study on dragonfly network. Xu Yang, John Jenkins, Misbah Mubarak, Robert B. Ross, Zhiling Lan. 64 [doi]
- Measuring and understanding throughput of network topologies. Sangeetha Abdu Jyothi, Ankit Singla, Brighten Godfrey, Alexandra Kolla. 65 [doi]
- b-Matching algorithms on distributed memory multiprocessors by approximation. Arif M. Khan, Alex Pothen, Md. Mostofa Ali Patwary, Mahantesh Halappanavar, Nadathur Rajagopalan Satish, Narayanan Sundaram, Pradeep Dubey. 66 [doi]
- k-mismatch maximal common substrings. Sriram P. Chockalingam, Sharma V. Thankachan, Srinivas Aluru. 67 [doi]
- Accelerating lattice QCD multigrid on GPUs using fine-grained parallelization. Michael A. Clark, Bálint Joó, Alexei Strelchenko, Michael Cheng, Arjun Gambhir, Richard C. Brower. 68 [doi]
- An ephemeral burst-buffer file system for scientific applications. Teng Wang, Kathryn Mohror, Adam Moody, Kento Sato, Weikuan Yu. 69 [doi]
- Server-side log data analytics for I/O workload characterization and coordination on large shared storage systems. Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai. 70 [doi]
- G-store: high-performance graph store for trillion-edge processing. Pradeep Kumar, H. Howie Huang. 71 [doi]
- Distributed-memory large deformation diffeomorphic 3D image registration. Andreas Mang, Amir Gholami, George Biros. 72 [doi]
- i: maximizing the inference throughput of 3D convolutional networks on CPUs and GPUs. Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung. 73 [doi]
- High performance emulation of quantum circuits. Thomas Häner, Damian S. Steiger, Mikhail Smelyanskiy, Matthias Troyer. 74 [doi]
- Elastic multi-resource fairness: balancing fairness and efficiency in coupled CPU-GPU architectures. Shanjiang Tang, Bingsheng He, Shuhao Zhang, Zhaojie Niu. 75 [doi]
- DCA: a DRAM-cache-aware DRAM controller. Cheng-Chieh Huang, Vijay Nagarajan, Arpit Joshi. 76 [doi]
- Enabling efficient preemption for SIMT architectures with lightweight context switching. Zhen Lin, Lars Nyland, Huiyang Zhou. 77 [doi]
- Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree. Edgar A. León, Ian Karlin, Abhinav Bhatele, Steven H. Langer, Chris Chambreau, Louis H. Howell, Trent D'Hooge, Matthew L. Leininger. 78 [doi]
- in situ infrastructures. Utkarsh Ayachit, Andrew C. Bauer, Earl P. N. Duque, Greg Eisenhauer, Nicola Ferrier, Junmin Gu, Kenneth E. Jansen, Burlen Loring, Zarija Lukic, Suresh Menon, Dmitriy Morozov, Patrick O'Leary, Reetesh Ranjan, Michel E. Rasquin, Christopher P. Stone, Venkatram Vishwanath, Gunther H. Weber, Brad Whitlock, Matthew Wolf, John K. Wu, E. Wes Bethel. 79 [doi]
- Extended task queuing: active messages for heterogeneous systems. Michael LeBeane, Brandon Potter, Abhisek Pan, Alexandru Dutu, Vinay Agarwala, Wonchan Lee, Deepak Majeti, Bibek Ghimire, Eric Van Tassell, Samuel Wasmundt, Brad Benton, Mauricio Breternitz, Michael L. Chu, Mithuna Thottethodi, Lizy K. John, Steven K. Reinhardt. 80 [doi]
- Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement. Tan Nguyen, Didem Unat, Weiqun Zhang, Ann S. Almgren, Muhammed Nufail Farooqi, John Shalf. 81 [doi]
- High-frequency nonlinear earthquake simulations on petascale heterogeneous supercomputers. Daniel Roten, Yifeng Cui, Kim B. Olsen, Steven M. Day, Kyle Withers, William H. Savran, Peng Wang, Dawei Mu. 82 [doi]
- Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer. Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang, Dexun Chen, Long Gu, Jinxiu Xu, Nan Ding, Xinliang Wang, Conghui He, Shizhen Xu, Yishuang Liang, Jiarui Fang, Yuanchao Xu, Weijie Zheng, Jingheng Xu, Zhen Zheng, Wanjing Wei, Xu Ji, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang. 83 [doi]
- LIBXSMM: accelerating small matrix multiplications by runtime code generation. Alexander Heinecke, Greg Henry, Maxwell Hutchinson, Hans Pabst. 84 [doi]
- Transient guarantees: maximizing the value of idle cloud capacity. Supreeth Shastri, Amr Rizk, David E. Irwin. 85 [doi]
- Multi-resource fair sharing for datacenter jobs with placement constraints. Wei Wang, Baochun Li, Ben Liang, Jun Li. 86 [doi]
- A multi-faceted approach to job placement for improved performance on extreme-scale systems. Christopher Zimmer, Saurabh Gupta, Scott Atchley, Sudharshan S. Vazhkudai, Carl Albing. 87 [doi]