- k-mer counting. Tony C. Pan, Sanchit Misra, Srinivas Aluru.
- FlipTracker: understanding natural error resilience in HPC applications. Luanzheng Guo, Dong Li, Ignacio Laguna, Martin Schulz.
- Evaluating and accelerating high-fidelity error injection for HPC. Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, Michael B. Sullivan, Mattan Erez.
- Exploiting idle resources in a high-radix switch for supplemental storage. Matthias A. Blumrich, Nan Jiang, Larry R. Dennison.
- Phase asynchronous AMR execution for productive and performant astrophysical flows. Muhammed Nufail Farooqi, Tan Nguyen, Weiqun Zhang, Ann S. Almgren, John Shalf, Didem Unat.
- Associative instruction reordering to alleviate register pressure. Prashant Singh Rawat, Aravind Sukumaran-Rajam, Atanas Rountev, Fabrice Rastello, Louis-Noël Pouchet, P. Sadayappan.
- Mitigating inter-job interference using adaptive flow-aware routing. Staci A. Smith, Clara E. Cromey, David K. Lowenthal, Jens Domke, Nikhil Jain, Jayaraman J. Thiagarajan, Abhinav Bhatele.
- Framework for scalable intra-node collective operations using shared memory. Surabhi Jain, Rashid Kaleem, Marc Gamell Balmana, Akhil Langer, Dmitry Durnov, Alexander Sannikov, Maria Garzaran.
- Partial redundancy in HPC systems with non-uniform node reliabilities. Zaeem Hussain, Taieb Znati, Rami G. Melhem.
- Runtime-assisted cache coherence deactivation in task parallel programs. Paul Caheny, Lluc Alvarez, Mateo Valero, Miquel Moretó, Marc Casas.
- Siena: exploring the design space of heterogeneous memory systems. Ivy Bo Peng, Jeffrey S. Vetter.
- Optimizing software-directed instruction replication for GPU error detection. Abdulrahman Mahmoud, Siva Kumar Sastry Hari, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler.
- Extreme scale de novo metagenome assembly. Evangelos Georganas, Rob Egan, Steven A. Hofmeyr, Eugene Goltsman, Bill Arndt, Andrew Tritt, Aydin Buluç, Leonid Oliker, Katherine A. Yelick.
- ParSy: inspection and transformation of sparse matrix computations for parallelism. Kazem Cheshmi, Shoaib Kamil, Michelle Mills Strout, Maryam Mehri Dehnavi.
- PruneJuice: pruning trillion-edge graphs to a precise pattern-matching solution. Tahsin Reza, Matei Ripeanu, Nicolas Tripoul, Geoffrey Sanders, Roger Pearce.
- GPU age-aware scheduling to improve the reliability of leadership jobs on Titan. Christopher Zimmer, Don Maxwell, Stephen McNally, Scott Atchley, Sudharshan S. Vazhkudai.
- The design, deployment, and evaluation of the CORAL pre-exascale systems. Sudharshan S. Vazhkudai, Bronis R. de Supinski, Arthur S. Bland, Al Geist, James C. Sexton, Jim Kahle, Christopher J. Zimmer, Scott Atchley, Sarp Oral, Don E. Maxwell, Verónica G. Vergara Larrea, Adam Bertsch, Robin Goldstone, Wayne Joubert, Chris Chambreau, David Appelhans, Robert Blackmore, Ben Casses, George Chochia, Gene Davison, Matthew A. Ezell, Tom Gooding, Elsa Gonsiorowski, Leopold Grinberg, Bill Hanson, Bill Hartner, Ian Karlin, Matthew L. Leininger, Dustin Leverman, Chris Marroquin, Adam Moody, Martin Ohmacht, Ramesh Pankajakshan, Fernando Pizzano, James H. Rogers, Bryan S. Rosenburg, Drew Schmidt, Mallikarjun Shankar, Feiyi Wang, Py Watson, Bob Walkup, Lance D. Weems, Junqi Yin.
- Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. Azzam Haidar, Stanimire Tomov, Jack J. Dongarra, Nicholas J. Higham.
- Dynamic tracing: memoization of task graphs for dynamic task-based runtimes. Wonchan Lee, Elliott Slaughter, Michael Bauer, Sean Treichler, Todd Warszawski, Michael Garland, Alex Aiken.
- Redesigning LAMMPS for peta-scale and hundred-billion-atom simulation on Sunway TaihuLight. Xiaohui Duan, Ping Gao, Tingjian Zhang, Meng Zhang, Weiguo Liu, Wusheng Zhang, Wei Xue, Haohuan Fu, Lin Gan, Dexun Chen, Xiangxu Meng, Guangwen Yang.
- Evaluation of an interference-free node allocation policy on fat-tree clusters. Samuel D. Pollard, Nikhil Jain, Stephen Herbein, Abhinav Bhatele.
- Characterization of MPI usage on a production supercomputer. Sudheer Chunduri, Scott Parker, Pavan Balaji, Kevin Harms, Kalyan Kumaran.
- Doomsday: predicting which node will fail when on supercomputers. Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman, Scott B. Baden.
- Distributed memory sparse inverse covariance matrix estimation on high-performance computing architectures. Aryan Eftekhari, Matthias Bollhöfer, Olaf Schenk.
- DRAGON: breaking GPU memory capacity limits with direct NVM access. Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Satoshi Matsuoka.
- Computing planetary interior normal modes with a highly parallel polynomial filtering eigensolver. Jia Shi, Ruipeng Li, Yuanzhe Xi, Yousef Saad, Maarten V. De Hoop.
- Simulating the weak death of the neutron in a femtoscale universe with near-exascale computing. Evan Berkowitz, Michael A. Clark, Arjun Singh Gambhir, Kenneth McElvain, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas, André Walker-Loud, Chia-Cheng Chang, Bálint Joó, Thorsten Kurth, Kostas Orginos.
- Simulating the Wenchuan earthquake with accurate surface topography on Sunway TaihuLight. Bingwei Chen, Haohuan Fu, Yanwen Wei, Conghui He, Wenqiang Zhang, Yuxuan Li, Wubin Wan, Wei Zhang, Lin Gan, Wei Zhang, Zhenguo Zhang, Guangwen Yang, Xiaofei Chen.
- Accelerating quantum chemistry with vectorized and batched integrals. Hua Huang, Edmond Chow.
- A fast scalable implicit solver for nonlinear time-evolution earthquake city problem on low-ordered unstructured finite elements with artificial intelligence and transprecision computing. Tsuyoshi Ichimura, Kohei Fujita, Takuma Yamaguchi, Akira Naruse, Jack C. Wells, Thomas C. Schulthess, Tjerk P. Straatsma, Christopher J. Zimmer, Maxime Martinasso, Kengo Nakajima, Muneo Hori, Lalith Maddegedara.
- CosmoFlow: using deep learning to learn the universe at scale. Amrita Mathuriya, Deborah Bard, Peter Mendygral, Lawrence Meadows, James Arnemann, Lei Shao, Siyu He, Tuomas Kärnä, Diana Moise, Simon J. Pennycook, Kristyn J. Maschhoff, Jason Sewall, Nalini Kumar, Shirley Ho, Michael F. Ringenburg, Prabhat, Victor W. Lee.
- Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs. Kai Wu, Jie Ren, Dong Li.
- High-performance dense Tucker decomposition on GPU clusters. Jee W. Choi, Xing Liu, Venkatesan T. Chakaravarthy.
- Cooperative rendezvous protocols for improved performance and overlap. S. Chakraborty, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda.
- Light-weight protocols for wire-speed ordering. Hans Eberle, Larry Dennison.
- Fine-grained, multi-domain network resource abstraction as a fundamental primitive to enable high-performance, collaborative data sciences. Qiao Xiang, J. Jensen Zhang, X. Tony Wang, Y. Jace Liu, Chin Guok, Franck Le, John Macauley, Harvey Newman, Y. Richard Yang.
- Dynamic data race detection for OpenMP programs. Yizi Gu, John M. Mellor-Crummey.
- A year in the life of a parallel file system. Glenn K. Lockwood, Shane Snyder, Teng Wang, Suren Byna, Philip H. Carns, Nicholas J. Wright.
- Distributed-memory hierarchical compression of dense SPD matrices. Chenhan D. Yu, Severin Reiz, George Biros.
- 167-PFlops deep learning for electron microscopy: from learning physics to atomic manipulation. Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Don D. March, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Thomas P. Karnowski, Maxim A. Ziatdinov, Sergei V. Kalinin.
- HiCOO: hierarchical storage of sparse tensors. Jiajia Li, Jimeng Sun, Richard W. Vuduc.
- k-means for heterogeneous many-core supercomputers. Liandeng Li, Teng Yu, Wenlai Zhao, Haohuan Fu, Chenyu Wang, Li Tan, Guangwen Yang, John Thomson.
- Dynamically negotiating capacity between on-demand and batch clusters. Feng Liu, Kate Keahey, Pierre Riteau, Jon B. Weissman.
- Best practices and lessons from deploying and operating a sustained-petascale system: the Blue Waters experience. Gregory H. Bauer, Brett Bode, Jeremy Enos, William T. Kramer, Scott Lathrop, Celso L. Mendes, Robert Sisneros.
- bespoKV: application tailored scale-out key-value stores. Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, Ali Raza Butt.
- Attacking the opioid epidemic: determining the epistatic and pleiotropic genetic architectures for chronic pain and opioid addiction. Wayne Joubert, Deborah A. Weighill, David Kainer, Sharlee Climer, Amy Justice, Kjiersten Fagnan, Daniel Jacobson.
- faimGraph: high performance management of fully-dynamic graphs under tight memory constraints on the GPU. Martin Winter, Daniel Mlakar, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger.
- Exploring flexible communications for streamlining DNN ensemble training pipelines. Randall Pittman, Hui Guan, Xipeng Shen, Seung-Hwan Lim, Robert M. Patton.
- RM-replay: a high-fidelity tuning, optimization and exploration tool for resource management. Maxime Martinasso, Miguel Gila, Mauro Bianco, Sadaf R. Alam, Colin McMurtrie, Thomas C. Schulthess.
- A lightweight model for right-sizing master-worker applications. Nathaniel Kremer-Herman, Benjamín Tovar, Douglas Thain.
- Anatomy of high-performance deep learning convolutions on SIMD architectures. Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj D. Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke.
- SP-cache: load-balanced, redundancy-free cluster caching with selective partition. Yinghao Yu, Renfei Huang, Wei Wang, Jun Zhang, Khaled Ben Letaief.
- Topology-aware space-shared co-analysis of large-scale molecular dynamics simulations. Preeti Malakar, Todd Munson, Christopher Knight, Venkatram Vishwanath, Michael E. Papka.
- PRISM: predicting resilience of GPU applications using statistical methods. Cham Kalra, Fritz Previlon, Xiangyu Li, Norman Rubin, David R. Kaeli.
- Lessons learned from memory errors observed over the lifetime of Cielo. Scott Levy, Kurt B. Ferreira, Nathan DeBardeleben, Taniya Siddiqua, Vilas Sridharan, Elisabeth Baseman.
- A reference architecture for datacenter scheduling: design, validation, and experiments. Georgios Andreadis, Laurens Versluis, Fabian Mastenbroek, Alexandru Iosup.
- iSpan: parallel identification of strongly connected components with spanning trees. Yuede Ji, Hang Liu, H. Howie Huang.
- A divide and conquer algorithm for DAG scheduling under power constraints. Gökalp Demirci, Ivana Marincic, Henry Hoffmann.
- Fault tolerant one-sided matrix decompositions on heterogeneous systems with GPUs. Jieyang Chen, Hongbo Li, Sihuan Li, Xin Liang, Panruo Wu, Dingwen Tao, Kaiming Ouyang, Yuanlai Liu, Kai Zhao, Qiang Guan, Zizhong Chen.
- Many-core graph workload analysis. Stijn Eyerman, Wim Heirman, Kristof Du Bois, Joshua B. Fryman, Ibrahim Hur.
- TriCore: parallel triangle counting on GPUs. Yang Hu, Hang Liu, H. Howie Huang.
- ADAPT: algorithmic differentiation applied to floating-point precision tuning. Harshitha Menon, Michael O. Lam, Daniel Osei-Kuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, Jeffrey Hittinger.
- Dac-Man: data change management for scientific datasets on HPC systems. Devarshi Ghoshal, Lavanya Ramakrishnan, Deborah A. Agarwal.
- Performance evaluation of a vector supercomputer SX-Aurora TSUBASA. Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi.
- Scaling embedded in-situ indexing with deltaFS. Qing Zheng, Charles D. Cranor, Danhao Guo, Gregory R. Ganger, George Amvrosiadis, Garth A. Gibson, Bradley W. Settlemyer, Gary Grider, Fan Guo.
- Detecting MPI usage anomalies via partial program symbolic execution. Fangke Ye, Jisheng Zhao, Vivek Sarkar.
- Energy efficiency modeling of parallel applications. Mark Endrei, Chao Jin, Minh Ngoc Dinh, David Abramson, Heidi Poxon, Luiz DeRose, Bronis R. de Supinski.
- Exascale deep learning for climate analytics. Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett H. Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston.
- Adaptive anonymization of data using b-edge cover. Arif Khan, Krzysztof Choromanski, Alex Pothen, S. M. Ferdous, Mahantesh Halappanavar, Antonino Tumeo.
- Stacker: an autonomic data movement engine for extreme-scale data staging-based in-situ workflows. Pradeep Subedi, Philip E. Davis, Shaohua Duan, Scott Klasky, Hemanth Kolla, Manish Parashar.
- ShenTu: processing multi-trillion edge graphs on millions of cores in seconds. Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, Jingfang Xu.
- A parallelism profiler with what-if analyses for OpenMP programs. Nader Boushehrinejadmoradi, Adarsh Yoga, Santosh Nagarakatte.
- HPL and DGEMM performance variability on the Xeon Platinum 8160 processor. John D. McCalpin.
- Lessons learned from analyzing dynamic promotion for user-level threading. Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, Pavan Balaji.