- A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations. Hisashi Yashiro, Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, Kazuo Minami, Hikaru Inoue, Tatsuo Nishiki, Takayuki Saji, Masaki Satoh, Hirofumi Tomita. 1
- Processing full-scale Square Kilometre Array data on the Summit supercomputer. Ruonan Wang, Rodrigo Tobar, Markus Dolensky, Tao An, Andreas Wicenec, Chen Wu, Fred Dulwich, Norbert Podhorszki, Valentine Anantharaj, Eric Suchyta, Bao-qiang Lao, Scott Klasky. 2
- Toward realization of numerical towing-tank tests by wall-resolved large eddy simulation based on 32 billion grid finite-element computation. Chisachi Kato, Yoshinobu Yamade, Katsuhiro Nagano, Kiyoshi Kumahata, Kazuo Minami, Tatsuo Nishikawa. 3
- Accelerating large-scale excited-state GW calculations on leadership HPC systems. Mauro Del Ben, Charlene Yang, Zhenglu Li, Felipe H. da Jornada, Steven G. Louie, Jack Deslippe. 4
- ab initio accuracy to 100 million atoms with machine learning. Weile Jia, Han Wang, MoHan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang. 5
- Scalable knowledge graph analytics at 136 petaflop/s. Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert M. Patton, Richard W. Vuduc, Thomas E. Potok. 6
- A parallel framework for constraint-based Bayesian network learning via Markov blanket discovery. Ankit Srivastava, Sriram P. Chockalingam, Srinivas Aluru. 7
- Recurrent neural network architecture search for geophysical emulation. Romit Maulik, Romain Egele, Bethany Lusch, Prasanna Balaprakash. 8
- MeshfreeFlowNet: a physics-constrained deep continuous space-time super-resolution framework. Chiyu Max Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, Karthik Kashinath, Mustafa Mustafa, Hamdi A. Tchelepi, Philip Marcus, Prabhat, Anima Anandkumar. 9
- Improving all-to-many personalized communication in two-phase I/O. Qiao Kang, Robert B. Ross, Robert Latham, SunWoo Lee, Ankit Agrawal, Alok N. Choudhary, Wei-keng Liao. 10
- Taming I/O variation on QoS-less HPC storage: what can applications do? Zhenbo Qiao, Qing Liu, Norbert Podhorszki, Scott Klasky, Jieyang Chen. 11
- BORA: a bag optimizer for robotic analysis. Jian Zhang, Tao Xie, Yuzhuo Jing, Yanjie Song, Guanzhou Hu, Si Chen, Shu Yin. 12
- Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters. Ang Li, Omer Subasi, Xiu Yang, Sriram Krishnamoorthy. 13
- Efficient 2D tensor network simulation of quantum systems. Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, Edgar Solomonik. 14
- Veritas: accurately estimating the correct output on noisy intermediate-scale quantum computers. Tirthak Patel, Devesh Tiwari. 15
- Accelerating sparse DNN models without hardware-support via tile-wise sparsity. Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu. 16
- Sparse GPU kernels for deep learning. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. 17
- SpTFS: sparse tensor format selection for MTTKRP via deep learning. Qingxiao Sun, Yi Liu, Ming Dun, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian. 18
- Scaling distributed deep learning workloads beyond the memory capacity with KARMA. Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka. 19
- ZeRO: memory optimizations toward training trillion parameter models. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. 20
- Kraken: memory-efficient continual learning for large-scale real-time recommendations. Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu. 21
- Cell-list based molecular dynamics on many-core processors: a case study on Sunway TaihuLight supercomputer. Xiaohui Duan, Ping Gao, Meng Zhang, Tingjian Zhang, Hongsong Meng, Yuxuan Li, Bertil Schmidt, Haohuan Fu, Lin Gan, Wei Xue, Weiguo Liu, Guangwen Yang. 22
- Evaluation of a minimally synchronous algorithm for 2:1 octree balance. Hansol Suh, Tobin Isaac. 23
- Distributed-memory DMRG via sparse and dense parallel tensor contractions. Ryan Levy, Edgar Solomonik, Bryan K. Clark. 24
- TAGO: rethinking routing design in high performance reconfigurable networks. Min Yee Teh, Yu-Han Hung, George Michelogiannakis, Shijia Yan, Madeleine Glick, John Shalf, Keren Bergman. 25
- Architecture and performance studies of 3D-Hyper-FleX-LION for reconfigurable all-to-all HPC networks. Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, S. J. Ben Yoo. 26
- FatPaths: routing in supercomputers and data centers when shortest paths fall short. Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler. 27
- ScalAna: automating scaling loss detection with graph analysis. Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai. 28
- ZeroSpy: exploring software inefficiency with redundant zeros. Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian, Xu Liu. 29
- DrCCTProf: a fine-grained call path profiler for ARM-based clusters. Qidong Zhao, Xu Liu, Milind Chabbi. 30
- RLScheduler: an automated HPC batch job scheduler using reinforcement learning. Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, Bing Xie. 31
- Alita: comprehensive performance isolation through bias resource management for public clouds. Quan Chen, Shuai Xue, Shang Zhao, Shanpei Chen, Yihao Wu, Yu Xu, Zhuo Song, Tao Ma, Yong Yang, Minyi Guo. 32
- HPC I/O throughput bottleneck analysis with explainable local models. Mihailo Isakov, Eliakin Del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip H. Carns, Robert B. Ross, Michel A. Kinsy. 33
- A hierarchical and load-aware design for large message neighborhood collectives. S. Mahdieh Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, Mohammadreza Bayatpour. 34
- An in-depth analysis of the Slingshot interconnect. Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler. 35
- CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication. Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, Pavan Balaji. 36
- Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu. 37
- Multi-node multi-GPU diffeomorphic image registration for large-scale imaging problems. Malte Brunn, Naveen Himthani, George Biros, Miriam Mehl, Andreas Mang. 38
- SegAlign: a scalable GPU-based whole genome aligner. Sneha D. Goenka, Yatish Turakhia, Benedict Paten, Mark Horowitz. 39
- TOSS-2020: a commodity software stack for HPC. Edgar A. León, Trent D'Hooge, Nathan Hanford, Ian Karlin, Ramesh Pankajakshan, Jim Foraker, Chris Chambreau, Matthew L. Leininger. 40
- GPU lifetimes on Titan supercomputer: survival analysis and reliability. George Ostrouchov, Don Maxwell, Rizwan A. Ashraf, Christian Engelmann, Mallikarjun Shankar, James H. Rogers. 41
- Iris: allocation banking and identity and access management for the exascale era. Gabor Torok, Mark R. Day, Rebecca J. Hartman-Baker, Cory Snavely. 42
- Optimizing deep learning recommender systems training on CPU cluster architectures. Dhiraj D. Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke. 43
- Herring: rethinking the parameter server at scale for the cloud. Indu Thangakrishnan, Derya Cavdar, Can Karakus, Piyush Ghai, Yauheni Selivonchyk, Cory Pruce. 44
- GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani. 45
- Experimental evaluation of NISQ quantum computers: error measurement, characterization, and implications. Tirthak Patel, Abhay Potharaju, Baolin Li, Rohan Basu Roy, Devesh Tiwari. 46
- Co-design for A64FX manycore processor and "Fugaku". Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, Hisashi Yashiro, Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, Toshiyuki Shimizu. 47
- Chronicles of Astra: challenges and lessons from the first petascale Arm supercomputer. Kevin T. Pedretti, Andrew J. Younge, Simon D. Hammond, James H. Laros III, Matthew L. Curry, Michael J. Aguilar, Robert J. Hoekstra, Ron Brightwell. 48
- pLiner: isolating lines of floating-point code for compiler-induced variability. Hui Guo, Ignacio Laguna, Cindy Rubio-González. 49
- Tuning floating-point precision using dynamic program information and temporal locality. Hugo Brunie, Costin Iancu, Khaled Z. Ibrahim, Philip Brisk, Brandon Cook. 50
- Scalable yet rigorous floating-point error analysis. Arnab Das, Ian Briggs, Ganesh Gopalakrishnan, Sriram Krishnamoorthy, Pavel Panchekha. 51
- RDMP-KV: designing remote direct memory persistence based key-value stores with PMEM. Tianxi Li, Dipti Shankar, Shashank Gugnani, Xiaoyi Lu. 52
- Compiler-based timing for extremely fine-grain preemptive parallelism. Souradip Ghosh, Michael Cuevas, Simone Campanoni, Peter A. Dinda. 53
- OMPRacer: a scalable and precise static race detector for OpenMP programs. Bradley Swain, Yanze Li, Peiming Liu, Ignacio Laguna, Giorgis Georgakoudis, Jeff Huang. 54
- Preempt: scalable epidemic interventions using submodular optimization on multi-GPU systems. Marco Minutoli, Prathyush Sambaturu, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Anil Vullikanti. 55
- C-SAW: a framework for graph sampling and random walk on GPUs. Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, Hang Liu. 56
- Newton-ADMM: a distributed GPU-accelerated optimizer for multiclass classification problems. Chih-Hao Fang, Sudhir B. Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama. 57
- Fast stencil-code computation on a wafer-scale processor. Kamil Rocki, Dirk Van Essendelft, Ilya Sharapov, Robert Schreiber, Michael Morrison, Vladimir Kibardin, Andrey Portnoy, Jean-Francois Dietiker, Madhava Syamlal, Michael James. 58
- fBLAS: streaming linear algebra on FPGA. Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler. 59
- Massive parallelization for finding shortest lattice vectors based on ubiquity generator framework. Nariaki Tateiwa, Yuji Shinano, Satoshi Nakamura, Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, Katsuki Fujisawa. 60
- Cost-aware prediction of uncorrected DRAM errors in the field. Isaac Boixaderas, Darko Zivanovic, Sergi Moré, Javier Bartolome, David Vicente, Marc Casas, Paul M. Carpenter, Petar Radojkovic, Eduard Ayguadé. 61
- Task Bench: a parameterized benchmark for evaluating parallel runtime performance. Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick S. McCormick, Alex Aiken. 62
- Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation. Wenqian Dong, Zhen Xie, Gokcen Kestor, Dong Li. 63
- SEFEE: lightweight storage error forecasting in large-scale enterprise storage systems. Amirhessam Yazdi, Xing Lin, Lei Yang, Feng Yan. 64
- Live forensics for HPC systems: a case study on distributed storage systems. Saurabh Jha, Shengkun Cui, Subho S. Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer. 65
- INEC: fast and coherent in-network erasure coding. Haiyang Shi, Xiaoyi Lu. 66
- Waiting game: optimally provisioning fixed resources for cloud-enabled schedulers. Pradeep Ambati, Noman Bashir, David E. Irwin, Prashant J. Shenoy. 67
- Metis: learning to schedule long-running applications in shared container clusters at scale. LuPing Wang, Qizhen Weng, Wei Wang, Chen Chen, Bo Li. 68
- Batch: machine learning inference serving on serverless platforms with adaptive batching. Ahsan Ali, Riccardo Pinciroli, Feng Yan, Evgenia Smirni. 69
- Reducing communication in graph neural network training. Alok Tripathy, Katherine A. Yelick, Aydin Buluç. 70
- FeatGraph: a flexible and efficient backend for graph neural network systems. Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang. 71
- GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang. 72
- and quadrature-free discontinuous Galerkin algorithms for (plasma) kinetic equations. Ammar Hakim, James Juno. 73
- Distributed-memory parallel symmetric nonnegative matrix factorization. Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Richard W. Vuduc, Haesun Park. 74
- Distributed many-to-many protein sequence alignment using sparse matrices. Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios A. Pavlopoulos, Ariful Azad, Aydin Buluç. 75
- Runtime-guided ECC protection using online estimation of memory vulnerability. Luc Jaulmes, Miquel Moretó, Mateo Valero, Mattan Erez, Marc Casas. 76
- CRAC: checkpoint-restart architecture for CUDA with streams and UVM. Twinkle Jain, Gene Cooperman. 77
- ANT-man: towards agile power management in the microservice era. Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Yang Hu, Minyi Guo. 78
- Scalable heterogeneous execution of a coupled-cluster model with perturbative triples. Jinsung Kim, Ajay Panyala, Bo Peng, Karol Kowalski, P. Sadayappan, Sriram Krishnamoorthy. 79
- A submatrix-based method for approximate matrix function evaluation in the quantum chemistry code CP2K. Michael Lass, Robert Schade, Thomas D. Kühne, Christian Plessl. 80
- Scaling the Hartree-Fock matrix build on Summit. Giuseppe M. J. Barca, David L. Poole, Jorge L. Galvez Vallejo, Melisa Alkan, Colleen Bertoni, Alistair P. Rendell, Mark S. Gordon. 81
- MoHA: a composable system for efficient in-situ analytics on heterogeneous HPC systems. Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath. 82
- Foresight: analysis that matters for data reduction. Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Arvind T. Mohan, Ayan Biswas, John Patchett, Terece L. Turton, David H. Rogers, Daniel Livescu, James P. Ahrens. 83
- Job characteristics on large-scale systems: long-term analysis, quantification, and implications. Tirthak Patel, Zhengchun Liu, Raj Kettimuthu, Paul Rich, William E. Allcock, Devesh Tiwari. 84
- Pencil: a pipelined algorithm for distributed stencils. Hengjie Wang, Aparna Chandramowlishwaran. 85
- Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas. 86
- Efficient tiled sparse matrix multiplication through matrix signatures. Süreyya Emre Kurt, Aravind Sukumaran-Rajam, Fabrice Rastello, P. Sadayappan. 87
- GPU-trident: efficient modeling of error propagation in GPU programs. Abdul Rehman Anwer, Guanpeng Li, Karthik Pattabiraman, Michael B. Sullivan, Timothy Tsai, Siva Kumar Sastry Hari. 88
- GVProf: a value profiler for GPU-based clusters. Keren Zhou, Yueming Hao, John M. Mellor-Crummey, Xiaozhu Meng, Xu Liu. 89
- An efficient and non-intrusive GPU scheduling framework for deep learning training systems. Shaoqi Wang, Oscar J. Gonzalez, Xiaobo Zhou, Thomas Williams, Brian D. Friedman, Martin Havemann, Thomas Y. C. Woo. 90
- Preparing nuclear astrophysics for exascale. Max P. Katz, Ann S. Almgren, Maria Barrios Sazo, Kiran Eiden, Kevin Gott, Alice Harpole, Jean M. Sexton, Donald E. Willcox, Weiqun Zhang, Michael Zingale. 91
- A performance-portable nonhydrostatic atmospheric dycore for the Energy Exascale Earth System Model running at cloud-resolving resolutions. Luca Bertagna, Oksana Guba, Mark A. Taylor, James G. Foucar, Jeff Larkin, Andrew M. Bradley, Sivasankaran Rajamanickam, Andrew G. Salinger. 92
- Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method. Yasuhiro Idomura, Takuya Ina, Yussuf Ali, Toshiyuki Imamura. 93
- Convolutional neural network training with distributed K-FAC. J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster. 94
- BiQGEMM: matrix multiplication with lookup table for binary-coding-based quantized DNNs. Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee. 95
- Term quantization: furthering quantization at run time. Hsiang-Tsung Kung, Bradley McDanel, Sai Qian Zhang. 96
- Compiling generalized histograms for GPU. Troels Henriksen, Sune Hellfritzsch, Ponnuswamy Sadayappan, Cosmin E. Oancea. 97
- CCAMP: an integrated translation and optimization framework for OpenACC and OpenMP. Jacob Lambert, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony. 98
- High-performance parallel graph coloring with strong guarantees on work, depth, and quality. Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Torsten Hoefler. 99
- GraphPi: high performance graph pattern matching through effective redundancy elimination. Tianhui Shi, Mingshu Zhai, Yi Xu, Jidong Zhai. 100
- Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri E. Bal, Rob van Nieuwpoort. 101