- Anton 3: twenty microseconds of molecular dynamics simulation before lunch. David E. Shaw, Peter J. Adams, Asaph Azaria, Joseph A. Bank, Brannon Batson, Alistair Bell, Michael Bergdorf, Jhanvi Bhatt, J. Adam Butts, Timothy Correia, Robert M. Dirks, Ron O. Dror, Michael P. Eastwood, Bruce Edwards, Amos Even, Peter Feldmann, Michael Fenn, Christopher H. Fenton, Anthony Forte, Joseph Gagliardo, Gennette Gill, Maria Gorlatova, Brian Greskamp, J. P. Grossman, Justin Gullingsrud, Anissa Harper, William Hasenplaugh, Mark Heily, Benjamin Colin Heshmat, Jeremy Hunt, Douglas J. Ierardi, Lev Iserovich, Bryan L. Jackson, Nick P. Johnson, Mollie M. Kirk, John L. Klepeis, Jeffrey S. Kuskin, Kenneth M. Mackenzie, Roy J. Mader, Richard McGowen, Adam McLaughlin, Mark A. Moraes, Mohamed H. Nasr, Lawrence J. Nociolo, Lief O'Donnell, Andrew Parker, Jon L. Peticolas, Goran Pocina, Cristian Predescu, Terry Quan, John K. Salmon, Carl Schwink, Keun Sup Shim, Naseer Siddique, Jochen Spengler, Tamas Szalay, Raymond Tabladillo, Reinhard Tartler, Andrew G. Taube, Michael Theobald, Brian Towles, William Vick, Stanley C. Wang, Michael Wazlowski, Madeleine J. Weingarten, John M. Williams, Kevin A. Yuh. 1
- Symplectic structure-preserving particle-in-cell whole-volume simulation of tokamak plasmas to 111.3 trillion particles and 25.7 billion grids. Jianyuan Xiao, Junshi Chen, Jiangshan Zheng, Hong An, Shenghong Huang, Chao Yang, Fang Li, Ziyu Zhang, Yeqi Huang, Wenting Han, Xin Liu, Dexun Chen, Zixi Liu, Ge Zhuang, Jiale Chen, Guoqiang Li, Xuan Sun, Qiang Chen. 2
- Closing the "quantum supremacy" gap: achieving real-time simulation of a random quantum circuit using a new Sunway supercomputer. Yong (Alexander) Liu, Xin (Lucy) Liu, Fang (Nancy) Li, Haohuan Fu, Yuling Yang, Jiawei Song, Pengpeng Zhao, Zhen Wang, Dajia Peng, Huarong Chen, Chu Guo, Heliang Huang, Wenzhao Wu, Dexun Chen. 3
- Billion atom molecular dynamics simulations of carbon at extreme conditions and experimental time and length scales. Kien Nguyen-Cong, Jonathan T. Willman, Stan G. Moore, Anatoly B. Belonoshko, Rahulkumar Gayatri, Evan Weinberg, Mitchell A. Wood, Aidan P. Thompson, Ivan I. Oleynik. 4
- A 400 trillion-grid Vlasov simulation on Fugaku supercomputer: large-scale distribution of cosmic relic neutrinos in a six-dimensional phase space. Kohji Yoshikawa, Satoshi Tanaka, Naoki Yoshida. 5
- Ab initio quantum Raman spectra simulations on the leadership HPC system in China. Honghui Shang, Fang Li, Yunquan Zhang, Libo Zhang, You Fu, Yingxiang Gao, Yangjun Wu, Xiaohui Duan, Rongfen Lin, Xin Liu, Ying Liu, Dexun Chen. 6
- De novo metagenome assembly using GPUs. Muaaz Gul Awan, Steven A. Hofmeyr, Rob Egan, Nan Ding, Aydin Buluç, Jack Deslippe, Leonid Oliker, Katherine A. Yelick. 7
- FastZ: accelerating gapped whole genome alignment on GPUs. Sree Charan Gundabolu, T. N. Vijaykumar, Mithuna Thottethodi. 8
- Scalable FBP decomposition for cone-beam CT reconstruction. Peng Chen, Mohamed Wahib, Xiao Wang, Takahiro Hirofuchi, Hirotaka Ogawa, Ander Biguri, Richard Boardman, Thomas Blumensath, Satoshi Matsuoka. 9
- Generalizable coordination of large multiscale workflows: challenges and learnings at scale. Harsh Bhatia, Francesco di Natale, Joseph Y. Moon, Xiaohua Zhang, Joseph R. Chavez, Fikret Aydin, Chris Stanley, Tomas Oppelstrup, Chris Neale, Sara Kokkila Schumacher, Dong H. Ahn, Stephen Herbein, Timothy S. Carpenter, Sandrasegaram Gnanakaran, Peer-Timo Bremer, James N. Glosli, Felice C. Lightstone, Helgi I. Ingólfsson. 10
- Linux vs. lightweight multi-kernels for high performance computing: experiences at pre-exascale. Balazs Gerofi, Kohei Tarumizu, Lei Zhang, Takayuki Okamoto, Masamichi Takagi, Shinji Sumimoto, Yutaka Ishikawa. 11
- Revealing power, energy and thermal dynamics of a 200PF pre-exascale supercomputer. Woong Shin, Vladyslav Oles, Ahmad Maroof Karimi, J. Austin Ellis, Feiyi Wang. 12
- KAISA: an adaptive second-order optimizer framework for deep neural networks. J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian T. Foster, Zhao Zhang 0007. 13
- Tensor processing primitives: a programming abstraction for efficiency and portability in deep learning workloads. Evangelos Georganas, Dhiraj D. Kalamkar, Sasikanth Avancha, Menachem Adelman, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh Ziv, Alexander Heinecke. 14
- Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction. Weihao Cui, Han Zhao 0005, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo. 15
- Distributed quantum computing with QMPI. Thomas Häner, Damian S. Steiger, Torsten Hoefler, Matthias Troyer. 16
- BAASH: lightweight, efficient, and reliable blockchain-as-a-service for HPC systems. Abdullah Al Mamun, Feng Yan, Dongfang Zhao. 17
- Representation of women in HPC conferences. Eitan Frachtenberg, Rhody D. Kaner. 18
- Preparing an incompressible-flow fluid dynamics code for exascale-class wind energy simulations. Paul Mullowney, Ruipeng Li, Stephen J. Thomas, Shreyas Ananthan, Ashesh Sharma, Jon S. Rood, Alan B. Williams, Michael A. Sprague. 19
- Scalable adaptive PDE solvers in arbitrary domains. Kumar Saurabh, Masado Ishii, Milinda Fernando, Boshun Gao, Kendrick Tan, Ming-Chen Hsu, Adarsh Krishnamurthy, Hari Sundar, Baskar Ganapathysubramanian. 20
- A next-generation discontinuous Galerkin fluid dynamics solver with application to high-resolution lung airflow simulations. Martin Kronbichler 0002, Niklas Fehn, Peter Munch, Maximilian Bergbauer, Karl-Robert Wichmann, Carolin Geitner, Momme Allalen, Martin Schulz, Wolfgang A. Wall. 21
- Understanding, predicting and scheduling serverless workloads under partial interference. Laiping Zhao, Yanan Yang, Yiming Li, Xian Zhou, Keqiu Li. 22
- The hidden cost of the edge: a performance comparison of edge and cloud latencies. Ahmed Ali-Eldin, Bin Wang, Prashant J. Shenoy. 23
- RIBBON: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances. Baolin Li, Rohan Basu Roy, Tirthak Patel, Vijay Gadepally, Karen Gettings, Devesh Tiwari. 24
- E.T.: re-thinking self-attention for transformer models on GPUs. Shiyang Chen, Shaoyi Huang, Santosh Pandey, Bingbing Li, Guang R. Gao, Long Zheng 0001, Caiwen Ding, Hang Liu 0001. 25
- Parallel construction of module networks. Ankit Srivastava, Sriram P. Chockalingam, Maneesha Aluru, Srinivas Aluru. 26
- Chimera: efficiently training large-scale neural networks with bidirectional pipelines. Shigang Li 0002, Torsten Hoefler. 27
- Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications. Tong Shu, Yanfei Guo, Justin M. Wozniak, Xiaoning Ding, Ian T. Foster, Tahsin M. Kurç. 28
- Meeting the real-time challenges of ground-based telescopes using low-rank matrix computations. Hatem Ltaief, Jesse Cranney, Damien Gratadour, Yuxi Hong, Laurent Gatineau, David E. Keyes. 29
- AgEBO-tabular: joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data. Romain Égelé, Prasanna Balaprakash, Isabelle Guyon, Venkatram Vishwanath, Fangfang Xia, Rick Stevens, Zhengying Liu. 30
- Non-recurring engineering (NRE) best practices: a case study with the NERSC/NVIDIA OpenMP contract. Christopher S. Daley, Annemarie Southwell, Rahulkumar Gayatri, Scott Biersdorff, Craig Toepfer, Güray Özen, Nicholas J. Wright. 31
- Minimizing privilege for building HPC containers. Reid Priedhorsky, Shane Richard Canon, Timothy Randles, Andrew J. Younge. 32
- Systematically inferring I/O performance variability by examining repetitive job behavior. Emily Costa, Tirthak Patel, Benjamin Schwaller, Jim M. Brandt, Devesh Tiwari. 33
- SEEC: stochastic escape express channel. Mayank Parasar, Natalie D. Enright Jerger, Paul V. Gratz, Joshua San Miguel, Tushar Krishna. 34
- Flare: flexible in-network allreduce. Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li 0002, Torsten Hoefler. 35
- HatRPC: hint-accelerated Thrift RPC over RDMA. Tianxi Li, Haiyang Shi, Xiaoyi Lu. 36
- APNN-TC: accelerating arbitrary precision neural networks on Ampere GPU tensor cores. Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding. 37
- Scalable edge-based hyperdimensional learning system with brain-like neural adaptation. Zhuowen Zou, Yeseong Kim, Farhad Imani, Haleh Alimohamadi, Rosario Cammarota, Mohsen Imani. 38
- Dr. Top-k: delegate-centric Top-k on GPUs. Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu. 39
- Enabling large-scale correlated electronic structure calculations: scaling the RI-MP2 method on Summit. Giuseppe M. J. Barca, Jorge L. Galvez Vallejo, David L. Poole, Melisa Alkan, Ryan Stocks, Alistair P. Rendell, Mark S. Gordon. 40
- Ab initio simulation of Raman spectra for biological systems. Honghui Shang, Fang Li, Yunquan Zhang, Ying Liu, Libo Zhang, Mingchuan Wu, Yangjun Wu, Di Wei, Huimin Cui, Xin Liu, Fei Wang, Yuxi Ye, Yingxiang Gao, Shuang Ni, Xin Chen, Dexun Chen. 41
- LMFF: efficient and scalable layered materials force field on heterogeneous many-core processors. Ping Gao 0005, Xiaohui Duan, Jiaxu Guo, Jin Wang, Zhenya Song, LiZhen Cui, Xiangxu Meng, Xin Liu, Wusheng Zhang, Ming Ma, Guohui Li, Dexun Chen, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang. 42
- Hardware acceleration of tensor-structured multilevel Ewald summation method on MDGRAPE-4A, a special-purpose computer system for molecular dynamics simulations. Gentaro Morimoto, Yohei M. Koyama, Hao Zhang, Teruhisa S. Komatsu, Yousuke Ohno, Keigo Nishida, Itta Ohmura, Hiroshi Koyama, Makoto Taiji. 43
- Accelerating bandwidth-bound deep learning inference with main-memory accelerators. Benjamin Y. Cho, Jeageun Jung, Mattan Erez. 44
- LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing. Jin Zhao 0003, Yu Zhang 0027, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin 0001, Haikun Liu. 45
- Simurgh: a fully decentralized and secure NVMM user space file system. Nafiseh Moti, Frederic Schimmelpfennig, Reza Salkhordeh, David Klopp, Toni Cortes, Ulrich Rückert 0001, André Brinkmann. 46
- Lunule: an agile and judicious metadata load balancer for CephFS. Yiduo Wang, Cheng Li, Xinyang Shao, Youxu Chen, Feng Yan 0001, Yinlong Xu. 47
- DeltaFS: a scalable no-ground-truth filesystem for massively-parallel computing. Qing Zheng, Charles D. Cranor, Gregory R. Ganger, Garth A. Gibson, George Amvrosiadis, Bradley W. Settlemyer, Gary A. Grider. 48
- Distributed multigrid neural solvers on megavoxel domains. Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Soumik Sarkar, Chinmay Hegde, Adarsh Krishnamurthy, Santi Adavani, Baskar Ganapathysubramanian. 49
- EIGA: elastic and scalable dynamic graph analysis. Kasimir Gabert, Kaan Sancak, M. Yusuf Özkaya, Ali Pinar, Ümit V. Çatalyürek. 50
- Krill: a compiler and runtime system for concurrent graph processing. Hongzheng Chen, Minghua Shen, Nong Xiao, Yutong Lu. 51
- Pilgrim: scalable and (near) lossless MPI tracing. Chen Wang 0004, Pavan Balaji, Marc Snir. 52
- Hybrid, scalable, trace-driven performance modeling of GPGPUs. Yehia Arafa, Abdel-Hameed A. Badawy, Ammar ElWazir, Atanu Barai, Ali Eker, Gopinath Chennupati, Nandakishore Santhi, Stephan J. Eidenbenz. 53
- G-SEPM: building an accurate and efficient soft error prediction model for GPGPUs. Hengshan Yue, Xiaohui Wei, Guangli Li, Jianpeng Zhao, Nan Jiang, Jingweijia Tan. 54
- Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs. Sayan Ghosh, Nathan R. Tallent, Marco Minutoli, Mahantesh Halappanavar, Ramesh Peri, Ananth Kalyanaraman. 55
- Accelerating applications using edge tensor processing units. Kuan-Chieh Hsu, Hung-Wei Tseng 0001. 56
- Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores. Qianchao Zhu, Hao Luo, Chao Yang, Mingshuo Ding, Wanwang Yin, Xinhui Yuan. 57
- Efficient large-scale language model training on GPU clusters using Megatron-LM. Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia. 58
- ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning. Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He. 59
- FedAT: a high-performance and communication-efficient federated learning system with asynchronous tiers. Zheng Chai, Yujing Chen, Ali Anwar 0001, Liang Zhao 0002, Yue Cheng, Huzefa Rangwala. 60
- Reverse-mode automatic differentiation and optimization of GPU kernels via Enzyme. William S. Moses, Valentin Churavy, Ludger Paehler, Jan Hückelheim, Sri Hari Krishna Narayanan, Michel Schanen, Johannes Doerfert. 61
- Overcoming barriers to scalability in variational quantum Monte Carlo. Tianchen Zhao, Saibal De, Brian Chen, James Stokes, Shravan Veerapaneni. 62
- 3D acoustic-elastic coupling with gravity: the dynamics of the 2018 Palu, Sulawesi earthquake and tsunami. Lukas Krenz, Carsten Uphoff, Thomas Ulrich, Alice-Agnes Gabriel, Lauren S. Abrahams, Eric M. Dunham, Michael Bader. 63
- In-depth analyses of unified virtual memory system for GPU accelerated computing. Tyler Allen, Rong Ge 0002. 64
- Paths to OpenMP in the kernel. Jiacheng Ma, Wenyi Wang, Aaron Nelson, Michael Cuevas, Brian Homerding, Conghao Liu, Zhen Huang, Simone Campanoni, Kyle C. Hale, Peter A. Dinda. 65
- Index launches: scalable, flexible representation of parallel task groups. Rupanshu Soi, Michael Bauer, Sean Treichler, Manolis Papadakis, Wonchan Lee, Patrick S. McCormick, Alex Aiken, Elliott Slaughter. 66
- TriPoll: computing surveys of triangles in massive-scale temporal graphs with metadata. Trevor Steil, Tahsin Reza, Keita Iwabuchi, Benjamin W. Priest, Geoffrey Sanders, Roger Pearce. 67
- Discovering and balancing fundamental cycles in large signed graphs. Ghadeer Alabandi, Jelena Tesic, Lucas Rusnak, Martin Burtscher. 68
- cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure. Lizhi Xiang, Arif Khan, Edoardo Serra, Mahantesh Halappanavar, Aravind Sukumaran-Rajam. 69
- On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations. Grzegorz Kwasniewski, Marko Kabic, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler. 70
- STM-multifrontal QR: streaming task mapping multifrontal QR factorization empowered by GCN. Shengle Lin, Wangdong Yang, Haotian Wang, Qinyun Tsai, Kenli Li 0001. 71
- LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores. Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang. 72
- TensorKMC: kinetic Monte Carlo simulation of 50 trillion atoms driven by deep learning on a new generation of Sunway supercomputer. Honghui Shang, Xin Chen, Xingyu Gao, Rongfen Lin, Lifang Wang, Fang Li, Qian Xiao, Lei Xu 0023, Qiang Sun, Leilei Zhu, Fei Wang, Yunquan Zhang, Haifeng Song 0003. 73
- High-throughput virtual screening of small molecule inhibitors for SARS-CoV-2 protein targets with deep fusion models. Garrett A. Stevenson, Derek Jones, HyoJin Kim, W. F. Drew Bennett, Brian J. Bennion, Monica Borucki, Feliza Bourguet, Aidan Epstein, Magdalena Franco, Brooke Harmon, Stewart He, Max P. Katz, Daniel A. Kirshner, Victoria Lao, Edmond Y. Lau, Jacky Lo, Kevin McLoughlin, Richard Mosesso, Deepa K. Murugesh, Oscar A. Negrete, Edwin A. Saada, Brent Segelke, Maxwell Stefan, Marisa W. Torres, Dina Weilhammer, Sergio E. Wong, Yue Yang, Adam T. Zemla, Xiaohua Zhang, Fangqiang Zhu, Felice C. Lightstone, Jonathan E. Allen. 74
- High performance uncertainty quantification with parallelized multilevel Markov chain Monte Carlo. Linus Seelinger, Anne Reinarz, Leonhard Rannabauer, Michael Bader, Peter Bastian, Robert Scheichl. 75
- DistGNN: scalable distributed training for large-scale graph neural networks. Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj D. Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha. 76
- Efficient scaling of dynamic graph neural networks. Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru. 77
- Efficient tensor core-based GPU kernels for structured sparsity under reduced precision. Zhaodong Chen, Zheng Qu, Liu Liu, Yufei Ding, Yuan Xie. 78
- Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs. Jack Kosaian, K. V. Rashmi. 79
- PEPPA-X: finding program test inputs to bound silent data corruption vulnerability in HPC applications. Md Hasanur Rahman, Aabid Shamji, Shengjian Guo, Guanpeng Li. 80
- Cuttlefish: library for achieving energy efficiency in multicore parallel programs. Sunil Kumar, Akshat Gupta, Vivek Kumar, Sridutt Bhalachandra. 81
- Temporal vectorization for stencils. Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, Yue Yue. 82
- PAGANI: a parallel adaptive GPU algorithm for numerical integration. Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balsa Terzic, Mohammad Zubair. 83
- Reducing redundancy in data organization and arithmetic calculation for stencil computations. Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue. 84
- CAKE: matrix multiplication using constant-bandwidth blocks. H. T. Kung 0001, Vikas Natesh, Andrew Sabot. 85
- HPAC: evaluating approximate computing techniques on HPC OpenMP applications. Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon, James Diffenderfer, Ignacio Laguna, Daniel Osei-Kuffuor, Markus Schordan. 86
- Accelerating XOR-based erasure coding using program optimization techniques. Yuya Uezato. 87
- Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition. Xin Liang 0001, Qian Gong, Jieyang Chen, Ben Whitney, Lipeng Wan, Qing Liu 0002, David Pugmire, Rick Archibald, Norbert Podhorszki, Scott Klasky. 88
- LogECMem: coupling erasure-coded in-memory key-value stores with parity logging. Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Jia Xu, Qiaori Yao, Dan Feng 0001, Weichun Wang, Wei Chen. 89
- Online optimization of file transfers in high-speed networks. Md. Arifuzzaman, Engin Arslan. 90
- Hardware-supported remote persistence for distributed persistent memory. Zhuohui Duan, Haodi Lu, Haikun Liu, Xiaofei Liao, Hai Jin 0001, Yu Zhang, Song Wu 0001. 91
- Clairvoyant prefetching for distributed machine learning I/O. Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler. 92
- ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs. Fabian Knorr, Peter Thoman, Thomas Fahringer. 93
- Resilient error-bounded lossy compressor for data transfer. Sihuan Li, Sheng Di, Kai Zhao, Xin Liang 0001, Zizhong Chen, Franck Cappello. 94
- Productivity, portability, performance: data-centric Python. Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler. 95
- Empirical evaluation of circuit approximations on noisy quantum devices. Ellis Wilson, Frank Mueller 0001, Lindsay Bassman, Costin Iancu. 96
- SV-sim: scalable PGAS-based state vector simulation of quantum circuits. Ang Li, Bo Fang, Christopher E. Granade, Guen Prawiroatmodjo, Bettina Heim, Martin Roetteler, Sriram Krishnamoorthy. 97
- SW_Qsim: a minimize-memory quantum simulator with high-performance on a new Sunway supercomputer. Fang Li, Xin Liu, Yong Liu, Pengpeng Zhao, Yuling Yang, Honghui Shang, Weizhe Sun, Zhen Wang, Enming Dong, Dexun Chen. 98
- MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers. Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong 0001. 99
- Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters. Zhengda Bian, Shenggui Li, Wei Wang, Yang You. 100
- Whale: efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems. Jie Tan, Hanhua Chen, Yonghui Wang, Hai Jin 0001. 101
- Exploiting user activeness for data retention in HPC systems. Wei Zhang, Suren Byna, Hyogi Sim, SangKeun Lee, Sudharshan Vazhkudai, Yong Chen. 102
- Pinpointing crash-consistency bugs in the HPC I/O stack: a cross-layer approach. Jinghan Sun, Jian Huang, Marc Snir. 103
- Characterization and prediction of deep learning workloads in large-scale GPU datacenters. Qinghao Hu, Peng Sun 0006, Shengen Yan, Yonggang Wen 0001, Tianwei Zhang 0004. 104