- Streaming message interface: high-performance distributed memory programming on reconfigurable hardware. Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler.
- Legate NumPy: accelerated and distributed array computing. Michael Bauer, Michael Garland.
- Compiler assisted hybrid implicit and explicit GPU memory management under unified address space. Lingda Li, Barbara M. Chapman.
- iFDK: a scalable framework for instant high-resolution image reconstruction. Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka.
- GPCNeT: designing a benchmark suite for inducing and measuring contention in HPC networks. Sudheer Chunduri, Taylor Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn K. Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas J. Wright.
- MIQS: metadata indexing and querying service for self-describing file formats. Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen.
- Regularizing irregularly sparse point-to-point communications. R. Oguz Selvitopi, Cevdet Aykanat.
- Consensus equilibrium framework for super-resolution and extreme-scale CT reconstruction. Xiao Wang, Venkatesh Sridhar, Zahra Ronaghi, Rollin C. Thomas, Jack Deslippe, Dilworth Parkinson, Gregery T. Buzzard, Samuel P. Midkiff, Charles A. Bouman, Simon K. Warfield.
- A massively parallel infrastructure for adaptive multiscale simulations: modeling RAS initiation pathway for cancer. Francesco di Natale, Harsh Bhatia, Timothy S. Carpenter, Chris Neale, Sara Kokkila Schumacher, Tomas Oppelstrup, Liam Stanton, Xiaohua Zhang, Shiv Sundram, Thomas R. W. Scogland, Gautham Dharuman, Michael P. Surh, Yue Yang, Claudia Misale, Lars Schneidenbach, Carlos Costa, Changhoan Kim, Bruce D'Amora, Sandrasegaram Gnanakaran, Dwight V. Nissley, Frederick H. Streitz, Felice C. Lightstone, Peer-Timo Bremer, James N. Glosli, Helgi I. Ingólfsson.
- Pinpointing performance inefficiencies via lightweight variance profiling. Pengfei Su, Shuyin Jiao, Milind Chabbi, Xu Liu.
- Bandwidth steering in HPC using silicon nanophotonics. George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xiang Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman.
- A versatile software systolic execution model for GPU memory-bound kernels. Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka.
- PoDD: power-capping dependent distributed applications. Huazhe Zhang, Henry Hoffmann.
- Preparation and optimization of a diverse workload for a large-scale heterogeneous system. Ian Karlin, Yoonho Park, Bronis R. de Supinski, Peng Wang, Bert Still, David Beckingsale, Robert Blake, Tong Chen, Guojing Cong, Carlos H. A. Costa, Johann Dahm, Giacomo Domeniconi, Thomas Epperly, Aaron Fisher, Sara Kokkila Schumacher, Steven H. Langer, Hai Le, Eun-Kyung Lee, Naoya Maruyama, Xinyu Que, David F. Richards, Björn Sjögreen, Jonathan Wong, Carol S. Woodward, Ulrike Meier Yang, Xiaohua Zhang, Bob Anderson, David Appelhans, Levi Barnes, Peter D. Barnes Jr., Sorin Bastea, David Böhme, Jamie A. Bramwell, James M. Brase, José R. Brunheroto, Barry Chen, Charway R. Cooper, Tony Degroot, Robert D. Falgout, Todd Gamblin, David J. Gardner, James N. Glosli, John A. Gunnels, Max Katz, Tzanio V. Kolev, I-Feng W. Kuo, Matthew P. LeGendre, Ruipeng Li, Pei-Hung Lin, Shelby Lockhart, Kathleen McCandless, Claudia Misale, Jaime H. Moreno, Rob Neely, Jarom Nelson, Rao Nimmakayala, Kathryn M. O'Brien, Kevin O'Brien, Ramesh Pankajakshan, Roger Pearce, Slaven Peles, Phil Regier, Steve Rennich, Martin Schulz, Howard Scott, James C. Sexton, Kathleen Shoga, Shiv Sundram, Guillaume Thomas-Collignon, Brian Van Essen, Alexey Voronin, Bob Walkup, Lu Wang, Chris Ward, Hui-Fang Wen, Daniel A. White, Christopher Young, Cyril Zeller, Edward Zywicz.
- Moment representation in the lattice Boltzmann method on massively parallel hardware. Madhurima Vardhan, John Gounley, Luiz Hegele, Erik W. Draeger, Amanda Randles.
- PruneTrain: fast neural network training by dynamic sparse model reconfiguration. Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez.
- Distributed enhanced suffix arrays: efficient algorithms for construction and querying. Patrick Flick, Srinivas Aluru.
- HyperX topology: first at-scale implementation and comparison to the fat-tree. Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis L. Floyd, Nicolas Dubé.
- Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected. Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari.
- ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system. Sambit Das, Phani Motamarri, Vikram Gavini, Bruno Turcksin, Ying Wai Li, Brent Leback.
- LPCC: hierarchical persistent client caching for Lustre. Yingjin Qian, Xi Li, Shuichi Ihara, Andreas Dilger, Carlos Thomaz, Shilong Wang, Wen Cheng, Chunyan Li, Lingfang Zeng, Fang Wang, Dan Feng, Tim Süß, André Brinkmann.
- Almost deterministic work stealing. Shumpei Shiina, Kenjiro Taura.
- Local-global merge tree computation with local exchanges. Arnur Nigmetov, Dmitriy Morozov.
- Understanding priority-based scheduling of graph algorithms on a shared-memory platform. Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas.
- Network-accelerated non-contiguous memory transfers. Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler.
- Large-batch training for LSTM and beyond. Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh.
- Fully integrated FPGA molecular dynamics simulations. Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, Ahmed Sanaullah, Chunshu Wu, Jiayi Sheng, Charles Lin, Vipin Sachdeva, Woody Sherman, Martin C. Herbordt.
- An early evaluation of Intel's Optane DC persistent memory module and its impact on high-performance scientific applications. Michèle Weiland, Holger Brunst, Tiago Quintino, Nick Johnson, Olivier Iffrig, Simon D. Smart, Christian Herold, Antonino Bonanni, Adrian Jackson, Mark Parsons.
- Addressing data resiliency for staging based scientific workflows. Shaohua Duan, Pradeep Subedi, Philip E. Davis, Manish Parashar.
- Near-memory data transformation for efficient sparse matrix multi-vector multiplication. Daichi Fujiki, Niladrish Chatterjee, Donghyuk Lee, Mike O'Connor.
- Hatchet: pruning the overgrowth in parallel profiles. Abhinav Bhatele, Stephanie Brink, Todd Gamblin.
- Predicting faults in high performance computing systems: an in-depth survey of the state-of-the-practice. David Jauk, Dai Yang, Martin Schulz.
- Replication is more efficient than you think. Anne Benoit, Thomas Hérault, Valentin Le Fèvre, Yves Robert.
- Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures. Athena Elafrou, Georgios I. Goumas, Nectarios Koziris.
- SLATE: design of a modern distributed and accelerated linear algebra library. Mark Gates, Jakub Kurzak, Ali Charara, Asim YarKhan, Jack J. Dongarra.
- GraphM: an efficient storage system for high throughput of concurrent graph processing. Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, Yicheng Chen.
- From facility to application sensor data: modular, continuous and holistic monitoring with DCDB. Alessio Netti, Micha Müller, Axel Auweter, Carla Guillen, Michael Ott, Daniele Tafani, Martin Schulz.
- Optimizing the data movement in quantum transport simulations via data-centric parallel programming. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler.
- Full-state quantum circuit simulation by using data compression. Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong.
- End-to-end I/O portfolio for the Summit supercomputing ecosystem. Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang, Christopher Zimmer, Christopher Brumgard, Jesse Hanley, George Markomanolis, Ross Miller, Dustin Leverman, Scott Atchley, Verónica G. Vergara Larrea.
- Etalumis: bringing probabilistic programming to scientific simulators at scale. Atilim Günes Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip H. S. Torr, Victor W. Lee, Kyle Cranmer, Prabhat, Frank Wood.
- AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs. Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, Xiao Wang.
- A large-scale study of MPI usage in open-source HPC applications. Ignacio Laguna, Ryan Marshall, Kathryn Mohror, Martin Ruefenacht, Anthony Skjellum, Nawrin Sultana.
- Semantic query transformations for increased parallelization in distributed knowledge graph query processing. HyeongSik Kim, Abhisha Bhattacharyya, Kemafor Anyanwu.
- Performance optimality or reproducibility: that is the question. Tapasya Patki, Jayaraman J. Thiagarajan, Alexis Ayala, Tanzima Zerin Islam.
- CARE: compiler-assisted recovery from soft failures. Chao Chen, Greg Eisenhauer, Santosh Pande, Qiang Guan.
- Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. Tuowen Zhao, Protonu Basu, Samuel Williams, Mary W. Hall, Hans Johansen.
- An efficient mixed-mode representation of sparse tensors. Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prashant Singh Rawat, Sriram Krishnamoorthy, P. Sadayappan.
- SSD failures in the field: symptoms, causes, and prediction models. Jacob Alter, Ji Xue, Alma Dimnaku, Evgenia Smirni.
- Solving PDEs in space-time: 4D tree-based adaptivity, mesh-free and matrix-free approaches. Masado Ishii, Milinda Fernando, Kumar Saurabh, Biswajit Khara, Baskar Ganapathysubramanian, Hari Sundar.
- Understanding congestion in high performance interconnection networks using sampling. Philip Taffet, John M. Mellor-Crummey.
- Code generation for massively parallel phase-field simulations. Martin Bauer, Johannes Hötzer, Dominik Ernst, Julian Hammer, Marco Seiz, Henrik Hierl, Jan Hönig, Harald Köstler, Gerhard Wellein, Britta Nestler, Ulrich Rüde.
- Uncore power scavenger: a runtime for uncore power conservation on HPC systems. Neha Gholkar, Frank Mueller, Barry Rountree.
- Parallel transport time-dependent density functional theory calculations with hybrid functional on Summit. Weile Jia, Lin-Wang Wang, Lin Lin.
- ComDetective: a lightweight communication detection tool for threads. Muhammad Aditya Sasongko, Milind Chabbi, Palwisha Akhtar, Didem Unat.
- BinFI: an efficient fault injector for safety-critical machine learning systems. Zitao Chen, Guanpeng Li, Karthik Pattabiraman, Nathan DeBardeleben.
- Adaptive neural network-based approximation to accelerate Eulerian fluid simulation. Wenqian Dong, Jie Liu, Zhen Xie, Dong Li.
- INCA: in-network compute assistance. Whit Schonbein, Ryan E. Grant, Matthew G. F. Dosanjh, Dorian C. Arnold.
- SW_GROMACS: accelerate GROMACS on Sunway TaihuLight. Tingjian Zhang, Yuxuan Li, Ping Gao, Qi Shao, Mingshan Shao, Meng Zhang, Jinxiao Zhang, Xiaohui Duan, Zhao Liu, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang.
- MemXCT: memory-centric X-ray CT reconstruction with massive parallelization. Mert Hidayetoglu, Tekin Biçer, Simon Garcia De Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-mei W. Hwu.
- Diogenes: looking for an honest CPU/GPU performance measurement tool. Benjamin Welton, Barton P. Miller.
- TriEC: tripartite graph based erasure coding NIC offload. Haiyang Shi, Xiaoyi Lu.
- OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway TaihuLight. Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, Zhiqiang Wei.
- Scalable reinforcement-learning-based neural architecture search for cancer deep learning research. Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan M. Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens.
- BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets. Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker.
- Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler.
- A constraint-based approach to automatic data partitioning for distributed memory execution. Wonchan Lee, Manolis Papadakis, Elliott Slaughter, Alex Aiken.
- Scalable simulation of realistic volume fraction red blood cell flows through vascular networks. Libin Lu, Matthew J. Morse, Abtin Rahimian, Georg Stadler, Denis Zorin.
- ab initio dissipative quantum transport simulations. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler.
- Swift machine learning model serving scheduling: a region based reinforcement learning approach. Heyang Qin, Syed Zawad, Yanqi Zhou, Lei Yang, Dongfang Zhao, Feng Yan.
- Channel and filter parallelism for large-scale CNN training. Nikoli Dryden, Naoya Maruyama, Tim Moon, Tom Benson, Marc Snir, Brian Van Essen.
- Practical and efficient incremental adaptive routing for HyperX networks. Nic McDonald, Mikhail Isaev, Adriana Flores, Al Davis, John Kim.
- Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics. Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler.
- From Piz Daint to the stars: simulation of stellar mergers using high-level abstractions. Gregor Daiß, Parsa Amini, John Biddiscombe, Patrick Diehl, Juhan Frank, Kevin A. Huck, Hartmut Kaiser, Dominic Marcello, David Pfander, Dirk Pflüger.
- SparCML: high-performance sparse communication for machine learning. Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler.
- Assessing the impact of timing errors on HPC applications. Chun-Kai Chang, Wenqi Yin, Mattan Erez.
- Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures. Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler.
- FT-iSort: efficient fault tolerance for introsort. Sihuan Li, Hongbo Li, Xin Liang, Jieyang Chen, Elisabeth Giem, Kaiming Ouyang, Kai Zhao, Sheng Di, Franck Cappello, Zizhong Chen.
- Scalable generation of graphs for benchmarking HPC community-detection algorithms. George M. Slota, Jonathan W. Berry, Simon D. Hammond, Stephen L. Olivier, Cynthia A. Phillips, Sivasankaran Rajamanickam.
- D2P: from recursive formulations to distributed-memory codes. Nikhil Hegde, Qifan Chang, Milind Kulkarni.
- Topology-custom UGAL routing on dragonfly. Md Shafayat Rahman, Saptarshi Bhowmik, Yevgeniy Ryasnianskiy, Xin Yuan, Michael Lang.
- An evaluation of the CORAL interconnects. Christopher J. Zimmer, Scott Atchley, Ramesh Pankajakshan, Brian E. Smith, Ian Karlin, Matthew L. Leininger, Adam Bertsch, Brian S. Ryujin, Jason Burmark, André Walker-Loud, Michael A. Clark, Olga Pearce.
- High performance Monte Carlo simulation of Ising model on TPU clusters. Kun Yang, Yi-Fan Chen, Georgios Roumpos, Chris Colby, John R. Anderson.
- GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. Kiran Ravikumar, David Appelhans, P.-K. Yeung.
- Significantly improving lossy compression quality based on an optimized hybrid prediction model. Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Bogdan Nicolae, Zizhong Chen, Franck Cappello.
- Mitigating network noise on Dragonfly networks through application-aware routing. Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler.
- Analytical cache modeling and tilesize optimization for tensor contractions. Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, P. Sadayappan.
- Spread-n-share: improving application performance and cluster throughput with resource-aware job placement. Xiongchao Tang, Haojie Wang, Xiaosong Ma, Nosayba El-Sayed, Jidong Zhai, Wenguang Chen, Ashraf Aboulnaga.
- Slack squeeze coded computing for adaptive straggler mitigation. Krishna Giri Narra, Zhifeng Lin, Mehrdad Kiamari, Salman Avestimehr, Murali Annavaram.