- A Tale of Two C's: Convergence and ComposabilityIlkay Altintas. 1 [doi]
- Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring DataAlessio Netti, Daniele Tafani, Michael Ott, Martin Schulz 0001. 2-12 [doi]
- Dancing in the Dark: Profiling for Tiered MemoryJinyoung Choi, Sergey Blagodurov, Hung-Wei Tseng 0001. 13-22 [doi]
- Noise-Resilient Empirical Performance Modeling with Deep Neural NetworksMarcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf 0001. 23-34 [doi]
- SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data ServicesSrinivasan Ramesh, Allen D. Malony, Philip H. Carns, Robert B. Ross, Matthieu Dorier, Jérome Soumagne, Shane Snyder. 35-45 [doi]
- Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution PathsEdward Hutter, Edgar Solomonik. 46-57 [doi]
- Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread ArchitectureThomas B. Rolinger, Christopher D. Krieger, Alan Sussman. 58-67 [doi]
- TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUsYuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu 0001, Guangming Tan. 68-78 [doi]
- Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix ProblemsQinglei Cao, Yu Pei, Kadir Akbudak, George Bosilca, Hatem Ltaief, David E. Keyes, Jack J. Dongarra. 79-89 [doi]
- Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme ScaleMd Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad. 90-100 [doi]
- Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core ArchitecturesWeiling Yang, Jianbin Fang, Dezun Dong. 101-110 [doi]
- DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU RuntimeAlberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio. 111-120 [doi]
- CTXBack: Enabling Low Latency GPU Context Switching via Context FlashbackZhuoran Ji, Cho-Li Wang. 121-130 [doi]
- Transparent I/O-Aware GPU Virtualization for Efficient Resource ConsolidationNelson Mimura Gonzalez, Tonia Elengikal. 131-140 [doi]
- Demystifying GPU UVM Cost with Deep Runtime and Workload AnalysisTyler Allen, Rong Ge 0002. 141-150 [doi]
- DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU ArchitectureMinjia Zhang, Zehua Hu, Mingqin Li. 151-161 [doi]
- CAGC: A Content-aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDsSuzhen Wu, Chunfeng Du, Haijun Li, Hong Jiang, Zhirong Shen, Bo Mao. 162-171 [doi]
- NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-FabricsShashank Gugnani, Tianxi Li, Xiaoyi Lu. 172-181 [doi]
- Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core CommunicationQinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John. 182-191 [doi]
- High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic ControllersVito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi. 192-202 [doi]
- RVMA: Remote Virtual Memory AccessRyan E. Grant, Michael J. Levenhagen, Matthew G. F. Dosanjh, Patrick M. Widener. 203-212 [doi]
- Performance-Portable Graph Coarsening for Efficient Multilevel Graph AnalysisMichael S. Gilbert, Seher Acer, Erik G. Boman, Kamesh Madduri, Sivasankaran Rajamanickam. 213-222 [doi]
- Efficient Distributed Algorithms in the k-machine model via PRAM SimulationsJohn Augustine, Kishore Kothapalli, Gopal Pandurangan. 223-232 [doi]
- Euler Meets GPU: Practical Graph Algorithms with Theoretical GuaranteesAdam Polak 0001, Adrian Siwiec, Michal Stobierski. 233-244 [doi]
- MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash StorageKiran Kumar Matam, Hanieh Hashemi, Murali Annavaram. 245-255 [doi]
- FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural NetworksMd. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad. 256-266 [doi]
- Systemic Assessment of Node Failures in HPC Production PlatformsAnwesha Das, Frank Mueller 0001, Barry Rountree. 267-276 [doi]
- Combining XOR and Partner Checkpointing for Resilient Multilevel Checkpoint/RestartMasoud Gholami, Florian Schintke. 277-288 [doi]
- Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and ProfilingFernando Fernandes dos Santos, Siva Kumar Sastry Hari, Pedro Martins Basso, Luigi Carro, Paolo Rech. 289-298 [doi]
- Improving checkpointing intervals by considering individual job failure probabilitiesAlvaro Frank, Manuel Baumgartner, Reza Salkhordeh, André Brinkmann. 299-309 [doi]
- Covirt: Lightweight Fault Isolation and Resource Protection for Co-KernelsNicholas Gordon, John R. Lange. 310-319 [doi]
- Introducing Application Awareness Into a Unified Power Management StackDaniel C. Wilson, Siddhartha Jana, Aniruddha Marathe, Stephanie Brink, Christopher M. Cantalupo, Diana R. Guttman, Brad Geltz, Lowren H. Lawson, Asma H. Al-rawi, Ali Mohammad, Fuat Keceli, Federico Ardanaz, Jonathan M. Eastep, Ayse K. Coskun. 320-329 [doi]
- PALM: Progress- and Locality-Aware Adaptive Task Migration for Efficient Thread PackingJinsu Park, Seongbeom Park, Myeonggyun Han, Woongki Baek. 330-339 [doi]
- Performance Evaluation of Adaptive Routing on Dragonfly-based Production SystemsSudheer Chunduri, Kevin Harms, Taylor Groves, Peter Mendygral, Justs Zarins, Michèle Weiland, Yasaman Ghadar. 340-349 [doi]
- Cori: Dancing to the Right Beat of Periodic Data Movements over Hybrid Memory SystemsThaleia Dimitra Doudali, Daniel Zahka, Ada Gavrilovska. 350-359 [doi]
- Nowa: A Wait-Free Continuation-Stealing Concurrency PlatformFlorian Schmaus, Nicolas Pfeiffer, Wolfgang Schröder-Preikschat, Timo Hönig, Jörg Nolte. 360-371 [doi]
- Efficient Algorithms for Encrypted All-gather OperationMehran Sadeghi Lahijani, Abu-Naser, Cong Wu, Mohsen Gavahi, Viet Tung Hoang, Zhi Wang 0004, Xin Yuan 0001. 372-381 [doi]
- CBNet: Minimizing Adjustments in Concurrent Demand-Aware Tree NetworksOtavio Augusto de Oliveira Souza, Olga Goussevskaia, Stefan Schmid 0001. 382-391 [doi]
- Scaling Sparse Matrix Multiplication on CPU-GPU NodesYang Xia, Peng Jiang 0004, Gagan Agrawal, Rajiv Ramnath. 392-401 [doi]
- zMesh: Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh RefinementHuizhang Luo, Junqi Wang, Qing Liu 0002, Jieyang Chen, Scott Klasky, Norbert Podhorszki. 402-411 [doi]
- Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension treeLinjian Ma, Edgar Solomonik. 412-421 [doi]
- 12 Ways to Fool the Masses with Irreproducible ResultsLorena A. Barba. 422 [doi]
- Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable ConvergenceKarl Bäckström, Ivan Walulya, Marina Papatriantafilou, Philippas Tsigas. 423-432 [doi]
- Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics SimulationsXinyuan Li, Huang Ye, Jian Zhang. 433-443 [doi]
- *Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda 0001. 444-453 [doi]
- xBGAS: A Global Address Space Extension on RISC-V for High Performance ComputingXi Wang 0009, John D. Leidel, Brody Williams, Alan Ehret, Miguel Mark, Michel A. Kinsy, Yong Chen. 454-463 [doi]
- ARBALEST: Dynamic Detection of Data Mapping Issues in Heterogeneous OpenMP ApplicationsLechen Yu, Joachim Protze, Oscar R. Hernandez, Vivek Sarkar. 464-474 [doi]
- Spray: Sparse Reductions of Arrays in OPENMPJan Hückelheim, Johannes Doerfert. 475-484 [doi]
- Code Generation for Room Acoustics Simulations with Complex Boundary ConditionsLarisa Stoltzfus, Brian Hamilton, Michel Steuwer, Lu Li, Christophe Dubach. 485-496 [doi]
- Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sourcesGeorge Bisbas, Fabio Luporini, Mathias Louboutin, Rhodri Nelson, Gerard J. Gorman, Paul H. J. Kelly. 497-506 [doi]
- Accelerating non-power-of-2 size Fourier transforms with GPU Tensor CoresLouis Pisha, Lukasz Ligowski. 507-516 [doi]
- Parallel String Graph Construction and Transitive Reduction for De Novo Genome AssemblyGiulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine A. Yelick, Aydin Buluç. 517-526 [doi]
- Distributed-Memory k-mer Counting on GPUsIsrat Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine A. Yelick. 527-536 [doi]
- Distributed-memory multi-GPU block-sparse tensor contraction for electronic structureThomas Hérault, Yves Robert, George Bosilca, Robert J. Harrison, Cannada A. Lewis, Edward F. Valeev, Jack J. Dongarra. 537-546 [doi]
- Adaptive Spatially Aware I/O for Multiresolution Particle Data LayoutsWill Usher 0001, Xuan Huang, Steve Petruzza, Sidharth Kumar, Stuart R. Slattery, Sam T. Reeve, Feng Wang, Chris R. Johnson, Valerio Pascucci. 547-556 [doi]
- Interpreting Write Performance of Supercomputer I/O Systems with Regression ModelsBing Xie, Zilong Tan, Philip H. Carns, Jeffrey S. Chase, Kevin Harms, Jay F. Lofstead, Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang. 557-566 [doi]
- Finer-LRU: A Scalable Page Management Scheme for HPC Manycore ArchitecturesJiwoo Bang, Chungyong Kim, Sunggon Kim, Qichen Chen, Cheongjun Lee, Eun-Kyu Byun, Jaehwan Lee 0001, Hyeonsang Eom. 567-576 [doi]
- Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC PlatformsJean Luca Bez, Alberto Miranda, Ramon Nou, Francieli Zanon Boito, Toni Cortes, Philippe O. A. Navaux. 577-586 [doi]
- A Hybrid Scheduling Scheme for Parallel LoopsAaron Handleman, Arthur G. Rattew, I-Ting Angelina Lee, Tao B. Schardl. 587-598 [doi]
- EAGLE: Expedited Device Placement with Automatic Grouping for Large ModelsHao Lan, Li Chen, Baochun Li. 599-608 [doi]
- BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation ModelsQiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, Minyi Guo. 609-618 [doi]
- DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel ConvolutionsYuke Wang, Boyuan Feng, Yufei Ding. 619-628 [doi]
- SUPER: SUb-Graph Parallelism for TransformERsArpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda 0001, Brian Van Essen. 629-638 [doi]
- Scalable Epidemiological Workflows to Support COVID-19 Planning and ResponseDustin Machi, Parantapa Bhattacharya, Stefan Hoops, Jiangzhuo Chen, Henning S. Mortveit, Srinivasan Venkatramanan, Bryan L. Lewis, Mandy L. Wilson, Arindam Fadikar, Tom Maiden, Christopher L. Barrett, Madhav V. Marathe. 639-650 [doi]
- Facilitating Data Discovery for Large-scale Science Facilities using Knowledge NetworksYubo Qin, Ivan Rodero, Manish Parashar. 651-660 [doi]
- Optimal Task Assignment for Heterogeneous Federated Learning DevicesLaércio Lima Pilla. 661-670 [doi]
- Detecting Malicious Model Updates from Federated Learning on Conditional Variational AutoencoderZhipin Gu, Yuexiang Yang. 671-680 [doi]
- Is Asymptotic Cost Analysis Useful in Developing Practical Parallel AlgorithmsGuy E. Blelloch. 681 [doi]
- From Parallelization to Customization - Challenges and OpportunitiesJason Cong. 682 [doi]
- High Performance Streaming Tensor DecompositionYongseok Soh, Patrick Flick, Xing Liu, Shaden Smith, Fabio Checconi, Fabrizio Petrini, Jee W. Choi. 683-692 [doi]
- Plex: Scaling Parallel Lexing with Backtrack-Free PrescanningLe Li, Shigeyuki Sato, Qiheng Liu, Kenjiro Taura. 693-702 [doi]
- Speculative Parallel Reverse Cuthill-McKee Reordering on Multi- and Many-core ArchitecturesDaniel Mlakar, Martin Winter, Mathias Parger, Markus Steinberger. 703-713 [doi]
- Jigsaw: A Slice-and-Dice Approach to Non-uniform FFT Acceleration for MRI Image ReconstructionBrendan L. West, Jeffrey A. Fessler, Thomas F. Wenisch. 714-723 [doi]
- Rank Position Forecasting in Car RacingBo Peng, Jiayu Li, Selahattin Akkas, Takuya Araki, Ohno Yoshiyuki, Judy Qiu. 724-733 [doi]
- Towards Practical Cloud Offloading for Low-cost Ground Vehicle WorkloadsYuan Xu, Tianwei Zhang 0004, Jimin Han, Sa Wang, Yungang Bao. 734-745 [doi]
- Towards Internet-Scale Convolutional Root-Cause Analysis with DIAGNETLoïck Bonniot, Christoph Neumann, François Taïani. 746-755 [doi]
- Astra: Autonomous Serverless Analytics with Cost-Efficiency and QoS-AwarenessJananie Jarachanthan, Li Chen, Fei Xu, Bo Li 0001. 756-765 [doi]
- Max-Stretch Minimization on an Edge-Cloud PlatformAnne Benoit, Redouane Elghazi, Yves Robert. 766-775 [doi]
- Decentralized Low-Latency Task Scheduling for Ad-Hoc ComputingJanick Edinger, Martin Breitbach, Niklas Gabrisch, Dominik Schäfer, Christian Becker 0001, Amr Rizk. 776-785 [doi]
- Lightweight Function Monitors for Fine-Grained Management in Large Scale Python ApplicationsTim Shaffer, Zhuozhao Li, Ben Tovar, Yadu N. Babuji, T. J. Dasso, Zoe Surma, Kyle Chard, Ian T. Foster, Douglas Thain. 786-796 [doi]
- AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice GraphXiaofeng Hou, Chao Li, Jiacheng Liu 0001, Lu Zhang, Shaolei Ren, Jingwen Leng, Quan Chen, Minyi Guo. 797-806 [doi]
- Deep Reinforcement Agent for Scheduling in HPCYuping Fan, Zhiling Lan, J. Taylor Childers, Paul Rich, William E. Allcock, Michael E. Papka. 807-816 [doi]
- F-Write: Fast RDMA-supported Writes in Erasure-coded In-memory ClustersBin Xu, Jianzhong Huang 0001, Qiang Cao, Xiao Qin 0001, Ping Xie. 817-826 [doi]
- Argus: Efficient Job Scheduling in RDMA-assisted Big Data ProcessingSijie Wu, Hanhua Chen, Yonghui Wang, Hai Jin 0001. 827-836 [doi]
- Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUsSajal Dash, Qais Al-Hajri, Wu-chun Feng, Harold R. Garner, Ramu Anandakrishnan. 837-846 [doi]
- A Multi-GPU Design for Large Size Cryo-EM 3D ReconstructionZihao Wang, Xiaohua Wan, Zhiyong Liu 0002, Qianshuo Fan, Fa Zhang 0001, Guangming Tan. 847-858 [doi]
- Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUsJieyang Chen, Lipeng Wan, Xin Liang 0001, Ben Whitney, Qing Liu 0002, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd Munson, Ian T. Foster, Scott Klasky. 859-868 [doi]
- Extremely Fast and Energy Efficient One-way Wave Equation Migration on GPU-based heterogeneous architectureLong Qu, Loris Lucido, Marie Bonnasse-Gahot, Pascal Vezolle, Diego Klahr. 869-880 [doi]
- Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU ArchitecturesJiannan Tian, Cody Rivera, Sheng Di, Jieyang Chen, Xin Liang, Dingwen Tao, Franck Cappello. 881-891 [doi]
- Rack-Scaling: An efficient rack-based redistribution method to accelerate the scaling of cloud disk arraysZhehan Lin, Hanchen Guo, Chentao Wu, Jie Li, Guangtao Xue, Minyi Guo. 892-901 [doi]
- Optimizing Performance for Open-Channel SSDs in Cloud Storage SystemXiaoyi Zhang, Feng Zhu, Shu Li, Kun Wang, Wei Xu, Dengcai Xu. 902-911 [doi]
- AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-ScalingLiang Zhang, Wenli Zheng, Chao Li, Yao Shen, Minyi Guo. 912-921 [doi]
- SNOW Revisited: Understanding When Ideal READ Transactions Are PossibleKishori M. Konwar, Wyatt Lloyd, Haonan Lu, Nancy A. Lynch. 922-931 [doi]
- QoS-Aware and Resource Efficient Microservice Deployment in Cloud-Edge ContinuumKaihua Fu, Wei Zhang, Quan Chen, Deze Zeng, Xin Peng, Wenli Zheng, Minyi Guo. 932-941 [doi]
- Byzantine Dispersion on GraphsAnisur Rahaman Molla, Kaushik Mondal 0001, William K. Moses Jr.. 942-951 [doi]
- Byzantine Agreement with Unknown Participants and FailuresPankaj Khanchandani, Roger Wattenhofer. 952-961 [doi]
- QPR: Quantizing PageRank with Coherent Shared Memory AcceleratorsAbdullah T. Mughrabi, Mohannad Ibrahim, Gregory T. Byrd. 962-972 [doi]
- Distributed Training of Embeddings using Graph AnalyticsGurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi. 973-983 [doi]
- Multiplicative Weights Algorithms for Parallel Automated Software RepairJoseph Renzullo, Westley Weimer, Stephanie Forrest. 984-993 [doi]
- An In-Depth Analysis of Distributed Training of Deep Neural NetworksYun-Yong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim. 994-1003 [doi]
- Automatic Graph Partitioning for Very Large-scale Deep LearningMasahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa. 1004-1013 [doi]
- Extending Sparse Tensor Accelerators to Support Multiple Compression FormatsEric Qin 0001, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das 0002, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna. 1014-1024 [doi]
- Pase: Parallelization Strategies for Efficient DNN TrainingVenmugil Elango. 1025-1034 [doi]
- Efficient Video Captioning on Heterogeneous System ArchitecturesHorng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu. 1035-1045 [doi]
- SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoCGeorge Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko. 1046-1055 [doi]
- Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang 0001, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka. 1056-1065 [doi]
- Performance Analysis of Scientific Computing Workloads on General Purpose TEEsAyaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert. 1066-1076 [doi]
- High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future ProjectionMartin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis. 1077-1086 [doi]
- High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical SolversKamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy. 1087-1096 [doi]