- Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures. Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim. 6-15
- Providing In-depth Performance Analysis for Heterogeneous Task-based Applications with StarVZ. Vinícius Garcia Pinto, Lucas Leandro Nesi, Marcelo Cogo Miletto, Lucas Mello Schnorr. 16-25
- A Streaming Accelerator for Heterogeneous CPU-FPGA Processing of Graph Applications. Francis O'Brien, Matthew Agostini, Tarek S. Abdelrahman. 26-35
- A New Double Rank-based Multi-workflow Scheduling with Multi-objective Optimization in Cloud Environments. Feng Li, Moon Gi Seok, Wentong Cai. 36-45
- Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions. Caio S. Rohwedder, João P. L. de Carvalho, José Nelson Amaral, Guido Araújo, Giancarlo Colmenares, Kai-Ting Amy Wang. 46-55
- Scheduling HPC Workflows with Intel Optane Persistent Memory. Ranjan Sarpangala Venkatesh, Tony Mason, Pradeep Fernando, Greg Eisenhauer, Ada Gavrilovska. 56-65
- Coding the Computing Continuum: Fluid Function Execution in Heterogeneous Computing Environments. Rohan Kumar, Matt Baughman, Ryan Chard, Zhuozhao Li, Yadu N. Babuji, Ian T. Foster, Kyle Chard. 66-75
- Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures. Morris Riedel, Rocco Sedona, Chadi Barakat, Petur Einarsson, Reza Hassanian, Gabriele Cavallaro, Matthias Book, Helmut Neukirchen, Andreas Lintermann. 76-85
- Accelerating ODE-Based Neural Networks on Low-Cost FPGAs. Hirohisa Watanabe, Hiroki Matsutani. 88-95
- An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning. Hirohisa Watanabe, Mineto Tsukada, Hiroki Matsutani. 96-103
- Plaster: an Embedded FPGA-based Cluster Orchestrator for Accelerated Distributed Algorithms. Lorenzo Farinelli, Daniele Valentino De Vincenti, Andrea Damiani, Luca Stornaiuolo, Rolando Brondolin, Marco D. Santambrogio, Donatella Sciuto. 104-107
- BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices. Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Lukas Frickenstein, Mohamed Badawy, Walter Stechele. 108-115
- Exploring a Layer-based Pre-implemented Flow for Mapping CNN on FPGA. Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, Christophe Bobda. 116-123
- A Machine Learning Approach to Predict Timing Delays During FPGA Placement. T. Martin, G. Grewal, S. Areibi. 124-127
- Dovado: An Open-Source Design Space Exploration Framework. Daniele Paletti, Davide Conficconi, Marco D. Santambrogio. 128-135
- A Framework for the Automatic Generation of FPGA-based Near-Data Processing Accelerators in Smart Storage Systems. Lukas Weber, Lukas Sommer, Leonardo Solis-Vasquez, Tobias Vinçon, Christian Knödler, Arthur Bernhardt, Ilia Petrov, Andreas Koch. 136-143
- On Data Parallelism Code Restructuring for HLS Targeting FPGAs. Renato Campos, João M. P. Cardoso. 144-151
- Fast HBM Access with FPGAs: Analysis, Architectures, and Applications. Philipp Holzinger, Daniel Reiser, Tobias Hahn, Marc Reichenbach. 152-159
- Graph Analytics on Hybrid System (GAHS) Case Study: PageRank. Mohamed W. Hassan, Peter M. Athanas. 160-167
- Performance Study of Multi-tenant Cloud FPGAs. Joel Mandebi Mbongue, Sujan Kumar Saha, Christophe Bobda. 168-171
- RV-CAP: Enabling Dynamic Partial Reconfiguration for FPGA-Based RISC-V System-on-Chip. Najdet Charaf, Ahmed Kamaleldin, Martin Thümmler, Diana Göhringer. 172-179
- An Area-Efficient SPHINCS+ Post-Quantum Signature Coprocessor. Quentin Berthet, Andres Upegui, Laurent Gantel, Alexandre Duc, Giulia Traverso. 180-187
- FPGA Acceleration of Zstd Compression Algorithm. Jianyu Chen, Maurice Daverveldt, Zaid Al-Ars. 188-191
- GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping. Gulsum Gudukbay, Jashwant Raj Gunasekaran, Yilin Feng, Mahmut T. Kandemir, Anton Nekrutenko, Chita R. Das, Paul Medvedev, Björn A. Grüning, Nate Coraor, Nathan Roach, Enis Afgan. 194-203
- Accelerating SARS-CoV-2 low frequency variant calling on ultra deep sequencing datasets. Bryce Kille, Yunxi Liu, Nicolae Sapoval, Michael Nute, Lawrence Rauchwerger, Nancy M. Amato, Todd J. Treangen. 204-208
- GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping. Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan. 209
- GPU Acceleration of 3D Agent-Based Biological Simulations. Ahmad Hesam, Lukas Breitwieser, Fons Rademakers, Zaid Al-Ars. 210-217
- Efficient Memory Management in Likelihood-based Phylogenetic Placement. Pierre Barbera, Alexandros Stamatakis. 218-227
- Accelerating the BPMax Algorithm for RNA-RNA Interaction. Chiranjeb Mondal, Sanjay Rajopadhye. 228-237
- LAGraph: Linear Algebra, Network Analysis Libraries, and the Study of Graph Algorithms. Gábor Szárnyas, David A. Bader, Timothy A. Davis, James Kitchen, Timothy G. Mattson, Scott McMillan, Erik Welch. 243-252
- Introduction to GraphBLAS 2.0. Benjamin Brock, Aydin Buluç, Timothy G. Mattson, Scott McMillan, José E. Moreira. 253-262
- Mathematics of Digital Hyperspace. Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin. 263-271
- SPbLA: The Library of GPGPU-Powered Sparse Boolean Linear Algebra Operations. Egor Orachev, Maria Karpenko, Artem Khoroshev, Semyon V. Grigorev. 272-275
- PIGO: A Parallel Graph Input/Output Library. Kasimir Gabert, Ümit V. Çatalyürek. 276-279
- Hybrid Power-Law Models of Network Traffic. Pat Devlin, Jeremy Kepner, Ashley Luo, Erin Meger. 280-287
- Characterizing Job-Task Dependency in Cloud Workloads Using Graph Learning. Zhaochen Gu, Sihai Tang, Beilei Jiang, Song Huang, Qiang Guan, Song Fu. 288-297
- Co-design of Advanced Architectures for Graph Analytics using Machine Learning. Kuldeep Kurte, Neena Imam, Ramakrishnan Kannan, S. M. Shamimul Hasan, Srikanth Yoginath. 298-307
- Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers. Catherine D. Schuman, Bill Kay, Prasanna Date, Ramakrishnan Kannan, Piyush Sao, Thomas E. Potok. 308-311
- Let's Put the Memory Model Front and Center When Teaching Parallel Programming in C++. Jirí Dokulil. 315-320
- Teaching Complex Scheduling Algorithms. Sascha Hunold, Bartlomiej Przybylski. 321-327
- ABET Accreditation: A Way Forward for PDC Education. Sherif G. Aly, Haidar Harmanani, Rajendra K. Raj, Sanaa Sharafeddine. 328-335
- EduPar Virtual Poster Session. Jesús Cámara, José-Carlos Cano, Javier Cuenca, Toshiyuki Maeda, Mariano Saura-Sánchez, Lewis Tseng, Akiyoshi Wakatani, Martina Barnas. 336-341
- Teaching PDC in the Time of COVID: Hands-on Materials for Remote Learning. Joel C. Adams, Richard A. Brown, Suzanne J. Matthews, Elizabeth Shoop. 342-349
- Data-Intensive Computing Modules for Teaching Parallel and Distributed Computing. Michael Gowanlock, Benoît Gallet. 350-357
- Developing medical ultrasound beamforming application on GPU and FPGA using oneAPI. Yong Wang, Yongfa Zhou, Qi Scott Wang, Yang Wang, Qing Xu, Chen Wang, Bo Peng, Zhaojun Zhu, Katayama Takuya, Dylan Wang. 360-370
- Evaluating CUDA Portability with HIPCL and DPCT. Zheming Jin, Jeffrey S. Vetter. 371-376
- Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX. Gregor Daiß, Mikael Simberg, Auriane Reverdell, John Biddiscombe, Theresa Pollinger, Hartmut Kaiser, Dirk Pflüger. 377-386
- An Efficient Approach for Image Border Handling on GPUs via Iteration Space Partitioning. Bo Qiao, Jürgen Teich, Frank Hannig. 387-396
- CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming. Xinyao Yi, David Stokes, Yonghong Yan, Chunhua Liao. 397-406
- Understanding Recursive Divide-and-Conquer Dynamic Programs in Fork-Join and Data-Flow Execution Models. Poornima Nookala, Zafar Ahmad, Mohammad Mahdi Javanmard, Martin Kong, Rezaul Chowdhury, Robert J. Harrison. 407-416
- Measuring Cache Complexity Using Data Movement Distance (DMD). Donovan Snyder, Chen Ding. 417-419
- Combining Static and Dynamic Analysis to Query Characteristics of HPC Applications. Aaron Welch, Oscar R. Hernandez, Barbara M. Chapman. 420-429
- Time-Division Multiplexing for FPGA Considering CNN Model Switch Time. Tetsuro Nakamura, Shogo Saito, Kei Fujimoto, Masashi Kaneko, Akinori Shiraga. 433-438
- Design Space Exploration of Emerging Memory Technologies for Machine Learning Applications. S. M. Shamimul Hasan, Neena Imam, Ramakrishnan Kannan, Srikanth Yoginath, Kuldeep Kurte. 439-448
- Accelerating Radiation Therapy Dose Calculation with Nvidia GPUs. Felix Liu, Niclas Jansson, Artur Podobas, Albin Fredriksson, Stefano Markidis. 449-458
- Improving Cryptanalytic Applications with Stochastic Runtimes on GPUs. Lena Oden, Jörg Keller. 459-468
- Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs. Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam. 469-478
- GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python. Jaemin Choi, Zane Fink, Sam White, Nitin Bhat, David F. Richards, Laxmikant V. Kalé. 479-488
- CPRIC: Collaborative Parallelism for Randomized Incremental Constructions. Florian Fey, Sergei Gorlatch. 490-499
- Characters Recognition based on CNN-RNN architecture and Metaheuristic. F. Keddous, H. N. Nguyen, A. Nakib. 500-507
- Linearizing Computing the Power Set with OpenMP. Roger L. Goodwin. 508-519
- TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra. Oswaldo Artiles, Fahad Saeed. 520-528
- A Parallel Meta-Solver for the Multi-Objective Set Covering Problem. Ryan J. Marshall, Lakmali Weerasena, Anthony Skjellum. 529-538
- Leveraging High Dimensional Spatial Graph Embedding as a Heuristic for Graph Algorithms. Peter Oostema, Franz Franchetti. 539-547
- RRNS Base Extension Error-Correcting Code for Performance Optimization of Scalable Reliable Distributed Cloud Data Storage. Mikhail G. Babenko, Andrei Tchernykh, Luis Bernardo Pulido-Gaytan, Jorge M. Cortés-Mendoza, Egor Shiryaev, Elena Golimblevskaia, Arutyun Avetisyan, Sergio Nesmachnow. 548-553
- Checkpointing vs. Supervision Resilience Approaches for Dynamic Independent Tasks. Jonas Posner, Lukas Reitz, Claudia Fohry. 556-565
- Gathering of seven autonomous mobile robots on triangular grids. Masahiro Shibata, Masaki Ohyabu, Yuichi Sudo, Junya Nakamura, Yonghwan Kim, Yoshiaki Katayama. 566-575
- Autonomous Mobile Robots: Refining the Computational Landscape. Kevin Buchin, Paola Flocchini, Irina Kostitsyna, Tom Peters, Nicola Santoro, Koichi Wada. 576-585
- Terminating Grid Exploration with Myopic Luminous Robots. Shota Nagahama, Fukuhito Ooshita, Michiko Inoue. 586-595
- A self-stabilizing token circulation with graceful handover on bidirectional ring networks. Hirotsugu Kakugawa, Sayaka Kamei. 596-604
- Scalable and Highly Available Multi-Objective Neural Architecture Search in Bare Metal Kubernetes Cluster. Andreas Klos, Marius Rosenbaum, Wolfram Schiffmann. 605-610
- Revisiting Credit Distribution Algorithms for Distributed Termination Detection. George Bosilca, Aurélien Bouteiller, Thomas Hérault, Valentin Le Fèvre, Yves Robert, Jack J. Dongarra. 611-620
- Efficient and Eventually Consistent Collective Operations. Roman Iakymchuk, Amândio Faustino, Andrew Emerson, João Barreto, Valeria Bartsch, Rodrigo Rodrigues, José C. Monteiro. 621-630
- Autonomous Load Balancing in Distributed Hash Tables Using Churn and the Sybil Attack. Andrew Rosen, Benjamin Levin, Anu G. Bourgeois. 631-640
- Performance Models for Hybrid Programs Accelerated by GPUs. Aparna Sasidharan. 641-651
- Evaluating the Performance of Integer Sum Reduction on an Intel GPU. Zheming Jin, Jeffrey S. Vetter. 652-655
- On the Computational Power of Convolution Pooling: A Theoretical Approach for Deep Learning. Koji Nakano, Shotaro Aoki, Yasuaki Ito, Akihiko Kasagi. 656-665
- Load balancing for distributed nonlocal models within asynchronous many-task systems. Pranav Gadikar, Patrick Diehl, Prashant K. Jha. 669-678
- Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors. Sandra Catalán, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí. 679-687
- cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs. Yu-Hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, Alex H. Barnett. 688-697
- Parallel Machine Learning of Partial Differential Equations. Amin Totounferoush, Neda Ebrahimi Pour, Sabine Roller, Miriam Mehl. 698-703
- Improving Workload Balance of a Marine CSEM Inversion Application. Jessica Imlau Dagostini, Henrique Corrêa Pereira da Silva, Vinícius Garcia Pinto, Roberto Machado Velho, Eduardo Simoes Lopes Gastal, Lucas Mello Schnorr. 704-713
- Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures. Hadia Ahmed, David B. Williams-Young, Khaled Z. Ibrahim, Chao Yang. 714-722
- Parallelization of GKV benchmark using OpenACC. Makoto Morishita, Satoshi Ohshima, Takahiro Katagiri, Toru Nagai. 723-729
- A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks. Sergio Barrachina, Adrián Castelló, Mar Catalán, Manuel F. Dolz, José I. Mestre. 730-739
- Accelerated Polynomial Evaluation and Differentiation at Power Series in Multiple Double Precision. Jan Verschelde. 740-749
- Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA. Yuta Sasaki, Ayumu Ishizuka, Mulya Agung, Hiroyuki Takizawa. 752-759
- Efficient Parallel Multigrid Methods on Manycore Clusters with Double/Single Precision Computing. Kengo Nakajima, Takeshi Ogita, Masatoshi Kawai. 760-769
- Automatic Selection of Tensor Decomposition for Compressing Convolutional Neural Networks: A Case Study on VGG-type Networks. Chia-Chun Liang, Che-Rung Lee. 770-778
- A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs. Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi. 779-788
- An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL. Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai. 789-797
- Improving the MPI-IO Performance of Applications with Genetic Algorithm based Auto-tuning. Ayse Bagbaba, Xuan Wang. 798-805
- Autotuning Benchmarking Techniques: A Roofline Model Case Study. Jacob O. Tørring, Jan Christian Meyer, Anne C. Elster. 806-815
- Scalable Performance Prediction of Irregular Workloads in Multi-Phase Particle-in-Cell Applications. Sai P. Chenna, Herman Lam, Greg Stitt, S. Balachandar. 816-825
- User Allocation for Real-Time Applications with State Sharing in Fog Computing Networks. Ryohei Sato, Hidetoshi Kawaguchi, Yuichi Nakatani. 828-831
- Multi-Path Routing in the Jellyfish Network. Zaid Alzaid, Saptarshi Bhowmik, Xin Yuan. 832-841
- Addressing the Constraints of Active Learning on the Edge. Enrique Nueve, Sean Shahkarami, Seongha Park, Nicola Ferrier. 845-849
- Informed Prefetching in I/O Bounded Distributed Deep Learning. Xiaojun Ruan, Haiquan Chen. 850-857
- Performance Evaluation of Deep Learning Compilers for Edge Inference. Gaurav Verma, Yashi Gupta, Abid M. Malik, Barbara M. Chapman. 858-865
- DataVinci: Proactive Data Placement for Ad-Hoc Computing. Martin Breitbach, Janick Edinger, Dominik Schäfer, Christian Becker. 866-873
- Pilot-Edge: Distributed Resource Management Along the Edge-to-Cloud Continuum. André Luckow, Kartik Rattan, Shantenu Jha. 874-878
- INT Based Network-Aware Task Scheduling for Edge Computing. Bibek Shreshta, Richard Cziva, Engin Arslan. 879-886
- Performance Comparison for Scientific Computations on the Edge via Relative Performance. Aravind Sankaran, Paolo Bientinesi. 887-895
- Dynamic Computing Resources Allocation for Multiple Deep Learning Tasks. Liang Wei, Kazuyuki Shudo. 899-905
- Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. D. K. Panda. 923-932
- Distributed Deep Learning Using Volunteer Computing-Like Paradigm. Medha Atre, Birendra Jha, Ashwini Rao. 933-942
- Ex-NNQMD: Extreme-Scale Neural Network Quantum Molecular Dynamics. Pankaj Rajak, Anikeya Aditya, Shogo Fukushima, Rajiv K. Kalia, Thomas Linker, Kuang Liu, Ye Luo, Aiichiro Nakano, Ken-ichi Nomura, Kohei Shimamura, Fuyuki Shimojo, Priya Vashishta. 943-946
- Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour. Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc V. Le, Yang You, Sameer Kumar. 947-950
- Performance Analysis of Deep Learning Workloads on a Composable System. Kaoutar El Maghraoui, Lorraine M. Herger, Chekuri Choudary, Kim Tran, Todd Deshane, David Hanson. 951-954
- Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows. Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar. 960-964
- Exploring MPI Collective I/O and File-per-process I/O for Checkpointing a Logical Inference Task. Ke-fan, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar. 965-972
- Memory Efficient Edge Addition Designs for Large and Dynamic Social Networks. Eunice E. Santos, Vairavan Murugappan, John Korah. 975-984
- Load Balancing Schemes for Large Synthetic Population-Based Complex Simulators. Bogdan Mucenic, Chaitanya Kaligotla, Abby Stevens, Jonathan Ozik, Nicholson T. Collier, Charles M. Macal. 985-988
- Application of Distributed Agent-based Modeling to Investigate Opioid Use Outcomes in Justice Involved Populations. Eric Tatara, John Schneider, Madeline Quasebarth, Nicholson T. Collier, Harold Pollack, Basmattee Boodram, Samuel H. Friedman, Elizabeth Salisbury-Afshar, Mary Ellen Mackesy-Amiti, Jonathan Ozik. 989-997
- Shared-Memory Scalable k-Core Maintenance on Dynamic Graphs and Hypergraphs. Kasimir Gabert, Ali Pinar, Ümit V. Çatalyürek. 998-1007
- P-Flee: An Efficient Parallel Algorithm for Simulating Human Migration. Petros Anastasiadis, Sergiy Gogolenko, Nikela Papadopoulou, Marcin Lawenda, Hamid Arabnejad, Alireza Jahani, Imran Mahmood, Derek Groen. 1008-1011