- A Scalable Inference Pipeline for 3D Axon Tracing Algorithms. Benjamin Fenelon, Lars A. Gjesteby, Webster Guan, Juhyuk Park, Kwanghun Chung, Laura J. Brattain. 1-6
- Predicting Ankle Moment Trajectory with Adaptive Weighted Ensemble of LSTM Networks. Emilia Grzesiak, Jennifer Sloboda, Ho Chit Siu. 1-7
- Kalman Filter Driven Estimation of Community Structure in Time Varying Graphs. Lisa J. K. Durbeck, Peter Athanas. 1-7
- Optimal GPU Frequency Selection using Multi-Objective Approaches for HPC Systems. Ghazanfar Ali, Sridutt Bhalachandra, Nicholas J. Wright, Mert Side, Yong Chen. 1-7
- Optimizing Designs Using Several Types of Memories on Modern FPGAs. Mehmet Güngör, Kai Huang, Stratis Ioannidis, Miriam Leeser. 1-7
- Evaluation of a Novel Scratchpad Memory through Compiler Supported Simulation. Essa Imhmed, Jonathan E. Cook, Abdel-Hameed A. Badawy. 1-7
- Surrogate ML/AI Model Benchmarking for FAIR Principles' Conformance. Piotr Luszczek, Cade Brown. 1-5
- Systolic Array based FPGA accelerator for Yolov3-tiny. Prithvi Velicheti, Sivani Pentapati, Suresh Purini. 1-2
- The Viability of Using Online Prediction to Perform Extra Work while Executing BSP Applications. Po-Hao Chen, Pouya Haghi, Jae Yoon Chung, Tong Geng, Richard West, Anthony Skjellum, Martin C. Herbordt. 1-7
- Towards Fast GPU-based Sparse DNN Inference: A Hybrid Compute Model. Shaoxian Xu, Minkang Wu, Long Zheng 0003, Zhiyuan Shao, Xiangyu Ye, Xiaofei Liao, Hai Jin 0001. 1-7
- FAST: A Scalable Subgraph Matching Framework over Large Graphs. Jiezhong He, Zhouyang Liu, Yixin Chen, Hengyue Pan, Zhen Huang 0006, Dongsheng Li. 1-7
- Analyzing Multi-trillion Edge Graphs on Large GPU Clusters: A Case Study with PageRank. Seunghwa Kang, Joseph Nke, Brad Rees. 1-7
- HashTag: Fast Lookup in a Persistent Memory File System. Matthew Curtis-Maury, Yash Trivedi. 1-7
- An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning. Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones 0001, Andrew Prout, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally, Siddharth Samsi. 1-6
- HTC: Hybrid vertex-parallel and edge-parallel Triangle Counting. Li Zeng, Kang Yang, Haoran Cai, Jinhua Zhou, Rongqian Zhao, Xin Chen. 1-7
- HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization. Letian Zhao, Qizhe Wu, Xiaotian Wang, Teng Tian, Wei Wu, Xi Jin 0002. 1-7
- AutoPager: Auto-tuning Memory-Mapped I/O Parameters in Userspace. Karim Youssef, Niteya Shah, Maya B. Gokhale, Roger Pearce, Wu-chun Feng. 1-7
- Resource-Constrained Optimizations For Synthetic Aperture Radar On-Board Image Processing. Maron Schlemon, Martin Schulz 0001, Rolf Scheiber. 1-8
- Efficient Calculation of Triangle Centrality in Big Data Networks. Wali Mohammad Abdullah, David Awosoga, Shahadat Hossain. 1-7
- Unsupervised Adaptation of Spiking Networks in a Gradual Changing Environment. Zaidao Mei, Mark Barnell, Qinru Qiu. 1-7
- AI and ML Accelerator Survey and Trends. Albert Reuther, Peter Michaleas, Michael Jones 0001, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner. 1-10
- Enhancing the Performance Portability of Heterogeneous Circuit Analysis Programs. Tsung-Wei Huang. 1-2
- Flexible Hardware Accelerator Design Generation with Spiral. Guanglin Xu, James C. Hoe, Franz Franchetti. 1-7
- SHARP: Software Hint-Assisted Memory Access Prediction for Graph Analytics. Pengmiao Zhang, Rajgopal Kannan, Xiangzhi Tong, Anant V. Nori, Viktor K. Prasanna. 1-8
- Site-Wide HPC Data Center Demand Response. Daniel Curtis Wilson, Ioannis Ch. Paschalidis, Ayse K. Coskun. 1-7
- Enabling Transformers to Understand Low-Level Programs. Zifan Carl Guo, William S. Moses. 1-9
- Scalable Interactive Autonomous Navigation Simulations on HPC. Wesley Brewer, Joel Bretheim, John Kaniarz, Peilin Song, Burhman Q. Gates. 1-7
- Task-Parallel Programming with Constrained Parallelism. Tsung-Wei Huang, Leslie Hwang. 1-7
- Exploring the Impacts of Software Cache Configuration for In-line Compressed Arrays. Sansriti Ranjan, Dakota Fulp, Jon C. Calhoun. 1-7
- Processing Particle Data Flows with SmartNICs. Jianshen Liu, Carlos Maltzahn, Matthew L. Curry, Craig D. Ulmer. 1-8
- Optimizing Performance and Storage of Memory-Mapped Persistent Data Structures. Karim Youssef, Abdullah Al Raqibul Islam, Keita Iwabuchi, Wu-chun Feng, Roger Pearce. 1-7
- Improved Distributed-memory Triangle Counting by Exploiting the Graph Structure. Sayan Ghosh. 1-6
- C2QA - Bosonic Qiskit. Timothy J. Stavenger, Eleanor Crane, Kevin Smith, Christopher T. Kang, Steven M. Girvin, Nathan Wiebe. 1-8
- Parallelizing Explicit and Implicit Extrapolation Methods for Ordinary Differential Equations. Utkarsh Rajput, Chris Elrod, Yingbo Ma, Konstantin Althaus, Christopher Rackauckas. 1-9
- Enabling Novel In-Memory Computation Algorithms to Address Next-Generation Throughput Constraints on SWaP-Limited Platforms. Jessica Ray, Chad R. Meiners. 1-7
- Optimizing open-source FPGA CAD tools. Shachi Khadilkar, Martin Margala. 1-4
- RaiderSTREAM: Adapting the STREAM Benchmark to Modern HPC Systems. Michael Beebe, Brody Williams, Stephen Devaney, John D. Leidel, Yong Chen, Steve Poole. 1-7
- SuperCloud Lite in the Cloud - lightweight, secure, self-service, on-demand mechanisms for creating customizable research computing environments. Kelsie Edie, Kurt Keville, Lauren Milechin, Chris Hill. 1-8
- Challenges Designing for FPGAs Using High-Level Synthesis. Clayton J. Faber, Steven D. Harris, Zhili Xiao, Roger D. Chamberlain, Anthony M. Cabrera. 1-7
- Hardware Software Codesign of Applications on the Edge: Accelerating Digital PreDistortion for Wireless Communications. Zhaoyang Han, Yiyue Jiang, Rahul Mushini, John Dooley, Miriam Leeser. 1-6
- Fast Graph Algorithms for Superpixel Segmentation. Dimitris Floros, Tiancheng Liu, Nikos Pitsianis, Xiaobai Sun. 1-8
- Towards Fast Crash-Consistent Cluster Checkpointing. Andrew Wood, Moshik Hershcovitch, Ilias Ennmouri, Weiyu Zong, Saurav Chennuri, Sarel Cohen, Swaminathan Sundararaman, Daniel G. Waddington, Sang (Peter) Chin. 1-8
- GraphBLAS on the Edge: Anonymized High Performance Streaming of Network Traffic. Michael Jones 0001, Jeremy Kepner, Daniel Andersen, Aydin Buluç, Chansup Byun, kc claffy, Timothy Davis, William Arcand, Jonathan Bernays, David Bestor, William Bergeron, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Hayden Jananthan, Anna Klein, Chad R. Meiners, Lauren Milechin, Julie Mullen, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Jon Sreekanth, Doug Stetson, Charles Yee, Peter Michaleas. 1-8
- Parallel Computing with DNA Forensics Data. Adam Michaleas, Philip Fremont-Smith, Chelsea Lennartz, Darrell O. Ricke. 1-7
- Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic. Ivan Kawaminami, Arminda Estrada, Youssef Elsakkary, Hayden Jananthan, Aydin Buluç, Tim Davis 0001, Daniel Grant, Michael Jones 0001, Chad R. Meiners, Andrew Morris, Sandeep Pisharody, Jeremy Kepner. 1-7
- Python Implementation of the Dynamic Distributed Dimensional Data Model. Hayden Jananthan, Lauren Milechin, Michael Jones 0001, William Arcand, William Bergeron, David Bestor, Chansup Byun, Michael Houle 0001, Matthew Hubbell, Vijay Gadepally, Anna Klein, Peter Michaleas, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner. 1-8
- Real-Time Software Architecture for EM-Based Radar Signal Processing and Tracking. Alan Nussbaum, Byron Keel, William Dale Blair, Umakishore Ramachandran. 1-7
- Performance Modeling Sparse MTTKRP Using Optical Static Random Access Memory on FPGA. Sasindu Wijeratne, Akhilesh R. Jaiswal, Ajey P. Jacob, Bingyi Zhang, Viktor K. Prasanna. 1-7
- A Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory. Mohammad Shafaet Islam, Qiqi Wang. 1-7
- Apple Silicon Performance in Scientific Computing. Connor Kenyon, Collin Capano. 1-10
- Modeling the Energy Efficiency of GEMM using Optical Random Access Memory. Bingyi Zhang, Akhilesh R. Jaiswal, Clynn Mathew, Ravi Teja Lakkireddy, Ajey P. Jacob, Sasindu Wijeratne, Viktor K. Prasanna. 1-7
- A Multi-GPU Parallel Genetic Algorithm For Large-Scale Vehicle Routing Problems. Marwan F. Abdelatti, Manbir Sodhi, Resit Sendag. 1-8
- Achieving Speedups for Distributed Graph Biconnectivity. Ian Bogle, George M. Slota. 1-7
- Hardware Design and Implementation of Post-Quantum Cryptography Kyber. Qingru Zeng, Quanxin Li, Baoze Zhao, Han Jiao, Yihua Huang. 1-6
- A High Throughput Hardware Accelerator for FFTW Codelets: A First Look. Larry T. Pileggi, Siyuan Chen, Keshav Harisrikanth, Guanglin Xu, Ken Mai, Franz Franchetti. 1-7
- Sparse Deep Neural Network Inference Using Different Programming Models. Hyungro Lee, Milan Jain, Sayan Ghosh. 1-6
- Hardware Design and Implementation of Classic McEliece Post-Quantum Cryptosystem Based on FPGA. Shaofen Chen, Haiyan Lin, Wenjin Huang, Yihua Huang. 1-6
- Distributed Hardware Accelerated Secure Joint Computation on the COPA Framework. Rushi Patel, Pouya Haghi, Shweta Jain 0005, Andriy Kot, Venkata Krishnan, Mayank Varia, Martin C. Herbordt. 1-7
- On the Characterization of the Performance-Productivity Gap for FPGA. Atharva Gondhalekar, Thomas Twomey, Wu-chun Feng. 1-8
- Explicit Ordering Refinement for Accelerating Irregular Graph Analysis. Michael Mandulak, Ruochen Hu, George Slota. 1-8
- Benchmarking Resource Usage for Efficient Distributed Deep Learning. Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones 0001, David Bestor, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi. 1-8
- Performance Estimation for Efficient Image Segmentation Training of Weather Radar Algorithms. Joseph McDonald, James M. Kurdzo, Phillip M. Stepanian, Mark Veillette, David Bestor, Michael Jones 0001, Vijay Gadepally, Siddharth Samsi. 1-7
- Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications. Sadasivan Shankar, Albert Reuther. 1-8
- Constructing Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation. Cameron Ibrahim, Danylo Lykov, Zichang He, Yuri Alexeev, Ilya Safro. 1-8
- Hypersparse Network Flow Analysis of Packets with GraphBLAS. Tyler Trigg, Chad R. Meiners, Sandeep Pisharody, Hayden Jananthan, Michael Jones 0001, Adam Michaleas, Timothy Davis, Erik Welch, William Arcand, David Bestor, William Bergeron, Chansup Byun, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Doug Stetson, Charles Yee, Jeremy Kepner. 1-7
- Distributed Out-of-Memory SVD on CPU/GPU Architectures. Ismael Boureima, Manish Bhattarai, Maksim Ekin Eren, Nick Solovyev, Hristo N. Djidjev, Boian S. Alexandrov. 1-8
- Online Detection and Classification of State Transitions of Multivariate Shock and Vibration Data. Nicklaus Przybylski, William M. Jones, Nathan DeBardeleben. 1-7
- pPython for Parallel Python Programming. Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle 0001, Matthew Hubbell, Hayden Jananthan, Michael Jones 0001, Kurt Keville, Anna Klein, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner. 1-6
- Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis. Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed A. Badawy. 1-8
- An SSD-Based Accelerator for Singular Value Decomposition Recommendation Algorithm on Edge. Wei Wu, Letian Zhao, Qizhe Wu, Xiaotian Wang, Teng Tian, Xi Jin 0002. 1-5
- Deep Gaussian process with multitask and transfer learning for performance optimization. Wissam M. Sid-Lakhdar, Mohsen Aznaveh, Piotr Luszczek, Jack J. Dongarra. 1-7
- Generating Permutations Using Hash Tables. Oded Green, Corey Nolet, Joe Eaton. 1-7
- Im2win: Memory Efficient Convolution On SIMD Architectures. Shuai Lu, Jun Chu, Xu T. Liu. 1-7
- Design and Implementation of a Real-time Parallel FFT for a Direction-Finding System on an FPGA. Bheema Lakshmi Pradeep, Rishu Anand, Pavan Vadakattu, Syed Azemuddin, Aquibuddin Ahmed. 1-7
- Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes. Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen. 1-7
- Ultra Low-Power Deep Learning Applications at the Edge with Jetson Orin AGX Hardware. Mark Barnell, Courtney Raymond, Steven Smiley, Darrek Isereau, Daniel Brown. 1-4
- Machine learning for accurate and fast bandgap prediction of solid-state materials. Shomik Verma, Shivam Kajale, Rafael Gómez-Bombarelli. 1-2
- Computing In-Place FFTs with SIMD Lane Slicing. Benoît Dupont de Dinechin. 1-7
- Towards Hardware Accelerated Garbage Collection with Near-Memory Processing. Samuel Thomas, Jiwon Choe, Ofir Gordon, Erez Petrank, Tali Moreshet, Maurice Herlihy, R. Iris Bahar. 1-6
- GPU-Accelerated High-Bandwidth Radar Centroiding. David Brigada, Maximilian Merfeld, Kara Warner. 1-6
- Quantum Netlist Compiler (QNC). Shamminuj Aktar, Abdel-Hameed A. Badawy, Nandakishore Santhi. 1-7
- Powering Practical Performance: Accelerated Numerical Computing in Pure Python. Matthew Penn, Chris Milroy. 1-5
- Edge-Connected Jaccard Similarity for Graph Link Prediction on FPGA. Paul Sathre, Atharva Gondhalekar, Wu-chun Feng. 1-10
- Performance speedup of Quantum Espresso using optimized AOCL-FFTW. S. Biplab Raut. 1-4
- DASH: Scheduling Deep Learning Workloads on Multi-Generational GPU-Accelerated Clusters. Baolin Li, Tirthak Patel, Vijay Gadepally, Karen Gettings, Siddharth Samsi, Devesh Tiwari. 1-7
- How to Prevent a Sick ASIC. William F. Ellersick. 1-6
- A High-performance Deployment Framework for Pipelined CNN Accelerators with Flexible DSE Strategy. Conghui Luo, Wenjin Huang, Dehao Xiang, Yihua Huang. 1-8
- Towards a Generic UVM. Kholoud Mahmoud, Randa Ahmed, Karim Ayman, Mostafa Ayman, Waleed Taie, Yasser Ibrahim, Hassan Mostafa, Khaled Salah 0001. 1-6
- FPGA Acceleration of Fully Homomorphic Encryption over the Torus. Tian Ye, Rajgopal Kannan, Viktor K. Prasanna. 1-7
- Proposed Empirical Assessment of Remote Workers' Cyberslacking and Computer Security Posture to Assess Organizational Cybersecurity Risks. Ariel Luna, Yair Levy, Gregory Simco, Wei Li. 1-2
- Accelerating Sparse Deep Neural Network Inference Using GPU Tensor Cores. Yufei Sun, Long Zheng 0003, Qinggang Wang, Xiangyu Ye, Yu Huang 0013, Pengcheng Yao, Xiaofei Liao, Hai Jin 0001. 1-7