- Inter-loop optimization in RAJA using loop chains. Brandon Neth, Thomas R. W. Scogland, Bronis R. de Supinski, Michelle Mills Strout. 1-12.
- Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. Khaled Abdelaal, Martin Kong. 13-26.
- A practical tile size selection model for affine loop nests. Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula. 27-39.
- Does it matter?: OMPSanitizer: an impact analyzer of reported data races in OpenMP programs. Wenwen Wang, Pei-Hung Lin. 40-51.
- NumaPerf: predictive NUMA profiling. Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu. 52-62.
- NPBench: a benchmarking suite for high-performance NumPy. Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler. 63-74.
- DSGEN: concolic testing GPU implementations of concurrent dynamic data structures. Xiaofan Sun, Rajiv Gupta. 75-87.
- Task-graph scheduling extensions for efficient synchronization and communication. Seonmyeong Bak, Oscar R. Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar. 88-101.
- μSteal: a theory-backed framework for preemptive work and resource stealing in mixed-criticality microservices. Amirhossein Mirhosseini, Thomas F. Wenisch. 102-114.
- ThundeRiNG: generating multiple independent random number sequences on FPGAs. Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong. 115-126.
- FT-BLAS: a high performance BLAS implementation with online fault tolerance. Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen. 127-138.
- PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata. Shougang Yuan, Yan Solihin, Huiyang Zhou. 139-151.
- Omegaflow: a high-performance dependency-based architecture. Yaoyang Zhou, Zihao Yu, Chuanqi Zhang, Yinan Xu, Huizhe Wang, Sa Wang, Ninghui Sun, Yungang Bao. 152-163.
- PLANAR: a programmable accelerator for near-memory data rearrangement. Adrián Barredo, Adrià Armejach, Jonathan C. Beard, Miquel Moretó. 164-176.
- Power and energy efficient routing for Mach-Zehnder interferometer based photonic switches. Markos Kynigos, Jose Antonio Pascual, Javier Navaridas, John Goodacre, Mikel Luján. 177-189.
- Athena: high-performance sparse tensor contraction sequence on heterogeneous memory. Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li. 190-202.
- Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy. Jie Ren, Jiaolin Luo, Ivy Bo Peng, Kai Wu, Dong Li. 203-214.
- MD-HM: memoization-based molecular dynamics simulations on big memory system. Zhen Xie, Wenqian Dong, Jie Liu, Ivy Bo Peng, Yanbao Ma, Dong Li. 215-226.
- Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators. Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li. 227-241.
- Proxima: accelerating the integration of machine learning in atomistic simulations. Yuliana Zamora, Logan T. Ward, Ganesh Sivaraman, Ian T. Foster, Henry Hoffmann. 242-253.
- Partitioning sparse deep neural networks for scalable training and inference. Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu. 254-265.
- ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao. 266-278.
- SumMerge: an efficient algorithm and implementation for weight repetition-aware DNN inference. Rohan Baskar Prabhakar, Sachit Kuhar, Rohit Agrawal, Christopher J. Hughes, Christopher W. Fletcher. 279-290.
- Accelerating DNNs inference with predictive layer fusion. MohammadHossein Olyaiy, Christopher Ng, Mieszko Lis. 291-303.
- AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator. Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun. 304-315.
- Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations. Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka. 316-328.
- A systematic approach to improving data locality across Fourier transforms and linear algebra operations. Doru-Thom Popovici, Andrew Canning, Zhengji Zhao, Lin-Wang Wang, John Shalf. 329-341.
- Delay sensitivity-driven congestion mitigation for HPC systems. Archit Patke, Saurabh Jha, Haoran Qiu, Jim M. Brandt, Ann C. Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer. 342-353.
- Topology-aware optimizations for multi-GPU ptychographic image reconstruction. Xiaodong Yu, Tekin Bicer, Rajkumar Kettimuthu, Ian T. Foster. 354-366.
- Distributed merge forest: a new fast and scalable approach for topological analysis at scale. Xuan Huang, Pavol Klacansky, Steve Petruzza, Attila Gyulassy, Peer-Timo Bremer, Valerio Pascucci. 367-377.
- Sandslash: a two-level framework for efficient graph pattern mining. Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali. 378-391.
- On the automatic parallelization of subscripted subscript patterns using array property analysis. Akshay Bhosale, Rudolf Eigenmann. 392-403.
- ALTO: adaptive linearized storage of sparse tensors. Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, JeeWhan Choi. 404-416.
- An optimized tensor completion library for multiple GPUs. Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian. 417-430.
- Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication. Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine A. Yelick, Aydin Buluç. 431-442.
- HyQuas: hybrid partitioner based quantum circuit simulation system on GPU. Chen Zhang, Zeyu Song, Haojie Wang, Kaiyuan Rong, Jidong Zhai. 443-454.
- FULL-W2V: fully exploiting data reuse for W2V on GPU-accelerated systems. Thomas Randall, Tyler Allen, Rong Ge. 455-466.
- A performance portability framework for Python. Nader Al Awar, Steven Zhu, George Biros, Milos Gligoric. 467-478.
- ProMT: optimizing integrity tree updates for write-intensive pages in secure NVMs. Mazen Al-Wadi, Aziz Mohaisen, Amro Awad. 479-490.