- Overview of HPC and AI Computing for COVID-19 in the US. Rick Stevens. 1
- cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data. Jiannan Tian, Sheng Di, Kai Zhao, Cody Rivera, Megan Hickman Fulp, Robert Underwood, Sian Jin, Xin Liang, Jon Calhoun, Dingwen Tao, Franck Cappello. 3-15
- TAFE: Thread Address Footprint Estimation for Capturing Data/Thread Locality in GPU Systems. Kishore Punniyamurthy, Andreas Gerstlauer. 17-29
- SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference. Ziheng Wang. 31-42
- GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU. Chanyoung Oh, Zhen Zheng, Xipeng Shen, Jidong Zhai, Youngmin Yi. 43-54
- Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs. Changwan Hong, Laxman Dhulipala, Julian Shun. 55-69
- Fireiron: A Data-Movement-Aware Scheduling Language for GPUs. Bastian Hagedorn, Archibald Samuel Elliott, Henrik Barthels, Rastislav Bodík, Vinod Grover. 71-82
- Automatic Generation of Multi-Objective Polyhedral Compiler Transformations. Lorenzo Chelini, Tobias Gysi, Tobias Grosser, Martin Kong, Henk Corporaal. 83-96
- Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory. Mingchuan Wu, Ying Liu, Huimin Cui, Qingfu Wei, Quanfeng Li, Limin Li, Fang Lv, Jingling Xue, Xiaobing Feng 0002. 97-109
- Deep Program Structure Modeling Through Multi-Relational Graph-based Learning. Guixin Ye, Zhanyong Tang, Huanting Wang, Dingyi Fang, Jianbin Fang, Songfang Huang, Zheng Wang. 111-123
- AutoHOOT: Automatic High-Order Optimization for Tensors. Linjian Ma, Jiayu Ye, Edgar Solomonik. 125-137
- Intelligent Data Placement on Discrete GPU Nodes with Unified Memory. Tanzima Sultana, Blake Allen, Apan Qasem. 139-151
- Deep Learning Assisted Resource Partitioning for Improving Performance on Commodity Servers. Ruobing Chen 0002, Jinping Wu, Haosen Shi, Yusen Li, Haiyan Yin, Shanjiang Tang, Xiaoguang Liu, Gang Wang. 153-154
- Decoupled Address Translation for Heterogeneous Memory Systems. Bokyeong Kim, Soojin Hwang, Sanghoon Cha, Chang Hyun Park 0001, Jongse Park, Jaehyuk Huh. 155-156
- Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors. Jiho Kim, Sanghun Cho, Minsoo Rhu, Ali Bakhoda, Tor M. Aamodt, John Kim. 157-158
- Scalable Specialization: Architectures, Interfaces, & Applications. Sarita V. Adve. 159
- Analyzing and Leveraging Shared L1 Caches in GPUs. Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Gabriel H. Loh, Adwait Jog. 161-173
- Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration. Subhankar Pal, Siying Feng, Dong-Hyeon Park, Sung Kim, Aporva Amarnath, Chi-Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton, Jiawen Sun, Michael F. P. O'Boyle, Murray Cole, Chaitali Chakrabarti, David T. Blaauw, Hun-Seok Kim, Trevor N. Mudge, Ronald G. Dreslinski. 175-190
- Enhancing Address Translations in Throughput Processors via Compression. Xulong Tang, Ziyu Zhang, Weizheng Xu, Mahmut Taylan Kandemir, Rami G. Melhem, Jun Yang. 191-204
- Regional Out-of-Order Writes in Total Store Order. Sawan Singh, Alexandra Jimborean, Alberto Ros. 205-216
- Parallel and Scalable Precise Clustering. Stuart Byma, Akash Dhasade, Adrian M. Altenhoff, Christophe Dessimoz, James R. Larus. 217-228
- SecSched: Flexible Scheduling in Secure Processors. Omais Shafi, Janibul Bashir. 229-240
- Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design. Kim-Anh Tran, Christos Sakalis, Magnus Själander, Alberto Ros, Stefanos Kaxiras, Alexandra Jimborean. 241-254
- Fast Convolutional Neural Networks with Fine-Grained FFTs. Yulin Zhang, Xiaoming Li. 255-265
- Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning. Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, Peng Jiang. 267-278
- SparseTrain: Leveraging Dynamic Sparsity in Software for Training DNNs on General-Purpose SIMD Processors. Zhangxiaowen Gong, Houxiang Ji, Christopher W. Fletcher, Christopher J. Hughes, Josep Torrellas. 279-292
- Helix: Algorithm/Architecture Co-design for Accelerating Nanopore Genome Base-calling. Qian Lou, Sarath Chandra Janga, Lei Jiang 0001. 293-304
- Opportunistic Early Pipeline Re-steering for Data-dependent Branches. Saurabh Gupta, Niranjan Soundararajan, Ragavendra Natarajan, Sreenivas Subramoney. 305-316
- Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs. Abhinav Jangda, Arjun Guha. 317-328
- Low-Latency Proactive Continuous Vision. Yiming Gan, Yuxian Qiu, Lele Chen, Jingwen Leng, Yuhao Zhu 0001. 329-342
- Collective Affinity Aware Computation Mapping. Mahmut T. Kandemir, Jihyun Ryoo, Hui Zhao 0013, Myoungsoo Jung, Mustafa Karaköy. 343-344
- VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework. Feng Yu, Jiacheng Zhao, Huimin Cui, Xiaobing Feng 0002, Jingling Xue. 345-346
- Parallelizing Parallel Programs: A Dynamic Pattern Analysis for Modernization of Legacy Parallel Code. Roberto Castañeda Lozano, Murray Cole, Björn Franke. 347-348
- A New Qubits Mapping Mechanism for Multi-programming Quantum Computing. Xinglei Dou, Lei Liu. 349-350
- Exploiting Locality in Scalable Ordered Maps. Matthew Rodriguez, Ahmed Hassan, Michael Spear. 351-352
- DeepSwapper: A Deep Learning Based Page Swap Management Scheme for Hybrid Memory Systems. Majed Valad Beigi, Bahareh Pourshirazi, Gokhan Memik, Zhichun Zhu. 353-354
- VP Float: First Class Treatment for Variable Precision Floating Point Arithmetic. Tiago T. Jost, Yves Durand, Christian Fabre, Albert Cohen 0001, Frédéric Pétrot. 355-356
- Approximate Pattern Matching for On-Chip Interconnect Traffic Prediction. Vignesh Adhinarayanan, Wu-chun Feng. 357-358
- Compiling Chapel: Keys to Making Parallel Programming Productive at Scale. Bradford L. Chamberlain. 359
- The Forward Slice Core Microarchitecture. Kartik Lakshminarasimhan, Ajeya Naithani, Josué Feliu, Lieven Eeckhout. 361-372
- A Methodology for Principled Approximation in Visual SLAM. Yan Pei, Swarnendu Biswas, Donald S. Fussell, Keshav Pingali. 373-386
- Memory-Equipped Quantum Architectures: The Power of Random Access. Jonathan M. Baker, David I. Schuster, Frederic T. Chong. 387-398
- Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic. Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Jongse Park, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh. 399-411
- MEPHESTO: Modeling Energy-Performance in Heterogeneous SoCs and Their Trade-Offs. Mohammad Alaul Haque Monil, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Allen D. Malony. 413-425
- Ribbon: High Performance Cache Line Flushing for Persistent Memory. Kai Wu, Ivy Bo Peng, Jie Ren 0015, Dong Li. 427-439
- PRISM: Architectural Support for Variable-granularity Memory Metadata. Rachata Ausavarungnirun, Timothy Merrifield, Jayneel Gandhi, Christopher J. Rossbach. 441-454
- Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance. Trinayan Baruah, Yifan Sun, Saiful A. Mojumder, José L. Abellán, Yash Ukidave, Ajay Joshi, Norman Rubin, John Kim, David R. Kaeli. 455-466
- RackMem: A Tailored Caching Layer for Rack Scale Computing. Changyeon Jo, Hyunik Kim, Hexiang Geng, Bernhard Egger. 467-480
- ATTC (@C): Addressable-TLB based Translation Coherence. Harsh Gugale, Nagendra Gulur, Yashwant Marathe, Lizy K. John. 481-492