- SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators. Mingi Yoo, Jaeyong Song, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee. 1-14
- PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator. Shurui Li, Hangbo Yang, Chee Wei Wong, Volker J. Sorger, Puneet Gupta. 15-28
- INCA: Input-stationary Dataflow at Outside-the-box Thinking about Deep Learning Accelerators. Bokyung Kim, Shiyu Li, Hai Li 0001. 29-41
- GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks. Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu. 42-55
- Logical/Physical Topology-Aware Collective Communication in Deep Learning Training. Jo Sanghoon, Hyojun Son, John Kim. 56-68
- Sibia: Signed Bit-slice Architecture for Dense DNN Acceleration with Slice-level Sparsity Exploitation. Dongseok Im, Gwangtae Park, Zhiyong Li, Junha Ryu, Hoi-Jun Yoo. 69-80
- AstriFlash: A Flash-Based System for Online Services. Siddharth Gupta 0003, Yunho Oh, Lei Yan, Mark Sutherland, Abhishek Bhattacharjee, Babak Falsafi, Peter Hsu. 81-93
- Thoth: Bridging the Gap Between Persistently Secure Memories and Memory Interfaces of Emerging NVMs. Xijing Han, James Tuck, Amro Awad. 94-107
- Multi-Granularity Shadow Paging with NVM Write Optimization for Crash-Consistent Memory-Mapped I/O. Hongchao Du, Qiao Li 0001, Riwei Pan, Tei-Wei Kuo, Chun Jason Xue. 108-121
- MGC: Multiple-Gray-Code for 3D NAND Flash based High-Density SSDs. Yina Lv, Liang Shi, Qiao Li 0001, Congming Gao, Yunpeng Song, Longfei Luo, Youtao Zhang. 122-136
- Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. Yiwei Li, Mingyu Gao. 137-151
- Root Crash Consistency of SGX-style Integrity Trees in Secure Non-Volatile Memory Systems. Jianming Huang 0001, Yu Hua. 152-164
- ACIC: Admission-Controlled Instruction Cache. Yunjin Wang, Chia-Hao Chang, Anand Sivasubramaniam, Niranjan Soundararajan. 165-178
- Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs. Carlos Escuin, Asif Ali Khan, Pablo Ibáñez, Teresa Monreal, Jerónimo Castrillón, Víctor Viñals. 179-192
- NOMAD: Enabling Non-blocking OS-managed DRAM Cache via Tag-Data Decoupling. Youngin Kim, Hyeonjin Kim, William J. Song. 193-205
- Safety Hints for HTM Capacity Abort Mitigation. Anirudh Jain, Divya Kiran Kadiyala, Alexandros Daglis. 206-219
- iCache: An Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training. Weijian Chen 0002, Shuibing He, Yaowen Xu, Xuechen Zhang, Siling Yang, Shuang Hu, Xian-He Sun, Gang Chen. 220-232
- Are Randomized Caches Truly Random? Formal Analysis of Randomized-Partitioned Caches. Anirban Chakraborty, Sarani Bhattacharya, Sayandeep Saha, Debdeep Mukhopadhyay. 233-246
- HIRAC: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications. Hesam Shabani, Abhishek Singh, Bishoy Youhana, Xiaochen Guo. 247-258
- VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs. Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin 0001, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna. 259-272
- ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design. Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao 0013, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin. 273-286
- Leveraging Domain Information for the Efficient Automated Design of Deep Learning Accelerators. Chirag Sakhuja, Zhan Shi, Calvin Lin. 287-301
- DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing. Zhe Zhou, Cong Li, Fan Yang, Guangyu Sun 0003. 302-316
- AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks. Mulong Luo, Wenjie Xiong 0001, Geunbae Lee, Yueying Li, Xiaomeng Yang, Amy Zhang, Yuandong Tian, Hsien-Hsin S. Lee, G. Edward Suh. 317-332
- SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling. Minbok Wi, Jaehyun Park 0006, Seoyoung Ko, Michael Jaemin Kim, Nam Sung Kim, Eojin Lee, Jung Ho Ahn. 333-346
- Efficient Distributed Secure Memory with Migratable Merkle Tree. Erhu Feng, Dong Du 0003, Yubin Xia, Haibo Chen 0001. 347-360
- AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM. Mehrnoosh Raoufi, Jun Yang 0002, Xulong Tang, Youtao Zhang. 361-373
- Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems. Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair. 374-389
- Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. Yu Wen, Chenhao Xie 0001, Shuaiwen Leon Song, Xin Fu. 390-402
- ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds. Faquan Chen, Rendong Ying, Jianwei Xue, Fei Wen, Peilin Liu. 403-414
- ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention. Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin. 415-428
- CTA: Hardware-Software Co-design for Compressed Token Attention Mechanism. Haoran Wang, Haobo Xu, Ying Wang 0001, Yinhe Han. 429-441
- HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers. Peiyan Dong, Mengshu Sun, Alec Lu, Yanyue Xie, Kenneth Liu, Zhenglun Kong, Xin Meng, Zhengang Li, Xue Lin, Zhenman Fang, Yanzhi Wang. 442-455
- Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding. Bingyao Li, Jieming Yin, Anup Holey, Youtao Zhang, Jun Yang, Xulong Tang. 456-470
- Ah-Q: Quantifying and Handling the Interference within a Datacenter from a System Perspective. Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen 0001, Yungang Bao. 471-484
- Market Mechanism-Based User-in-the-Loop Scalable Power Oversubscription for HPC Systems. Md Rajib Hossen, Kishwar Ahmed, Mohammad A. Islam 0001. 485-498
- Rambda: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications. Yifan Yuan, Jinghan Huang, Yan Sun, Tianchen Wang, Jacob Nelson, Dan R. K. Ports, Yipeng Wang 0002, Ren Wang 0001, Charlie Tai, Nam Sung Kim. 499-515
- FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems. Harini Muthukrishnan, Daniel Lustig, Oreste Villa, Thomas F. Wenisch, David W. Nellans. 516-529
- Mitigating GPU Core Partitioning Performance Effects. Aaron Barnes, Fangjia Shen, Timothy G. Rogers. 530-542
- Plutus: Bandwidth-Efficient Memory Security for GPUs. Rahaf Abdullah, Huiyang Zhou, Amro Awad. 543-555
- MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism. Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li 0001, Youhui Bai, Feng Yan 0001, Yinlong Xu. 556-569
- DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling. Linyan Mei, Koen Goetschalckx, Arne Symons, Marian Verhelst. 570-583
- CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks. Yue Dai, Youtao Zhang, Xulong Tang. 584-597
- ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining. Yifan Yang, Joel S. Emer, Daniel Sánchez 0003. 598-610
- OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing. Junkyum Kim, Myeonggu Kang, Yunki Han, Yanggon Kim, Lee-Sup Kim. 611-623
- KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers. Marcus Chow, Ali Jahanshahi, Daniel Wong 0001. 624-637
- MERCURY: Accelerating DNN Training By Exploiting Input Similarity. Vahid Janfaza, Kevin Weston, Moein Razavi, Shantanu Mandal, Farabi Mahmud, Alex Hilty, Abdullah Muzahid. 638-650
- Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory. Ming Zhang, Yu Hua. 651-663
- Reconciling Selective Logging and Hardware Persistent Memory Transaction. Chencheng Ye, Yuanchao Xu 0001, Xipeng Shen, Yan Sha, Xiaofei Liao, Hai Jin 0001, Yan Solihin. 664-676
- SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers. Alexander Freij, Huiyang Zhou, Yan Solihin. 677-690
- EVE: Ephemeral Vector Engines. Khalid Al-Hawaj, Tuan Ta, Nick Cebry, Shady Agwa, Olalekan Afuye, Eric Hall, Courtney Golden, Alyssa B. Apsel, Christopher Batten. 691-704
- On Consistency for Bulk-Bitwise Processing-in-Memory. Ben Perach, Ronny Ronen, Shahar Kvatinsky. 705-717
- Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications. Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi. 718-730
- HyQSAT: A Hybrid Approach for 3-SAT Problems by Integrating Quantum Annealer with CDCL. Siwei Tan, Mingqian Yu, Andre Python, Yongheng Shang, Tingting Li, Liqiang Lu, Jianwei Yin. 731-744
- Duet: Creating Harmony between Processors and Embedded FPGAs. Ang Li, August Ning, David Wentzlaff. 745-758
- Co-Designed Architectures for Modular Superconducting Quantum Computers. Evan McKinney, Mingkang Xia, Chao Zhou, Pinlei Lu, Michael Hatridge, Alex K. Jones. 759-772
- A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis. Yan Hao Chen, Yuwei Jin, Fei Hua, Ari B. Hayes, Ang Li 0006, Yunong Shi, Eddy Z. Zhang. 773-786
- The Imitation Game: Leveraging CopyCats for Robust Native Gate Selection in NISQ Programs. Poulami Das 0005, Eric Kessler, Yunong Shi. 787-801
- eNODE: Energy-Efficient and Low-Latency Edge Inference and Training of Neural ODEs. Junkang Zhu, Yaoyu Tao, Zhengya Zhang. 802-813
- SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution. Jovan Stojkovic, Tianyin Xu, Hubertus Franke, Josep Torrellas. 814-827
- MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao. 828-841
- Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving. Junyeol Yu, Jongseok Kim, Euiseong Seo. 842-854
- Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. Dimosthenis Masouros, Christian Pinto, Michele Gazzetti, Sotirios Xydis, Dimitrios Soudris. 855-869
- Poseidon: Practical Homomorphic Encryption Accelerator. Yinghao Yang, Huaizhi Zhang, Shengyu Fan, Hang Lu, Mingzhe Zhang, Xiaowei Li 0001. 870-881
- FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption. Rashmi Agrawal, Leo de Castro, Guowei Yang, Chiraag Juvekar, Rabia Tugce Yazicigil, Anantha P. Chandrakasan, Vinod Vaikuntanathan, Ajay Joshi. 882-895
- FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference. Yilan Zhu, Xinyao Wang, Lei Ju, Shanqing Guo. 896-907
- D-Shield: Enabling Processor-side Encryption and Integrity Verification for Secure NVMe Drives. Md Hafizul Islam Chowdhuryy, Myoungsoo Jung, Fan Yao, Amro Awad. 908-921
- TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU. Shengyu Fan, Zhiwei Wang, Weizhi Xu, Rui Hou 0001, Dan Meng, Mingzhe Zhang. 922-934
- AVGI: Microarchitecture-Driven, Fast and Accurate Vulnerability Assessment. George Papadimitriou 0001, Dimitris Gizopoulos. 935-948
- Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance. Jiangwei Zhang, Chong Wang, Zhenhua Zhu, Donald Kline, Alex K. Jones, Huazhong Yang, Yu Wang 0002. 964-976
- ESD: An ECC-assisted and Selective Deduplication for Encrypted Non-Volatile Main Memory. Chunfeng Du, Suzhen Wu, Jiapeng Wu, Bo Mao, Shengzhe Wang. 977-990
- A Systematic Study of DDR4 DRAM Faults in the Field. Majed Valad Beigi, Yi Cao, Sudhanva Gurumurthi, Charles Recchia, Andrew Walton, Vilas Sridharan. 991-1002
- High Performance and Power Efficient Accelerator for Cloud Inference. Jianguo Yao, Hao Zhou, Yalin Zhang, Ying Li, Chuang Feng, Shi Chen, Jiaoyan Chen, Yongdong Wang, Qiaojuan Hu. 1003-1016
- LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler. Sungyeob Yoo, Hyunsung Kim, Jinseok Kim, Sunghyun Park, Joo-Young Kim 0001, Jinwook Oh. 1017-1030
- BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment. Yiquan Chen, Jiexiong Xu, Chengkun Wei, Yijing Wang, Xin Yuan, Yangming Zhang, Xulin Yu, Yi Chen, Zeke Wang, Shuibing He, Wenzhi Chen. 1031-1044
- Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs. Hamed Seyedroudbari, Srikar Vanavasam, Alexandros Daglis. 1045-1058
- A Scalable Methodology for Designing Efficient Interconnection Network of Chiplets. Yinxiao Feng, Dong Xiang, Kaisheng Ma. 1059-1071
- VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance. Hans Kasan, John Kim. 1072-1084
- Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices. Enrico Reggiani, Alessandro Pappalardo, Max Doblas, Miquel Moretó, Mauro Olivieri, Osman Sabri Unsal, Adrián Cristal. 1085-1098
- FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference. Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao. 1099-1112
- Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion. Size Zheng 0001, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang 0001. 1113-1126
- Securator: A Fast and Secure Neural Processing Unit. Nivedita Shrivastava, Smruti Ranjan Sarangi. 1127-1139
- Tensor Movement Orchestration in Multi-GPU Training Systems. Shao-Fu Lin, Yi-Jung Chen, Hsiang-Yun Cheng, Chia-Lin Yang. 1140-1152
- A Storage-Effective BTB Organization for Servers. Truls Asheim, Boris Grot, Rakesh Kumar 0003. 1153-1167
- HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory. Haifeng Li, Ke Liu 0004, Ting Liang, Zuojun Li, Tianyue Lu, Hui Yuan, Yinben Xia, Yungang Bao, Mingyu Chen 0001, Yizhou Shan. 1168-1181
- Speculative Register Reclamation. Sanyam Mehta. 1182-1194
- SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs. Jiwon Lee, Ju Min Lee, Yunho Oh, William J. Song, Won Woo Ro. 1195-1207
- CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework. Xiaoyang Lu, Rujia Wang, Xian-He Sun. 1208-1220
- Memory-Efficient Hashed Page Tables. Jovan Stojkovic, Namrata Mantri, Dimitrios Skarlatos 0002, Tianyin Xu, Josep Torrellas. 1221-1235
- NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling. Yewen Li, Xueqi Li, Ruihao Gao, Wanqi Liu, Guangming Tan. 1236-1248
- Efficient Supernet Training Using Path Parallelism. Ying Xu, Long Cheng, Xuyi Cai, Xiaohan Ma, Weiwei Chen, Lei Zhang, Ying Wang 0001. 1249-1261
- Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism. Quan M. Nguyen, Daniel Sánchez 0003. 1262-1274
- CHOPPER: A Compiler Infrastructure for Programmable Bit-serial SIMD Processing Using Memory in DRAM. Xiangjun Peng, Yaohua Wang, Ming-Chang Yang. 1275-1288
- VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing. Julián Pavón, Iván Vargas Valdivieso, Joan Marimon, Roger Figueras, Francesc Moll, Osman S. Unsal, Mateo Valero, Adrián Cristal. 1289-1302