- Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu. 1-18 [doi]
- Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications. Tanvir Ahmed Khan, Muhammed Ugur, Krishnendra Nathella, Dam Sunwoo, Heiner Litz, Daniel A. Jiménez, Baris Kasikci. 19-34 [doi]
- OverGen: Improving FPGA Usability through Domain-specific Overlay Generation. Sihao Liu, Jian Weng 0002, Dylan Kupsh, Atefeh Sohrabizadeh, Zhengrong Wang, Licheng Guo, Jiuyang Liu, Maxim Zhulin, Rishabh Mani, Lucheng Zhang, Jason Cong, Tony Nowatzki. 35-56 [doi]
- Cambricon-P: A Bitflow Architecture for Arbitrary Precision Computing. Yifan Hao, Yongwei Zhao, Chenxiao Liu, Zidong Du, Shuyao Cheng, Xiaqing Li, Xing Hu 0001, Qi Guo 0001, Zhiwei Xu 0002, Tianshi Chen 0002. 57-72 [doi]
- Revisiting Residue Codes for Modern Memories. Evgeny Manzhosov, Adam Hastings, Meghna Pancholi, Ryan Piersma, Mohamed Tarek Ibn Ziad, Simha Sethumadhavan. 73-90 [doi]
- PageORAM: An Efficient DRAM Page Aware ORAM Strategy. Rachit Rajat, Yongqin Wang, Murali Annavaram. 91-107 [doi]
- AQUA: Scalable Rowhammer Mitigation by Quarantining Aggressor Rows at Runtime. Anish Saxena, Gururaj Saileshwar, Prashant J. Nair, Moinuddin K. Qureshi. 108-123 [doi]
- CRONUS: Fault-isolated, Secure and High-performance Heterogeneous Computing for Trusted Execution Environment. Jianyu Jiang, Ji Qi, Tianxiang Shen, Xusheng Chen, Shixiong Zhao, Sen Wang, Li Chen, Gong Zhang, Xiapu Luo, Heming Cui. 124-143 [doi]
- Reconstructing Out-of-Order Issue Queue. Ipoom Jeong, Jiwon Lee, Myung Kuk Yoon, Won Woo Ro. 144-161 [doi]
- Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations. Logan Moody, Wei Qi, Abdolrasoul Sharifi, Layne Berry, Joey Rudek, Jayesh Gaur, Jeff Parkhurst, Sreenivas Subramoney, Kevin Skadron, Ashish Venkat. 162-180 [doi]
- big.VLITTLE: On-Demand Data-Parallel Acceleration for Mobile Systems on Chip. Tuan Ta, Khalid Al-Hawaj, Nick Cebry, Yanghui Ou, Eric Hall, Courtney Golden, Christopher Batten. 181-198 [doi]
- Exploring Instruction Fusion Opportunities in General Purpose Processors. Sawan Singh, Arthur Perais, Alexandra Jimborean, Alberto Ros. 199-212 [doi]
- DTexL: Decoupled Raster Pipeline for Texture Locality. Diya Joseph, Juan L. Aragón, Joan-Manuel Parcerisa, Antonio González 0001. 213-227 [doi]
- Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources. Sina Darabi, Mohammad Sadrosadati, Negar Akbarzadeh, Joël Lindegger, Mohammad Hosseini, Jisung Park 0001, Juan Gómez-Luna, Onur Mutlu, Hamid Sarbazi-Azad. 228-244 [doi]
- Featherweight Soft Error Resilience for GPUs. Yida Zhang, Changhee Jung. 245-262 [doi]
- Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing. Mohammadreza Saed, Yuan-Hsi Chou, Lufei Liu, Tyler Nowicki, Tor M. Aamodt. 263-281 [doi]
- Pushing Point Cloud Compression to the Edge. Ziyu Ying 0001, Shulin Zhao 0001, Sandeepa Bhuyan, Cyan Subhra Mishra, Mahmut T. Kandemir, Chita R. Das. 282-299 [doi]
- Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles. Srivatsan Krishnan, Zishen Wan, Kshitij Bhardwaj, Paul N. Whatmough, Aleksandra Faust, Sabrina M. Neuman, Gu-Yeon Wei, David Brooks 0001, Vijay Janapa Reddi. 300-317 [doi]
- An Architectural Charge Management Interface for Energy-Harvesting Systems. Emily Ruppel, Milijana Surbatovich, Harsh Desai, Kiwan Maeng, Brandon Lucia. 318-335 [doi]
- ROG: A High Performance and Robust Distributed Training System for Robotic IoT. Xiuxian Guan, Zekai Sun, Shengliang Deng, Xusheng Chen, Shixiong Zhao, Zongyuan Zhang, Tianyang Duan, Yuexuan Wang, Chenshu Wu, Yong Cui, Libo Zhang 0001, Yanjun Wu, Rui Wang 0007, Heming Cui. 336-353 [doi]
- ASSASIN: Architecture Support for Stream Computing to Accelerate Computational Storage. Chen Zou 0001, Andrew A. Chien. 354-368 [doi]
- DaxVM: Stressing the Limits of Memory as a File Interface. Chloe Alverti, Vasileios Karakostas, Nikhita Kunati, Georgios Goumas, Michael Swift. 369-387 [doi]
- Networked SSD: Flash Memory Interconnection Network for High-Bandwidth SSD. Jiho Kim, Seokwon Kang, Yongjun Park 0001, John Kim. 388-403 [doi]
- Designing Virtual Memory System of MCM GPUs. B. Pratheek, Neha Jawalkar, Arkaprava Basu. 404-422 [doi]
- ALTOCUMULUS: Scalable Scheduling for Nanosecond-Scale Remote Procedure Calls. Jiechen Zhao, Iris Uwizeyimana, Karthik Ganesan, Mark C. Jeffrey, Natalie Enright Jerger. 423-440 [doi]
- SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices. Mahmoud Khairy, Ahmad Alawneh, Aaron Barnes, Timothy G. Rogers. 441-463 [doi]
- Patching up Network Data Leaks with Sweeper. Marina Vemmou, Albert Cho, Alexandros Daglis. 464-479 [doi]
- IDIO: Network-Driven, Inbound Network Data Orchestration on Server Processors. Mohammad Alian, Siddharth Agarwal, Jongmin Shin, Neel Patel, Yifan Yuan, Daehoon Kim, Ren Wang 0001, Nam Sung Kim. 480-493 [doi]
- Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference. Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R. Govindarajan, Uday Bondhugula. 494-511 [doi]
- GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. Wei Niu, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Gagan Agrawal, Bin Ren. 512-529 [doi]
- OCOLOS: Online COde Layout OptimizationS. Yuxuan Zhang, Tanvir Ahmed Khan, Gilles Pokam, Baris Kasikci, Heiner Litz, Joseph Devietti. 530-545 [doi]
- A programmable, energy-minimal dataflow compiler and architecture. Graham Gobieski, Souradip Ghosh, Marijn Heule, Todd Mowry, Tony Nowatzki, Nathan Beckmann, Brandon Lucia. 546-564 [doi]
- Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping. Sonali Singh, Anup Sarma, Sen Lu, Abhronil Sengupta, Mahmut T. Kandemir, Emre Neftci, Vijaykrishnan Narayanan, Chita R. Das. 565-581 [doi]
- Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tiles. Renzo Andri, Beatrice Bussolino, Antonio Cipolletta, Lukas Cavigelli, Zhe Wang 0023. 582-598 [doi]
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design. Hongxiang Fan, Thomas Chau 0001, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah. 599-615 [doi]
- DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation. Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim 0001. 616-630 [doi]
- HARMONY: Heterogeneity-Aware Hierarchical Management for Federated Learning System. Chunlin Tian, Li Li, Zhan Shi, Jun Wang, Cheng-Zhong Xu 0001. 631-645 [doi]
- Leaky Way: A Conflict-Based Cache Covert Channel Bypassing Set Associativity. Yanan Guo, Xin Xin, Youtao Zhang, Jun Yang 0002. 646-661 [doi]
- SwiftDir: Secure Cache Coherence without Overprotection. Chenlu Miao, Kai Bu, Mengming Li, Shaowu Mao, Jianwei Jia. 662-677 [doi]
- Self-Reinforcing Memoization for Cryptography Calculations in Secure Memory Systems. Xin Wang, Daulet Talapkaliyev, Matthew Hicks, Xun Jian. 678-692 [doi]
- Eager Memory Cryptography in Caches. Xin Wang, Jagadish B. Kotra, Xun Jian. 693-709 [doi]
- GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping. Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu. 710-726 [doi]
- BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support. Wenqin Huangfu, Krishna T. Malladi, Andrew Chang, Yuan Xie 0001. 727-743 [doi]
- Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation. Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang. 744-762 [doi]
- ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration. Han-Wen Hu, Wei-Chen Wang, Yuan-Hao Chang 0001, Yung-Chun Lee, Bo-Rong Lin, Huai-Mu Wang, Yen-Po Lin, Yu-Ming Huang, Chong-Ying Lee, Tzu-Hsiang Su, Chih-Chang Hsieh, Chia-Ming Hu, Yi-Ting Lai, Chung Kuang Chen, Han-Sung Chen, Hsiang-Pang Li, Tei-Wei Kuo, Meng-Fan Chang, Keh-Chung Wang, Chun-Hsiung Hung, Chih-Yuan Lu. 763-783 [doi]
- CORUSCANT: Fast Efficient Processing-in-Racetrack Memories. Sébastien Ollivier, Stephen Longofono, Prayash Dutta, Jingtong Hu, Sanjukta Bhanja, Alex K. Jones. 784-798 [doi]
- IDLD: Instantaneous Detection of Leakage and Duplication of Identifiers used for Register Renaming. Yiannakis Sazeides, Alex Gerber, Ron Gabor, Arkady Bramnik, George Papadimitriou 0001, Dimitris Gizopoulos, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos, Karyofyllis Patsidis. 799-814 [doi]
- HiRA: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips. Abdullah Giray Yaglikçi, Ataberk Olgun, Minesh Patel, Haocong Luo, Hasan Hassan, Lois Orosa 0001, Oguz Ergin, Onur Mutlu. 815-834 [doi]
- AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications. Jawad Haj-Yahya, Haris Volos 0001, Davide B. Bartolini, Georgia Antoniou, Jeremie S. Kim, Zhe Wang 0023, Kleovoulos Kalaitzidis, Tom Rollet, Zhirui Chen, Ye Geng, Onur Mutlu, Yiannakis Sazeides. 835-850 [doi]
- AgilePkgC: An Agile System Idle State Architecture for Energy Proportional Datacenter Servers. Georgia Antoniou, Haris Volos 0001, Davide B. Bartolini, Tom Rollet, Yiannakis Sazeides, Jawad Haj-Yahya. 851-867 [doi]
- Realizing Emotional Interactions to Learn User Experience and Guide Energy Optimization for Mobile Architectures. Xueliang Li 0002, Zhuobin Shi, Junyang Chen, Yepang Liu 0001. 868-884 [doi]
- FracDRAM: Fractional Values in Off-the-Shelf DRAM. Fei Gao 0016, Georgios Tziantzioulis, David Wentzlaff. 885-899 [doi]
- pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables. João Dinis Ferreira, Gabriel Falcão 0001, Juan Gómez-Luna, Mohammed Alser, Lois Orosa 0001, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, Onur Mutlu. 900-919 [doi]
- Multi-Layer In-Memory Processing. Daichi Fujiki, Alireza Khadem, Scott A. Mahlke, Reetuparna Das. 920-936 [doi]
- Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory. Jisung Park 0001, Roknoddin Azizi, Geraldo F. Oliveira, Mohammad Sadrosadati, Rakesh Nadig, David Novo, Juan Gómez-Luna, Myungsuk Kim, Onur Mutlu. 937-955 [doi]
- Page Size Aware Cache Prefetching. Georgios Vavouliotis, Gino Chacon, Lluc Alvarez, Paul V. Gratz, Daniel A. Jiménez, Marc Casas. 956-974 [doi]
- Berti: an Accurate Local-Delta Data Prefetcher. Agustín Navarro-Torres, Biswabandan Panda, Jesús Alastruey-Benedé, Pablo Ibáñez, Víctor Viñals Yúfera, Alberto Ros. 975-991 [doi]
- Translation-optimized Memory Compression for Capacity. Gagandeep Panwar, Muhammad Laghari, David Bears, Yuqing Liu, Chandler Jearls, Esha Choukse, Kirk W. Cameron, Ali Raza Butt, Xun Jian. 992-1011 [doi]
- Merging Similar Patterns for Hardware Prefetching. Shizhi Jiang, Qiusong Yang, Yiwei Ci. 1012-1026 [doi]
- AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs. Anbang Wu, Hezi Zhang, Gushu Li, Alireza Shabani, Yuan Xie 0001, Yufei Ding. 1027-1041 [doi]
- Let Each Quantum Bit Choose Its Basis Gates. Sophia Fuhui Lin, Sara Sussman, Casey Duckering, Pranav S. Mundada, Jonathan M. Baker, Rohan S. Kumar, Andrew A. Houck, Frederic T. Chong. 1042-1058 [doi]
- COMPAQT: Compressed Waveform Memory Architecture for Scalable Qubit Control. Satvik Maurya, Swamit Tannu. 1059-1077 [doi]
- Qubit Mapping and Routing via MaxSAT. Abtin Molavi, Amanda Xu, Martin Diges, Lauren Pick, Swamit Tannu, Aws Albarghouthi. 1078-1091 [doi]
- Scaling Superconducting Quantum Computers with Chiplet Architectures. Kaitlin N. Smith, Gokul Subramanian Ravi, Jonathan M. Baker, Frederic T. Chong. 1092-1109 [doi]
- Q3DE: A fault-tolerant quantum computer architecture for multi-bit burst errors by cosmic rays. Yasunari Suzuki, Takanori Sugiyama, Tomochika Arai, Wang Liao, Koji Inoue, Teruo Tanimoto. 1110-1125 [doi]
- RemembERR: Leveraging Microprocessor Errata for Design Testing and Validation. Flavien Solt, Patrick Jattke, Kaveh Razavi. 1126-1143 [doi]
- Datamime: Generating Representative Benchmarks by Automatically Synthesizing Datasets. Hyun Ryong Lee, Daniel Sánchez 0003. 1144-1159 [doi]
- An architecture interface and offload model for low-overhead, near-data, distributed accelerators. Saambhavi Baskaran, Mahmut Taylan Kandemir, Jack Sampson. 1160-1177 [doi]
- Towards Developing High Performance RISC-V Processors Using Agile Methodology. Yinan Xu, Zihao Yu, Dan Tang, Guokai Chen, Lu Chen, Lingrui Gou, Yue Jin, Qianruo Li, Xin Li, Zuojun Li, Jiawei Lin, Tong Liu, Zhigang Liu, Jiazhan Tan, Huaqiang Wang, Huizhe Wang, Kaifan Wang, Chuanqi Zhang, Fawang Zhang, Linjuan Zhang, Zifei Zhang, Yangyang Zhao, Yaoyang Zhou, Yike Zhou, Jiangrui Zou, Ye Cai, Dandan Huan, Zusong Li, Jiye Zhao, Zihao Chen, Wei He, Qiyuan Quan, Xingwu Liu, Sa Wang, Kan Shi, Ninghui Sun, Yungang Bao. 1178-1199 [doi]
- DiVa: An Accelerator for Differentially Private Machine Learning. Beomsik Park, Ranggi Hwang, Dongho Yoon, Yoonhyuk Choi, Minsoo Rhu. 1200-1217 [doi]
- EVAX: Towards a Practical, Pro-active & Adaptive Architecture for High Performance & Security. Samira Mirbagher Ajorpaz, Daniel Moghimi, Jeffrey Neal Collins, Gilles Pokam, Nael B. Abu-Ghazaleh, Dean M. Tullsen. 1218-1236 [doi]
- ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse. Jongmin Kim, Gwangho Lee, Sangpyo Kim, Gina Sohn, Minsoo Rhu, John Kim, Jung Ho Ahn. 1237-1254 [doi]
- Horus: Persistent Security for Extended Persistence-Domain Memory Systems. Xijing Han, James Tuck, Amro Awad. 1255-1269 [doi]
- Mint: An Accelerator For Mining Temporal Motifs. Nishil Talati, Haojie Ye, Sanketh Vedula, Kuan-Yu Chen, Yuhan Chen, Daniel Liu, Yichao Yuan, David T. Blaauw, Alex Bronstein, Trevor N. Mudge, Ronald G. Dreslinski. 1270-1287 [doi]
- DPU-v2: Energy-efficient execution of irregular directed acyclic graphs. Nimish Shah, Wannes Meert, Marian Verhelst. 1288-1307 [doi]
- XPGraph: XPline-Friendly Persistent Memory Graph Stores for Large-Scale Evolving Graphs. Rui Wang, Shuibing He, Weixu Zong, Yongkun Li, Yinlong Xu. 1308-1325 [doi]
- A Data-Centric Accelerator for High-Performance Hypergraph Processing. Qinggang Wang, Long Zheng 0003, Ao Hu, Yu Huang 0013, Pengcheng Yao, Chuangyi Gui, Xiaofei Liao, Hai Jin 0001, Jingling Xue. 1326-1341 [doi]
- ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines. Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Bingsheng He, Weng-Fai Wong. 1342-1358 [doi]
- 3D-FPIM: An Extreme Energy-Efficient DNN Acceleration System Using 3D NAND Flash-Based In-Situ PIM Unit. Hunjun Lee, Minseop Kim, Dongmoon Min, Joonsung Kim, Jongwon Back, Honam Yoo, Jong-Ho Lee, Jangwoo Kim. 1359-1376 [doi]
- Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling. Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer. 1377-1395 [doi]
- DeepBurning-SEG: Generating DNN Accelerators of Segment-Grained Pipeline Architecture. Xuyi Cai, Ying Wang 0001, Xiaohan Ma, Yinhe Han, Lei Zhang 0008. 1396-1413 [doi]
- ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. Cong Guo 0003, Chen Zhang 0001, Jingwen Leng, Zihan Liu, Fan Yang 0024, Yunxin Liu, Minyi Guo, Yuhao Zhu 0001. 1414-1433 [doi]
- Ristretto: An Atomized Processing Architecture for Sparsity-Condensed Stream Flow in CNN. Gang Li, Weixiang Xu, Zhuoran Song, Naifeng Jing, Jian Cheng, Xiaoyao Liang. 1434-1450 [doi]