- Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving. Yue Pan, Zihan Xia, Po-Kai Hsu, Lanxiang Hu, Hyungyo Kim, Janak Sharda, Minxuan Zhou, Nam Sung Kim, Shimeng Yu, Tajana Rosing, Mingu Kang. 1-17
- Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing. Tianhua Xia, Sai Qian Zhang. 18-33
- LongSight: Compute-Enabled Memory to Accelerate Large-Context LLMs via Sparse Attention. Derrick Quinn, E. Ezgi Yücel, Jinkwon Kim, José F. Martínez, Mohammad Alian. 34-48
- ComPASS: A Compatible PIM Protocol Architecture and Scheduling Solution for Processor-PIM Collaboration. Seunghyuk Yu, Hyeonu Kim, Kyoungho Jeun, Sunyoung Hwang, Seongmin Cho, Eojin Lee. 49-62
- PIM-CCA: An Efficient PIM Architecture with Optimized Integration of Configurable Functional Units. Jeehyun Kim, Donghyeon Kim, Seokwon Kang, Bongjoon Hyun, Inho Lee, Yongjun Park. 63-77
- 3D-PATH: A Hierarchy LUT Processing-in-memory Accelerator with Thermal-aware Hybrid Bonding Integration. Zhiheng Yue, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin. 78-93
- One Flew over the Stack Engine's Nest: Practical Microarchitectural Attacks on the Stack Engine. Silvan Niederer, Sandro Rüegge, Ali Hajiabadi, Kaveh Razavi. 94-110
- DExiM: Exposing Impedance-Based Data Leakage in Emerging Memories. Md. Sadik Awal, Md Tauhidur Rahman. 111-124
- Sonar: A Hardware Fuzzing Framework to Uncover Contention Side Channels in Processors. Kanqi Zhang, Peinan Li, Miao Li, Xin Tian, Zelong Du, Quanchen Liu, Yongqiang Lyu, Yu Jiang, Dan Meng, Rui Hou. 125-139
- Symbiotic Task Scheduling and Data Prefetching. Gilead Posluns, Mark C. Jeffrey. 140-155
- Software Prefetch Multicast: Sharer-Exposed Prefetching for Bandwidth Efficiency in Manycore Processors. Yanhua Chen, Jiong Feng, Zhe Wang, Christopher J. Hughes, Jiayi Huang. 156-169
- RICH Prefetcher: Storing Rich Information in Memory to Trade Capacity and Bandwidth for Latency Hiding. Ningzhi Ai, Wenjian He, Hu He, Jing Xia, Heng Liao, Guowei Zhang. 170-183
- DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model. Gerasimos Gerogiannis, Stijn Eyerman, Evangelos Georganas, Wim Heirman, Josep Torrellas. 184-200
- StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs. Hanchen Ye, Deming Chen. 201-216
- Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments. Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas. 217-231
- Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference. Donghyeon Joo, Helya Hosseini, Ramyad Hadidi, Bahar Asgari. 232-245
- Accelerating Retrieval Augmented Language Model via PIM and PNM Integration. Je-Woo Jang, Junyong Oh, Youngbae Kong, Jae-Youn Hong, Sung-Hyuk Cho, Jeongyeol Lee, Hoeseok Yang, Joon-Sung Yang. 246-262
- HEAT: NPU-NDP HEterogeneous Architecture for Transformer-Empowered Graph Neural Networks. Ruiyang Chen, Zhuoran Song, Yicheng Zheng, Zeyu Zhu, Gang Li, Naifeng Jing, Xiaoyao Liang, Haibing Guan. 263-276
- RayN: Ray Tracing Acceleration with Near-memory Computing. Mohammadreza Saed, Prashant J. Nair, Tor M. Aamodt. 277-291
- Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving. Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park. 292-307
- GateBleed: Exploiting On-Core Accelerator Power Gating for High Performance and Stealthy Attacks on AI. Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz. 308-325
- Athena: Accelerating Quantized Convolutional Neural Networks under Fully Homomorphic Encryption. Yinghao Yang, Xicheng Xu, Liang Chang, Hang Lu, Xiaowei Li. 326-339
- ccAI: A Compatible and Confidential System for AI Computing. Chenxu Wang, Danqing Tang, Changxu Ci, Junjie Huang, Yankai Xu, Fengwei Zhang, Jiannong Cao, Jie Song, Shoumeng Yan, Tao Wei, Zhengyu He. 340-353
- Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing. Chenqi Lin, Kang Yang, Tianshi Xu, Ling Liang, Yufei Wang, Zhaohui Chen, Runsheng Wang, Mingyu Gao, Meng Li. 354-368
- Dissecting and Modeling the Architecture of Modern GPU Cores. Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González. 369-384
- Interleaved Bitstream Execution for Multi-Pattern Regex Matching on GPUs. Tianao Ge, Xiaowen Chu, Hongyuan Liu. 385-400
- SoftWalker: Supporting Software Page Table Walk for Irregular GPU Applications. Sungbin Jang, Junhyeok Park, Yongho Lee, Osang Kwon, Donghyun Kim, Juyoung Seok, Seokin Hong. 401-417
- LATPC: Accelerating GPU Address Translation Using Locality-Aware TLB Prefetching and MSHR Compression. Yeonan Ha, Jiho Park, Hanna Cha, Jiwon Lee, Joonsung Kim, Won Woo Ro, Youngsok Kim. 418-431
- S-DMA: Sparse Diffusion Models Acceleration via Spatiality-Aware Prediction and Dimension-Adaptive Dataflow. Zihan Zou, Xinming Yan, Shun Zhang, Peng Zheng, Guang Yang, Hao Cai, Bo Liu. 432-444
- LLM.265: Video Codecs are Secretly Tensor Codecs. Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills. 445-460
- HLX: A Unified Pipelined Architecture for Optimized Performance of Hybrid Transformer-Mamba Language Models. In Jun Jung, Gyeongrok Yang, Jaeha Min, Joo-Young Kim. 461-475
- ORCHES: Orchestrated Test-Time-Compute-based LLM Reasoning on Collaborative GPU-PIM HEterogeneous System. Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, Yingyan (Celine) Lin. 476-489
- LoopFrog: In-Core Hint-Based Loop Parallelization. Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Ali Mustafa Zaidi, Alexandra W. Chadwick, Yuxin Guo, Giacomo Gabrielli, Timothy M. Jones. 490-503
- Multi-Stream Squash Reuse for Control-Independent Processors. Qingxuan Kang, Trevor E. Carlson. 504-518
- Drishti: Do Not Forget Slicing While Designing Last-Level Cache Replacement Policies for Many-Core Systems. Sweta, Prerna Priyadarshini, Biswabandan Panda. 519-532
- A TRRIP Down Memory Lane: Temperature-Based Re-Reference Interval Prediction For Instruction Caching. Henry Kao, Nikhil Sreekumar, Prabhdeep Singh Soni, Ali Sedaghati, Fang Su, Bryan Chan, Maziar Goudarzi, Reza Azimi. 533-546
- LANCER: Low-Overhead, Accurate, and Non-Destructive Calibration for Real-World Fault-Tolerant Quantum Applications. Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Dongmoon Min, Junhyuk Choi, Juwon Hong, Jangwoo Kim. 547-563
- Distributed-HISQ: A Distributed Quantum Control Architecture. Yilun Zhao, Kangding Zhao, Peng Zhou, Dingdong Liu, Tingyu Luo, Yuzhen Zheng, Peng Luo, Shun Hu, Jin Lin, Cheng Guo, Yinhe Han, Ying Wang, Mingtang Deng, Junjie Wu, Xiang Fu. 564-578
- Accurate Leakage Speculation for Quantum Error Correction. Chaithanya Naik Mude, Swamit Tannu. 579-594
- YOUTIAO: Hybrid Multiplexing with Dynamic Qubit Grouping for Low-cost and Scalable Quantum Wiring. Wuwei Tian, Liqiang Lu, Siwei Tan, Shiyu Li, Hengyi Li, Tianyao Chu, Xuhong Zhang, Mingshuai Chen, Jianwei Yin. 595-608
- NetZIP: Algorithm/Hardware Co-design of In-network Lossless Compression for Distributed Large Model Training. Jinghan Huang, Hyungyo Kim, Nachuan Wang, Jaeyoung Kang, Hrishi Shah, Eun-Kyung Lee, Minjia Zhang, Fan Lai, Nam Sung Kim. 609-625
- Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective. Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan. 626-642
- SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning. Hans Kasan, Dennis Abts, Jungwook Choi, John Kim. 643-658
- Optimizing All-to-All Collective Communication with Fault Tolerance on Torus Networks. Le Qin, Junwei Cui, Weilin Cai, Meng Niu, Yan Yang, Jiayi Huang. 659-674
- Titan-I: An Open-Source, High Performance RISC-V Vector Core. Jiuyang Liu, Qinjun Li, Yunqian Luo, Hongbin Zhang, Jiongjia Lu, Shupei Fan, Jianhao Ye, Yang Liu, Xiaoyi Liu, Yanqi Yang, Zewen Ye, Yuhang Zeng, Ao Shen, Rui Huang, Wei Cong, Xuecheng Zou, Mingyu Gao. 675-690
- SHADOW: Simultaneous Multi-Threading Architecture with Asymmetric Threads. Ishita Chaturvedi, Bhargav Reddy Godala, Abiram Gangavaram, Daniel Flyer, Tyler Sorensen, Tor M. Aamodt, David I. August. 691-704
- ATR: Out-of-Order Register Release Exploiting Atomic Regions. Yinyuan Zhao, Surim Oh, Mingsheng Xu, Heiner Litz. 705-718
- Vegapunk: Accurate and Fast Decoding for Quantum LDPC Codes with Online Hierarchical Algorithm and Sparse Accelerator. Kaiwen Zhou, Liqiang Lu, Debin Xiang, Chenning Tao, Anbang Wu, Jingwen Leng, Fangxin Liu, Mingshuai Chen, Jianwei Yin. 719-732
- OneAdapt: Resource-Adaptive Compilation of Measurement-Based Quantum Computing for Photonic Hardware. Hezi Zhang, Jixuan Ruan, Dean Tullsen, Yufei Ding, Ang Li, Travis S. Humble. 733-748
- MUSS-TI: Multi-level Shuttle Scheduling for Large-Scale Entanglement Module Linked Trapped-Ion. Xian Wu, Chenghong Zhu, Jingbo Wang, Xin Wang. 749-763
- Rasengan: A Transition Hamiltonian-based Approximation Algorithm for Solving Constrained Binary Optimization Problems. Qifan Jiang, Liqiang Lu, Debin Xiang, Tianyao Chu, Tianze Zhu, Jingwen Leng, Yun Liang, Xiaoming Sun, Jianwei Yin. 764-777
- Chasoň: Supporting Cross HBM Channel Data Migration to Enable Efficient Sparse Algebraic Acceleration. Ubaid Bakhtiar, Amirmahdi Namjoo, Bahar Asgari. 778-794
- A Probabilistic Perspective on Tiling Sparse Tensor Algebra. Ritvik Sharma, Zi Yu Xue, Nathan Zhang, Rubens Lacouture, Fredrik Kjolstad, Sara Achour, Mark Horowitz. 795-808
- Bootes: Boosting the Efficiency of Sparse Accelerators Using Spectral Clustering. Sanjali Yadav, Bahar Asgari. 809-823
- Misam: Machine Learning Assisted Dataflow Selection in Accelerators for Sparse Matrix Multiplication. Sanjali Yadav, Amirmahdi Namjoo, Bahar Asgari. 824-838
- AxCore: A Quantization-Aware Approximate GEMM Unit for LLM Inference. Jiaxiang Zou, Yonghao Chen, Xingyu Chen, Chenxi Xu, Xinyu Chen. 839-853
- Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type. Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lei Liu, Xiangrong Xu, Jinquan Wang, Zhen Song, Xiaojian Liao. 854-868
- MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving. Jungi Lee, Junyong Park, Soohyun Cha, JaeHoon Cho, Jaewoong Sim. 869-883
- Micro-MAMA: Multi-Agent Reinforcement Learning for Multicore Prefetching. Charles Block, Gerasimos Gerogiannis, Josep Torrellas. 884-898
- Ghost Threading: Helper-Thread Prefetching for Real Systems. Yuxin Guo, Akshay Bhosale, Utpal Bora, Alexandra W. Chadwick, Márton Erdos, Giacomo Gabrielli, Timothy M. Jones. 899-914
- Elevating Temporal Prefetching Through Instruction Correlation. Shuiyi He, Zicong Wang, Xuan Tang, Hao Tang, Dezun Dong, Liquan Xiao. 915-928
- Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications. Courtney Golden, Axel Feldmann, Joel S. Emer, Daniel Sánchez. 929-943
- SeaCache: Efficient and Adaptive Caching for Sparse Accelerators. Xintong Li, Jinchen Jiang, Mingyu Gao. 944-957
- NetSparse: In-Network Acceleration of Distributed Sparse Kernels. Gerasimos Gerogiannis, Dimitrios Merkouriadis, Charles Block, Annus Zulfiqar, Filippos Tofalos, Muhammad Shahbaz, Josep Torrellas. 958-974
- ColumnDisturb: Understanding Column-based Read Disturbance in Real DRAM Chips and Implications for Future Systems. Ismail Emir Yuksel, Ataberk Olgun, Nisa Bostanci, Haocong Luo, Abdullah Giray Yaglikçi, Onur Mutlu. 975-994
- SuperSFQ: A Hardware Design to Realize High-Frequency Superconducting Processors. Junhyuk Choi, Juwon Hong, Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Dongmoon Min, Masamitsu Tanaka, Koji Inoue, Jangwoo Kim. 995-1010
- Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device. Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang. 1011-1025
- C3ache: Towards Hierarchical Cache-Centric Computing for Sparse Matrix Multiplication on GPGPUs. Xiaojie Li, Mingyu Wang, Baiqing Zhong, Haiqiu Huang, Guangjie Cao, Zhiyi Yu. 1026-1039
- Leveraging Chiplet-Locality for Efficient Memory Mapping in Multi-Chip Module GPUs. Junhyeok Park, Sungbin Jang, Osang Kwon, Yongho Lee, Seokin Hong. 1040-1057
- Security and Performance Implications of GPU Cache Eviction Priority Hints. Qizhong Wang, Xiangyue Huang, Yanan Guo, Yuanchao Xu. 1058-1072
- COSMOS: RL-Enhanced Locality-Aware Counter Cache Optimization for Secure Memory. Haoran Geng, Xiaoyang Lu, Yuezhi Che, Ziang Tian, Dazhao Cheng, Xian-He Sun, Michael T. Niemier, X. Sharon Hu. 1073-1086
- CryptoBTB: A Secure Hierarchical BTB for Diverse Instruction Footprint Workloads. Debpratim Adak, Eric Rotenberg, Amro Awad, Huiyang Zhou. 1087-1101
- Efficient Security Support for CXL Memory through Adaptive Incremental Offloaded (Re-)Encryption. Chuanhan Li, Jishen Zhao, Yuanchao Xu. 1102-1116
- Citadel: Rethinking Memory Allocation to Safeguard Against Inter-Domain Rowhammer Exploits. Anish Saxena, Walter Wang, Alexandros Daglis. 1117-1131
- EcoCore: Dynamic Core Management for Improving Energy Efficiency in Latency-Critical Applications. Gyeongseo Park, Minho Kim, Ki-Dong Kang, Yunhyeong Jeon, Seulki Kim, Daehoon Kim. 1132-1146
- Flexing RISC-V Instruction Subset Processors to Extreme Edge. Alireza Raisiardali, Konstantinos Iordanou, Jedrzej Kufel, Kowshik Gudimetla, Kris Myny, Emre Ozer. 1147-1159
- ReGate: Enabling Power Gating in Neural Processing Units. Yuqi Xue, Jian Huang. 1160-1177
- Multi-Dimensional ML-Pipeline Optimization in Cost-Effective Disaggregated Datacenter. Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Nandhini Chandramoorthy, Meena Arunachalam, Gulsum Gudukbay Akbulut, Mahmut T. Kandemir, Vijaykrishnan Narayanan. 1178-1192
- CrossBit: Bitwise Computing in NAND Flash Memory with Inter-Bitline Data Communication. Hyunjin Kim, Seunghwan Song, Sukhyun Choi, Jeongin Choe, SangHyeok Han, Jisung Park, Jinho Lee, Jae-Joon Kim. 1193-1206
- DEAR: Improving Performance and Lifetime of SSDs Using Dynamic Error-Aware Refresh. Jaeyong Lee, Beomjun Kim, Myoungjun Chun, Myungsuk Kim, Jihong Kim. 1207-1220
- Nexus Machine: An Energy-Efficient Active Message Inspired Reconfigurable Architecture. Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-Shiuan Peh. 1221-1235
- FexMo: Enabling Fuse Execution Mode for Multi-task CGRAs. Yufei Yang, Chenhao Xie, Chuliang Guo, Liansheng Liu, Xiyuan Peng, Datong Liu, Yu Peng. 1236-1249
- Crane: Inter-Layer Scheduling Framework for DNN Inference and Training Co-Support on Tiled Architecture. Yu Gong, Lingyi Huang, Haodong Chang, Rongjian Liang, Cheng Yang, Zhexiang Tang, Jiang Hu, Bo Yuan. 1250-1263
- OASIS: A Commercial High Performance Terminal AI Processor Supporting RISC-V Tensor Extension Instructions. Peng Gao, Yang Liu, Haonan Sun, Jiang Jiang, Jun Wang, Zonghui Hong, Jiali Qu. 1264-1283
- Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques. Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang. 1284-1299
- Empowering Vector Architectures for ML: The CAMP Architecture for Matrix Multiplication. Mohammadreza Esmali Nojehdeh, Hossein Mokhtarnia, Julian Pavon, Narcís Rodas, Roger Figueras Bagué, Enrico Reggiani, Miquel Moretó, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé. 1300-1315
- TAIDL: Tensor Accelerator ISA Definition Language with Auto-generation of Scalable Test Oracles. Devansh Jain, Marco Frigo, Jai Arora, Akash Pardeshi, ZhiHao Wang, Krut Patel, Charith Mendis. 1316-1333
- OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs. Rishov Sarkar, Cong Hao. 1334-1346
- LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration. Tiantian Lin, Cheng Qiu, Xiaohang Wang, Ling Wang, Zhulin Zheng, Yingtao Jiang, Amit Kumar Singh, Jieming Yin, Sihai Qiu, Xiaodong Li, Xin Tang, Jie Song, Mingzhe Zhang, Kui Ren. 1347-1362
- PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework. Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Geonwoo Park, Hyungkyu Ham, Jeehoon Kang, Jongse Park, Gwangsun Kim. 1363-1380
- LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow. Kaiyan Chang, Wenlong Zhu, Shengwen Liang, Huawei Li, Ying Wang. 1381-1396
- Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering. Euijun Chung, Seonjin Na, Sung Ha Kang, Hyesoon Kim. 1397-1411
- Understanding and Mitigating Covert Channel and Side Channel Vulnerabilities Introduced by RowHammer Defenses. F. Nisa Bostanci, Oguzhan Canpolat, Ataberk Olgun, Ismail Emir Yüksel, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Abdullah Giray Yaglikçi, Onur Mutlu. 1412-1432
- ρHammer: Reviving RowHammer Attacks on New Architectures via Prefetching. Weijie Chen, Shan Tang, Yulin Tang, Xiapu Luo, Yinqian Zhang, Weizhong Qiang. 1433-1447
- DRAM Fault Classification through Large-Scale Field Monitoring for Robust Memory RAS Management. Hoiju Chung, Euisang Oh, Seungmin Baek, Hyeongshin Yoon, Jaesung Yoo, Sanghwan Lee, Yongjun Lee, Arhatha Bramhanand, Brett Dodds, Yang Zhou, Nam Sung Kim. 1448-1461
- DiffTest-H: Toward Semantic-Aware Communication in Hardware-Accelerated Processor Verification. Kunlin You, Yinan Xu, Kehan Feng, Luoshan Cai, Yaoyang Zhou, Yungang Bao. 1462-1476
- SymbFuzz: Symbolic Execution Guided Hardware Fuzzing. Samit Shahnawaz Miftah, Amisha Srivastava, Hyunmin Kim, Shiyi Wei, Kanad Basu. 1477-1490
- TransFusion: End-to-End Transformer Acceleration via Graph Fusion and Pipelining. Linxuan Zhang, José Nelson Amaral, Di Niu. 1491-1504
- X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units. Chenxi Xu, Tianhui Shi, Shixuan Sun, Jidong Zhai, Xinyu Chen. 1505-1519
- FALA: Locality-Aware PIM-Host Cooperation for Graph Processing with Fine-Grained Column Access. Changmin Shin, Jaeyong Song, Seongmin Na, Jun Sung, Hongsun Jang, Jinho Lee. 1520-1534
- Rethinking Tiling and Dataflow for SpMM Acceleration: A Graph Transformation Framework. Amir Ghazizadeh Ahsaei, Lingxiang Yin, Shilin Tian, Fangzhou Ye, Fan Yao, Hao Zheng. 1535-1548
- Boosting Task Scheduling Data Locality with Low-latency, HW-accelerated Label Propagation. Lucas Morais, Juan Miguel De Haro Ruiz, Alfredo Goldman, Guido Araujo, Giacomo Pedretti, Jim Ignowski, Michael Frank, Xavier Martorell, Daniel Jiménez-González, Carlos Álvarez. 1549-1564
- BitL: A Hybrid Bit-Serial and Parallel Deep Learning Accelerator for Critical Path Reduction. Seunghyun Lee, Dongho Ha, Sungbin Kim, Sungwoo Kim, Hyunwuk Lee, Won Woo Ro. 1565-1578
- HiPACK: Efficient Sub-8-Bit Direct Convolution with SIMD and Bitwise Management. Yao Chen, Cheng Gong, Bingsheng He. 1579-1591
- MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness. Huizheng Wang, Zichuan Wang, Zhiheng Yue, Yousheng Long, Taiquan Wei, Jianxun Yang, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin. 1592-1608
- PolymorPIC: Embedding Polymorphic Processing-in-Cache in RISC-V based Processor for Full-stack Efficient AI Inference. Cheng Zou, Ziling Wei, Jun Yan Lee, Chen Nie, Kang You, Zhezhi He. 1609-1624
- MHE-TPE: Multi-Operand High-Radix Encoder for Mixed-Precision Fixed-Point Tensor Processing Engines. Qizhe Wu, Jinyi Zhou, Zhanhe Hu, Zhichen Zeng, Huawen Liang, Jiuru Zhu, Linfeng Tao, Xin Zhang, Zekang Cheng, Letian Zhao, Wei Yuan, Xiaotian Wang, Xi Jin. 1625-1639
- SuperMesh: Energy-Efficient Collective Communications for Accelerators. Sabuj Laskar, Pranati Majhi, Abdullah Muzahid, Eun Jung Kim. 1640-1655
- SMX: Heterogeneous Architecture for Universal Sequence Alignment Acceleration. Max Doblas, Po Jui Shih, Oscar Lostes-Cazorla, Miquel Moreto, Christopher Batten, Santiago Marco-Sola. 1656-1671
- MINDFUL: Safe, Implantable, Large-Scale Brain-Computer Interfaces from a System-Level Design Perspective. Guy Eichler, Yatin Gilhotra, Nanyu Zeng, Martha A. Kim, Kenneth L. Shepard, Luca P. Carloni. 1672-1689
- DS-TIDE: Harnessing Dynamical Systems for Efficient Time-Independent Differential Equation Solving. Chuan Liu, Chunshu Wu, Ruibing Song, Guangyan Sun, Ying Nian Wu, Yousu Chen, Ang Li, Tong Geng. 1690-1703
- Towards Closing the Performance Gap for Cryptographic Kernels Between CPUs and Specialized Hardware. Naifeng Zhang, Sophia Fu, Franz Franchetti. 1704-1718
- HAWK: Fully Homomorphic Encryption Accelerator with Fixed-Word Key Decomposition Switching. Liang Kong, Shengyu Fan, Xianglong Deng, Lei Chen, Guang Fan, Guiming Shi, Yilan Zhu, Geng Yang, Shoumeng Yan, Mingzhe Zhang. 1719-1734
- ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes. Amund Bergland Kvalsvik, Magnus Själander. 1735-1748
- SmartPIR: A Private Information Retrieval System using Computational Storage Devices. Zehao Chen, Honghui You, Qian Wei, Hang Lu, Lei Ju, Zhaoyan Shen. 1749-1762
- Beyond Page Migration: Enhancing Tiered Memory Performance via Integrated Last-Level Cache Management and Page Migration. Hwanjun Lee, Minho Kim, Yeji Jung, Seonmu Oh, Ki-Dong Kang, Seunghak Lee, Daehoon Kim. 1763-1776
- Learning to Walk: Architecting Learned Virtual Memory Translation. Kaiyang Zhao, Yuang Chen, Xenia Xu, Dan Schatzberg, Nastaran Hajinaza, Rupin Vakharwala, Andy Anderson, Dimitrios Skarlatos. 1777-1792
- A. Delegato: Locality-Aware Atomic Memory Operations on Chiplets. Víctor Soria Pardos, Adrià Armejach, Tiago Mück, Darío Suárez Gracia, José A. Joao, Miquel Moretó. 1793-1808
- Re-architecting End-host Networking with CXL: Coherence, Memory, and Offloading. Houxiang Ji, Yifan Yuan, Yang Zhou, Ipoom Jeong, Ren Wang, Saksham Agarwal, Nam Sung Kim. 1809-1823
- GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing. Minnan Pei, Gang Li, Junwen Si, Zeyu Zhu, Zitao Mo, Peisong Wang, Zhuoran Song, Xiaoyao Liang, Jian Cheng. 1824-1837
- RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction. Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, Yang Katie Zhao. 1838-1851
- REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems. Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, Yu Wang. 1852-1866
- PointISA: ISA-Extensions for Efficient Point Cloud Analytics via Architecture and Algorithm Co-Design. Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Bowen Jiang, Xilong Xie, Jianfeng Zhu, Shaojun Wei, Leibo Liu. 1867-1881