- Message from the MICRO 2024 General Chairs: "Hi, How Are you?" - "Jeremiah The Innocent" Mural. Daniel Johnson.
- Message from the MICRO 2024 Program Chairs. Daniel A. Jiménez, Alaa R. Alameldeen.
- Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms. Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang 0006. 1-16
- Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes. Stratos Psomadakis, Chloe Alverti, Vasileios Karakostas, Christos Katsakioris, Dimitrios Siakavaras, Konstantinos Nikas, Georgios I. Goumas, Nectarios Koziris. 17-35
- Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table. Osang Kwon, Yongho Lee, Junhyeok Park, Sungbin Jang, Byungchul Tak, Seokin Hong. 36-49
- CamPU: A Multi-Camera Processing Unit for Deep Learning-based 3D Spatial Computing Systems. Dongseok Im, Hoi-Jun Yoo. 50-63
- AdapTiV: Sign-Similarity Based Image-Adaptive Token Merging for Vision Transformer Acceleration. Seungjae Yoo, Hangyeol Kim, Joo-Young Kim. 64-77
- Fusion-3D: Integrated Acceleration for Instant 3D Reconstruction and Real-Time Rendering. Sixu Li, Yang Zhao 0013, Chaojian Li, Bowei Guo, Jingqun Zhang, Wenbo Zhu, Zhifan Ye, Cheng Wan 0005, Yingyan Celine Lin. 78-91
- Secure Prefetching for Secure Cache Systems. Sumon Nath, Agustín Navarro-Torres, Alberto Ros 0001, Biswabandan Panda. 92-104
- HyperTEE: A Decoupled TEE Architecture with Secure Enclave Management. Yunkai Bai, Peinan Li, Yubiao Huang, Michael C. Huang 0001, Shijun Zhao, Lutan Zhao, Fengwei Zhang, Dan Meng, Rui Hou 0001. 105-120
- Defending Against EMI Attacks on Just-In-Time Checkpoint for Resilient Intermittent Systems. Jaeseok Choi, Hyunwoo Joe, Changhee Jung, Jongouk Choi. 121-135
- A Mess of Memory System Benchmarking, Simulation and Application Profiling. Pouya Esmaili-Dokht, Francesco Sgherzi, Valéria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Monemi, Adrià Armejach, Estanislao Mercadal, Germán Llort, Petar Radojkovic, Miquel Moretó, Judit Giménez, Xavier Martorell, Eduard Ayguadé, Jesús Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard. 136-152
- vTrain: A Simulation Framework for Evaluating Cost-Effective and Compute-Optimal Large Language Model Training. Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu. 153-167
- HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs. Jianchao Yang, Mei Wen, Dong Chen, Zhaoyun Chen, Zeyu Xue, Yuhang Li, Junzhong Shen, Yang Shi. 168-185
- Unleashing CPU Potential for Executing GPU Programs Through Compiler/Runtime Optimizations. Ruobing Han, Jisheng Zhao, Hyesoon Kim. 186-200
- A Framework for Fine-Grained Program Versioning. Yishen Chen, Saman P. Amarasinghe. 201-214
- LightWSP: Whole-System Persistence on the Cheap. Yuchen Zhou, Jianping Zeng 0001, Changhee Jung. 215-230
- DelayAVF: Calculating Architectural Vulnerability Factors for Delay Faults. Peter W. Deutsch, Vincent Quentin Ulitzsch, Sudhanva Gurumurthi, Vilas Sridharan, Joel S. Emer, Mengjia Yan 0001. 231-245
- Polymorphic Error Correction. Evgeny Manzhosov, Simha Sethumadhavan. 246-262
- DRCTL: A Disorder-Resistant Computation Translation Layer Enhancing the Lifetime and Performance of Memristive CIM Architecture. Heng Zhou, Bing Wu 0001, Huan Cheng, Jinpeng Liu, Taoming Lei, Dan Feng 0001, Wei Tong 0001. 263-277
- A Case for Speculative Address Translation with Rapid Validation for GPUs. Junhyeok Park, Osang Kwon, Yongho Lee, Seongwook Kim, Gwangeun Byeon, Jihun Yoon, Prashant J. Nair, Seokin Hong. 278-292
- SUV: Static Analysis Guided Unified Virtual Memory. Pratheek B, Guilherme Cox, Ján Veselý, Arkaprava Basu. 293-308
- STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU. Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang 0002, Aamer Jaleel, Xulong Tang. 309-323
- CacheCraft: Enhancing GPU Performance under Memory Protection through Reconstructed Caching. Soyoung Park, Hojung Namkoong, Boyeol Choi, Michael B. Sullivan 0001, Jungrae Kim. 324-337
- Trinity: A General Purpose FHE Accelerator. Xianglong Deng, Shengyu Fan, Zhicheng Hu, ZhuoYu Tian, Zihao Yang, Jiangrui Yu, Dingyuan Cao 0002, Dan Meng, Rui Hou 0001, Meng Li, Qian Lou, Mingzhe Zhang. 338-351
- UFC: A Unified Accelerator for Fully Homomorphic Encryption. Minxuan Zhou, Yujin Nam, Xuan Wang, Youhak Lee, Chris Wilkerson, Raghavan Kumar, Sachin Taneja, Sanu Mathew, Rosario Cammarota, Tajana Rosing. 352-365
- Accelerating Zero-Knowledge Proofs Through Hardware-Algorithm Co-Design. Nikola Samardzic, Simon Langowski, Srinivas Devadas, Daniel Sánchez 0003. 366-379
- A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs. Zhuoran Ji, Jianyu Zhao, Zhaorui Zhang, Jiming Xu, Shoumeng Yan, Lei Ju 0001. 380-392
- Beehive: A Flexible Network Stack for Direct-Attached Accelerators. Katie Lim, Matthew Giordano, Theano Stavrinos, Irene Zhang, Jacob Nelson 0001, Baris Kasikci, Thomas E. Anderson. 393-408
- Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators. Hasan Nazim Genc, Hansung Kim, Prashanth Ganesh, Yakun Sophia Shao. 409-422
- LUCIE: A Universal Chiplet-Interposer Design Framework for Plug-and-Play Integration. Zixi Li, David Wentzlaff. 423-436
- A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs. Qinggang Wang, Long Zheng 0003, Zhaozeng An, Shuyi Xiong, Runze Wang, Yu Huang 0013, Pengcheng Yao, Xiaofei Liao, Hai Jin 0001, Jingling Xue. 437-450
- Customizing Cache Indexing Through Entropy Estimation. Kevin Weston, Avery Johnson, Vahid Janfaza, Farabi Mahmud, Abdullah Muzahid. 451-463
- The Last-Level Branch Predictor. David Schall, Andreas Sandberg, Boris Grot. 464-479
- Timely, Efficient, and Accurate Branch Precomputation. Aniket Anand Deshmukh, Lingzhe Chester Cai, Yale N. Patt. 480-492
- Localizing the Tag Comparisons in the Wakeup Logic to Reduce Energy Consumption of the Issue Queue. Kenichiro Mori, Sota Kosugi, Hiroto Yoshida, Hajime Shimada, Hideki Ando. 493-506
- RTL2MμPATH: Multi-μPATH Synthesis with Applications to Hardware Security Verification. Yao Hsiao, Nikos Nikoleris, Artem Khyzha, Dominic P. Mulligan, Gustavo Petri, Christopher W. Fletcher, Caroline Trippel. 507-524
- SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering. Zhuoran Song, Houshu He, Fangxin Liu, Yifan Hao, Xinkai Song, Li Jiang 0002, Xiaoyao Liang. 525-537
- Cambricon-C: Efficient 4-Bit Matrix Unit via Primitivization. Yi Chen, Yongwei Zhao, Yifan Hao, Yuanbo Wen, Yuntao Dai, Xiaqing Li, Yang Liu, Rui Zhang 0040, Mo Zou, Xinkai Song, Xing Hu 0001, Zidong Du, Huaping Chen 0001, Qi Guo 0001, Tianshi Chen 0002. 538-550
- BBS: Bi-Directional Bit-Level Sparsity for Deep Learning Acceleration. Yuzong Chen 0001, Jian Meng, Jae-sun Seo, Mohamed S. Abdelfattah. 551-564
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators. Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque. 565-579
- SCALE: A Structure-Centric Accelerator for Message Passing Graph Neural Networks. Lingxiang Yin, Sanjay Gandham, Mingjie Lin, Hao Zheng 0005. 580-593
- Low-Overhead General-Purpose Near-Data Processing in CXL Memory Expanders. Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim. 594-611
- PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences. Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Minseo Park, Krishnakumar Nair, Meena Arunachalam, Gulsum Gudukbay Akbulut, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan. 612-626
- PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems. Dongjae Lee, Bongjoon Hyun, Taehun Kim, Minsoo Rhu. 627-642
- Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory. Axel Feldmann, Courtney Golden, Yifan Yang, Joel S. Emer, Daniel Sánchez 0003. 643-656
- FloatAP: Supporting High-Performance Floating-Point Arithmetic in Associative Processors. Kailin Yang, José F. Martínez. 657-670
- Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations. Yicong Zhang, Mingyu Wang, Wangguang Wang, Yangzhan Mai, Haiqiu Huang, Zhiyi Yu. 671-685
- Concurrency-Aware Register Stacks for Efficient GPU Function Calls. Ni Kang, Ahmad Alawneh, Mengchi Zhang, Timothy G. Rogers. 686-699
- CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization. Preyesh Dalmia, Rajesh Shashi Kumar, Matthew D. Sinclair. 700-717
- Flag-Proxy Networks: Overcoming the Architectural, Scheduling and Decoding Obstacles of Quantum LDPC Codes. Suhas Vittal, Ali Javadi-Abhari, Andrew W. Cross, Lev S. Bishop, Moinuddin Qureshi. 718-734
- Qoncord: A Multi-Device Job Scheduling Framework for Variational Quantum Algorithms. Meng Wang 0033, Poulami Das 0005, Prashant J. Nair. 735-749
- Surf-Deformer: Mitigating Dynamic Defects on Surface Code via Adaptive Deformation. Keyi Yin, Xiang Fang, Travis S. Humble, Ang Li 0006, Yunong Shi, Yufei Ding. 750-764
- Hestia: An Efficient Cross-Level Debugger for High-Level Synthesis. Ruifan Xu, Jin Luo, Yawen Zhang, Yibo Lin, Runsheng Wang, Ru Huang 0001, Yun Liang 0001. 765-779
- Looking into the Black Box: Monitoring Computer Architecture Simulations in Real-Time with AkitaRTM. Ali Mosallaei, Katherine E. Isaacs, Yifan Sun 0002. 780-794
- Over-Synchronization in GPU Programs. Ajay Nayak, Arkaprava Basu. 795-809
- Temporarily Unauthorized Stores: Write First, Ask for Permission Later. Juan M. Cebrian, Magnus Jahre, Alberto Ros 0001. 810-822
- Leveraging Cache Coherence to Detect and Repair False Sharing On-the-fly. Vipin Patel, Swarnendu Biswas, Mainak Chaudhuri. 823-839
- Chaining Transactions for Effective Concurrency Management in Hardware Transactional Memory. Víctor Nicolás-Conesa, J. Rubén Titos Gil, Ricardo Fernández Pascual, Manuel E. Acacio, Alberto Ros 0001. 840-855
- TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning. William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Swati Gupta, Tushar Krishna. 856-870
- Ring Road: A Scalable Polar-Coordinate-based 2D Network-on-Chip Architecture. Yinxiao Feng, Wei Li, Kaisheng Ma. 871-884
- Uncovering Real GPU NoC Characteristics: Implications on Interconnect Architecture. Zhixian Jin, Christopher Rocca, Jiho Kim, Hans Kasan, Minsoo Rhu, Ali Bakhoda, Tor M. Aamodt, John Kim 0001. 885-898
- MINT: Securely Mitigating Rowhammer with a Minimalist in-DRAM Tracker. Moinuddin Qureshi, Salman Qazi, Aamer Jaleel. 899-914
- BreakHammer: Enhancing RowHammer Mitigations by Carefully Throttling Suspect Threads. Oguzhan Canpolat, A. Giray Yaglikçi, Ataberk Olgun, Ismail Emir Yuksel, Yahya Can Tugrul, Konstantinos Kanellopoulos, Oguz Ergin, Onur Mutlu. 915-934
- ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation. Anish Saxena, Aamer Jaleel, Moinuddin Qureshi. 935-948
- Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient DRAM Maintenance Operations. Hasan Hassan, Ataberk Olgun, A. Giray Yaglikçi, Haocong Luo, Onur Mutlu. 949-965
- Memory Allocation Under Hardware Compression. Muhammad Laghari, Yuqing Liu, Gagandeep Panwar, David Bears, Chandler Jearls, Raghavendra Srinivas, Esha Choukse, Kirk W. Cameron, Ali Raza Butt, Xun Jian 0002. 966-982
- Genie Cache: Non-Blocking Miss Handling and Replacement in Page-Table-Based DRAM Cache. Youngin Kim 0003, William J. Song. 983-996
- StarNUMA: Mitigating NUMA Challenges with Memory Pooling. Albert Cho, Alexandros Daglis. 997-1012
- ThreadFuser: A SIMT Analysis Framework for MIMD Programs. Ahmad Alawneh, Ni Kang, Mahmoud Khairy, Timothy G. Rogers. 1013-1026
- Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration. Aaron Barnes, Fangjia Shen, Timothy G. Rogers. 1027-1040
- Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs. Dongho Ha, Lufei Liu, Yuan-Hsi Chou, Seokjin Go, Won Woo Ro, Hung-Wei Tseng 0001, Tor M. Aamodt. 1041-1057
- LIBRA: Memory Bandwidth- and Locality-Aware Parallel Tile Rendering. Aurora Tomás, Juan L. Aragón, Joan-Manuel Parcerisa, Antonio González 0001. 1058-1072
- Rearchitecting a Neuromorphic Processor for Spike-Driven Brain-Computer Interfacing. Hunjun Lee, Yeongwoo Jang, Daye Jung, Seunghyun Song, Jangwoo Kim. 1073-1089
- COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation. Zongwu Wang, Fangxin Liu, Ning Yang, Shiyuan Huang, Haomin Li, Li Jiang 0002. 1090-1106
- LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks. Ruokai Yin, Youngeun Kim, Di Wu 0016, Priyadarshini Panda. 1107-1121
- ActiveN: A Scalable and Flexibly-Programmable Event-Driven Neuromorphic Processor. Xiaoyi Liu, Zhongzhu Pu, Peng Qu, Weimin Zheng, Youhui Zhang. 1122-1137
- Ghost Arbitration: Mitigating Interconnect Side-Channel Timing Attacks in GPU. Zhixian Jin, Jaeguk Ahn, Jiho Kim, Hans Kasan, Jina Song, WonJun Song, John Kim 0001. 1138-1152
- IvLeague: Side Channel-Resistant Secure Architectures Using Isolated Domains of Dynamic Integrity Trees. Md Hafizul Islam Chowdhuryy, Fan Yao. 1153-1168
- Veiled Pathways: Investigating Covert and Side Channels Within GPU Uncore. Yuanqing Miao, Yingtian Zhang, Dinghao Wu, Danfeng Zhang, Gang Tan, Rui Zhang 0037, Mahmut Taylan Kandemir. 1169-1183
- The TYR Dataflow Architecture: Improving Locality by Taming Parallelism. Nikhil Agarwal, Mitchell Fream, Souradip Ghosh, Brian C. Schwedock, Nathan Beckmann. 1184-1200
- Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse. Yunan Zhang, Po-An Tsai, Hung-Wei Tseng 0001. 1201-1216
- Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs. Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das. 1217-1232
- Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures. Hyun Ryong Lee, Daniel Sánchez 0003. 1233-1246
- SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling. Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qinze Yang, Yang Wang 0089, Chao Li 0009, Yang Hu 0001, Shouyi Yin. 1247-1263
- RAHP: A Redundancy-aware Accelerator for High-performance Hypergraph Neural Network. Hui Yu, Yu Zhang 0027, Ligang He, Yingqi Zhao, Xintao Li, Ruida Xin, Jin Zhao 0003, Xiaofei Liao, Haikun Liu, Bingsheng He, Hai Jin 0001. 1264-1277
- Leviathan: A Unified System for General-Purpose Near-Data Computing. Brian C. Schwedock, Nathan Beckmann. 1278-1294
- TMiner: A Vertex-Based Task Scheduling Architecture for Graph Pattern Mining. Zerun Li, Xiaoming Chen 0003, Yinhe Han 0001. 1295-1308
- PointCIM: A Computing-in-Memory Architecture for Accelerating Deep Point Cloud Analytics. Xuan-Jun Chen, Han-Ping Chen, Chia-Lin Yang. 1309-1322
- Blenda: Dynamically-Reconfigurable Stacked DRAM. Mohammad Bakhshalipour, HamidReza Zare, Farid Samandi, Fatemeh Golshan, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad. 1323-1337
- ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration. Cheng Tan 0002, Miaomiao Jiang, Deepak Patil, Yanghui Ou, Zhaoying Li, Lei Ju 0001, Tulika Mitra, Hyunchul Park, Antonino Tumeo, Jeff Zhang 0001. 1338-1352
- SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts. Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Xiaoyan Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Zhengyu Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun. 1353-1366
- Scalar Vector Runahead. Jaime Roelandts, Ajeya Naithani, Sam Ainsworth, Timothy M. Jones 0001, Lieven Eeckhout. 1367-1381
- Weeding out Front-End Stalls with Uneven Block Size Instruction Cache. Roman Brunner, Rakesh Kumar. 1382-1396
- Mosaic: Harnessing the Micro-Architectural Resources of Servers in Serverless Environments. Jovan Stojkovic, Esha Choukse, Enrique Saurez, Íñigo Goiri, Josep Torrellas. 1397-1412
- SOPHGO BM1684X: A Commercial High Performance Terminal AI Processor with Large Model Support. Peng Gao, Yang Liu, Jun Wang, Wanlin Cai, Guangchong Shen, Zonghui Hong, Jiali Qu, Ning Wang. 1413-1428
- Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching. Sungmin Yun 0001, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim 0007, Byeongho Kim, Sukhan Lee 0002, Kyomin Sohn, Jung Ho Ahn. 1429-1443
- VGA: Hardware Accelerator for Scalable Long Sequence Model Inference. Seung Yul Lee, Hyunseung Lee 0001, Jihoon Hong, SangLyul Cho, Jae W. Lee. 1444-1457
- FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design. Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher. 1458-1473
- Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM. Zhongkai Yu, Shengwen Liang, Tianyun Ma, Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao, Jie Zhang, Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu 0001, Qi Guo 0001, Tianshi Chen 0002. 1474-1488
- Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory. Jian Chen, Congming Gao, Youyou Lu, Yuhao Zhang, Jiwu Shu. 1489-1503
- Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective. Houxiang Ji, Srikar Vanavasam, Yang Zhou, Qirong Xia, Jinghan Huang 0001, Yifan Yuan, Ren Wang 0001, Pekon Gupta, Bhushan Chitlur, Ipoom Jeong, Nam Sung Kim. 1504-1517
- NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering. Zhe Zhou 0002, Yiqi Chen, Tao Zhang 0032, Yang Wang 0053, Ran Shu 0001, Shuotao Xu, Peng Cheng 0005, Lei Qu, Yongqiang Xiong, Jie Zhang 0048, Guangyu Sun 0003. 1518-1531
- SuperCore: An Ultra-Fast Superconducting Processor for Cryogenic Applications. Junhyuk Choi, Ilkwon Byun, Juwon Hong, Dongmoon Min, Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Masamitsu Tanaka, Koji Inoue, Jangwoo Kim. 1532-1547
- SOPHIE: A Scalable Recurrent Ising Machine Using Optically Addressed Phase Change Memory. Guowei Yang, Sina Karimi, Carlos A. Ríos Ocampo, Ayse K. Coskun, Ajay Joshi. 1548-1561
- GauSPU: 3D Gaussian Splatting Processor for Real-Time SLAM Systems. Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, Xiaoyang Zeng. 1562-1573
- Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming. Maolin Wang 0002, Ian McInerney, Bartolomeo Stellato, Fengbin Tu, Stephen P. Boyd, Hayden Kwok-Hay So, Kwang-Ting Cheng. 1574-1587
- HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference. Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam. 1588-1600
- Acamar: A Dynamically Reconfigurable Scientific Computing Accelerator for Robust Convergence and Minimal Resource Underutilization. Ubaid Bakhtiar, Helya Hosseini, Bahar Asgari. 1601-1616
- Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign. Pouya Haghi, Chunshu Wu, Zahra Azad, Yanfei Li, Andrew Gui, Yuchen Hao, Ang Li 0006, Tony Tong Geng. 1617-1631
- PyPIM: Integrating Digital Processing-in-Memory from Microarchitectural Design to Python Tensors. Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky. 1632-1647
- Stream-Based Data Placement for Near-Data Processing with Extended Memory. Yiwei Li 0004, Boyu Tian, Yi Ren, Mingyu Gao 0001. 1648-1662
- Cambricon-M: A Fibonacci-Coded Charge-Domain SRAM-Based CIM Accelerator for DNN Inference. Hongrui Guo, Mo Zou, Yifan Hao, Zidong Du, Erxiang Ren, Yang Liu, Yongwei Zhao, Tianrui Ma, Rui Zhang 0040, Xing Hu 0001, Fei Qiao, Zhiwei Xu 0002, Qi Guo 0001, Tianshi Chen 0002. 1663-1677
- MeMCISA: Memristor-Enabled Memory-Centric Instruction-Set Architecture for Database Workloads. Yihang Zhu, Lei Cai, Lianfeng Yu, Anjunyi Fan, Longhao Yan, Zhaokun Jing, Bonan Yan, Pek Jun Tiw, Yuqi Li, Yaoyu Tao, Yuchao Yang 0001. 1678-1692
- BABOL: A Software-Defined NAND Flash Controller. Kibin Park, Alberto Lerner, Sangjin Lee 0001, Philippe Bonnet, Yong Ho Song, Philippe Cudré-Mauroux, Jungwook Choi. 1693-1705