- Intel® In-Memory Analytics Accelerator: Performance Characterization and Guidelines. Jaeyoung Kang, Qirong Xia, Ipoom Jeong, Yongjoo Park, Nam Sung Kim. 1-13
- Carbon-Aware Server Replacement. Iris Uwizeyimana, Natalie Enright Jerger. 1-3
- Exploring Constrained Dataflow Accelerators for Real-Time Multi-Task Multi-Model ML Workloads. Jamin Seo, Jianming Tong, Tushar Krishna, Hyoukjun Kwon. 1-11
- La Superba: Leveraging a Self-Comparison Method to Understand the Performance Benefits of Sparse Acceleration Optimizations. Nebil Ozer, Gregory Kollmer, Ramyad Hadidi, Bahar Asgari. 1-12
- FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs. Carlos Agulló-Domingo, Óscar Vera-López, Seyda Guzelhan, Lohit Daksha, Aymane El Jerari, Kaustubh Shivdikar, Rashmi S. Agrawal, David R. Kaeli, Ajay Joshi, José L. Abellán. 1-3
- Evaluating Compute in Memory Architectures for Matrix Multiplication: A Dataflow-Centric Perspective. Tanvi Sharma, Indranil Chakraborty, Mustafa Fayez Ali, Kaushik Roy. 1-3
- Dissecting Performance Overheads of Confidential Computing on GPU-based Systems. Yang Yang, Mohammad Sonji, Adwait Jog. 1-16
- ConCCL: Optimizing ML Concurrent Computation and Communication with GPU DMA Engines. Anirudha Agrawal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam. 1-11
- Evaluation and Comparison of the Energy Efficiency of Several Intel Multicore Processors. Thomas Rauber, Gudula Rünger. 1-3
- Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads. Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon. 1-14
- An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators. Fareed Qararyah, Mohammad Ali Maleki, Pedro Trancoso. 1-13
- MeMo: Enhancing Representative Sampling via Mechanistic Micro-Model Signatures. Chenji Han, Huai Xu, Guangyao Guo, Yuxuan Wu, Fuxin Zhang. 1-13
- ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput. Junsoo Kim, Hunjong Lee, Geonwoo Ko, Gyubin Choi, Seri Ham, Seongmin Hong, Joo-Young Kim. 15-25
- Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability. Zishen Wan, Jiayi Qian, Yuhang Du, Jason Jabbour, Yilun Du, Yang Zhao, Arijit Raychowdhury, Tushar Krishna, Vijay Janapa Reddi. 26-37
- Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures. Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen. 49-61
- Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing. Eunsoo Jung, Eunbi Jeong, Gunjae Koo, Yunho Oh, Myung Kuk Yoon. 62-72
- RayFlex: An Open-Source RTL Implementation of the Hardware Ray Tracer Datapath. Fangjia Shen, Aaron Barnes, Anusuya Nallathambi, Timothy G. Rogers. 73-84
- FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights. Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim. 96-107
- Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs. Matin Raayai Ardakani, Andrew Nguyen, Ivan Rosales, Daoxuan Xu, Yuwei Sun, Yifan Sun, David Kaeli, Norman Rubin. 137-149
- Performance Analysis of GEMM Workloads on the AMD Versal Platform. Kaustubh Manohar Mhatre, Venkata Guru Prashanth Mulleti, Curt John Bansil, Endri Taka, Aman Arora. 150-161
- Beyond the Numbers: Measuring Android Performance Through User Perception. Jaeheon Lee, Juhyung Park, Seonggyun Oh, Jinhyung Koo, Sungjin Lee. 162-173
- COCOSSim: A Cycle-Accurate Simulator for Heterogeneous Systolic Array Architectures. Mansi Choudhary, Chris Kjellqvist, Jiaao Ma, Lisa Wu Wills. 174-185
- SCALE-Sim V3: A Modular Cycle-Accurate Systolic Accelerator Simulator for End-To-End System Analysis. Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdhar, Tushar Krishna. 186-200
- Evaluation of MindPalace for Chip Design Tradeoffs on Function-as-a-Service. Kaifeng Xu, Georgios Tziantzioulis, David Wentzlaff. 201-212
- PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool. Steven van der Vlugt, Leon C. Oostrum, Gijs Schoonderbeek, Ben van Werkhoven, Bram Veenboer, Krijn Doekemeijer, John W. Romein. 213-226
- Benchmarking 3D Gaussian Splatting Rendering. Saichand Samudrala, Sushant Kondguli, Paul Gratz. 227-238
- COSMOS: An LLC Contention Slowdown Model for Heterogeneous Multi-Core Systems. Yongju Lee, Jaewon Kwon, Cheolhwan Kim, Enhyeok Jang, Jiwon Lee, Hyunwuk Lee, Won Woo Ro. 264-275
- Use Equal-Work or Equal-Time Speedup, Not Geomean Speedup. Lieven Eeckhout. 276-285
- Identifying Important Data Transformations for Synthesizing Effective Lossless Compressors. Noushin Azami, Martin Burtscher. 286-296
- Beethoven: A Heterogeneous Multi-Core Accelerator System Composer. Chris Kjellqvist, Brendan Peercy, Alvin R. Lebeck, Lisa Wu Wills. 297-308
- SAGA: A Surrogate Assisted Genetic Algorithm for Fast CPU Power Virus Generation. Panteleimonas Chatzimiltis, Georgia Antoniou, Haris Volos, Yiannakis Sazeides. 309-319
- Concurrent PIM and Load/Store Servicing in PIM-Enabled Memory. Sudhanshu Gupta, Niti Madan, Sooraj Puthoor, Nuwan Jayasena, Sandhya Dwarkadas. 320-334
- The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-Cores. Rashid Aligholipour, Yuan Yao. 347-349
- The Future of Instruction-Level Parallelism (ILP). Alexandra W. Chadwick, Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Yuxin Guo, Richard Cooper, Giacomo Gabrielli, Timothy M. Jones. 350-352
- Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications. Seonho Lee, Jihwan Oh, Seokjin Go, Divya Mahajan. 353-355
- PIM-BEACON: A Benchmarking and Emulation Framework Supporting Adaptive CONfigurations in DRAM-Based Processing-in-Memory Systems. Inseong Hwang, Jihoon Jang, Chaewon Park, Hyun Kim. 356-358
- Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson. Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle. 359-361
- ASLink: Modeling Multi-GPU Execution in Accel-Sim. Christin Bose, Cesar Avalos, Junrui Pan, Yechen Liu, Mahmoud Khairy, Clay Hughes, Timothy G. Rogers. 362-364
- A Flexible and Accurate Circuit-Level Substrate for Future DRAM Design and Analysis. S. M. Mojahidul Ahsan, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, Tamzidul Hoque. 371-373
- TPNM: A CXL Based General Purpose Tiered Process Near Memory Framework. Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Meena Arunachalam, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan. 374-376
- A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations. Kewei Yan, Yonghong Yan. 377-378
- Library of Networks: An Online Tool for Design and Analysis of Network Topologies. Aniket Chatterjee, Conor James Green, Mithuna Thottethodi. 379-381
- Multi-Core Aware Evaluation of Prefetchers. Martí Torrents, Paul Caheny, Stijn Eyerman, Wim Heirman. 382-384
- Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels. Martin Troiber, Martin Schulz, Blaise Tine, Hyesoon Kim. 388-389
- GPU Simulation Acceleration via Parallelization. Rodrigo Huerta, Antonio González. 390-392
- Energon: A Sustainability-Driven Modeling Framework for AI Data Centers. Wenzhe Guo, Joyjit Kundu, Uras Tos, Giuliano Sisto, Cedric Rolin, Lars-Åke Ragnarsson, Timon Evenblij. 393-395
- Measuring Performance Overheads of Software Memory Management Using Functional-First Simulators. Yves Vandriessche, Wim Heirman, Ed Nutting, Jeremy Birch, Judah Daniels, Mae Hood, Pascal Costanza. 399-400
- Interconnect Performance Estimation for ML Accelerators via Lightweight Analytical Model. Rahul Tripathy, Sumit K. Mandal. 401-403