- CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs. Junrui Pan, Timothy G. Rogers. 1-14
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale. Jaehong Cho, Minsu Kim, HyunMin Choi, Guseul Heo, Jongse Park. 15-29
- Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling. Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong 0001, Ada Gavrilovska. 30-43
- Mediator: Characterizing and Optimizing Multi-DNN Inference for Energy Efficient Edge Intelligence. Seung Hun Choi, Myung Jae Chung, Young-geun Kim, Sung Woo Chung. 44-56
- Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference. Joyjit Kundu, Wenzhe Guo, Ali BanaGozar, Udari De Alwis, Sourav Sengupta 0001, Puneet Gupta, Arindam Mallik. 57-67
- CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis. José Morgado, Leonel Sousa, Aleksandar Ilic. 68-81
- SHARP: A Distribution-Based Framework for Reproducible Performance Evaluation. Viyom Mittal, Pedro Bruel, Michalis Faloutsos, Dejan S. Milojicic, Eitan Frachtenberg. 82-93
- Taming Performance Variability caused by Client-Side Hardware Configuration. Georgia Antoniou, Haris Volos 0001, Yiannakis Sazeides. 94-107
- HEX-SIM: Evaluating Multi-modal Large Language Models on Multi-chiplet NPUs. Xinquan Lin, Haobo Xu, Yinhe Han 0001, Yiming Gan. 108-120
- Empowering the Quantum Cloud User with QRIO. Shmeelok Chakraborty, Yuewen Hou, Ang Chen, Gokul Subramanian Ravi. 121-131
- Evergreen: Comprehensive Carbon Model for Performance-Emission Tradeoffs. Tersiteab Adem, Andrew McCrabb, Vidushi Goyal, Valeria Bertacco. 132-143
- Performance Analysis of Zero-Knowledge Proofs. Saichand Samudrala, Jiawen Wu, Chen Chen, Haoxuan Shan, Jonathan Ku, Yiran Chen 0001, Jeyavijayan Rajendran. 144-155
- VelociTI: An Architecture-level Performance Modeling Framework for Trapped Ion Quantum Computers. Alexander Hankin, Abdulrahman Mahmoud, Mark Hempstead, David Brooks 0001, Gu-Yeon Wei. 156-168
- Understanding Performance Implications of LLM Inference on CPUs. Seonjin Na, Geonhwa Jeong, Byung Hoon Ahn, Jeffrey Young 0001, Tushar Krishna, Hyesoon Kim. 169-180
- Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models. Cheng Chen, Christina Giannoula, Andreas Moshovos. 181-193
- Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models. Chakshu Moar, Faraz Tahmasebi, Michael Pellauer, Hyoukjun Kwon. 194-209
- Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu 0002, Cong Callie Hao, Nishil Talati. 210-223
- Characterizing and Optimizing the End-to-End Performance of Multi-Agent Reinforcement Learning Systems. Kailash Gogineni, Yongsheng Mei, Karthikeya Gogineni, Peng Wei, Tian Lan 0001, Guru Venkataramani. 224-235
- Understanding Address Translation Scaling Behaviours Using Hardware Performance Counters. Nick Lindsay, Abhishek Bhattacharjee. 236-246
- Architectural Modeling and Benchmarking for Digital DRAM PIM. Farzana Ahmed Siddique, Deyuan Guo, Zhenxing Fan, MohammadHosein Gholamrezaei, Morteza Baradaran, Alif Ahmed, Hugo Abbot, Kyle Durrer, Kumaresh Nandagopal, Ethan Ermovick, Khyati Kiyawat, Beenish Gul, Abdullah Mughrabi, Ashish Venkat, Kevin Skadron. 247-261
- Kindle: A Comprehensive Framework for Exploring OS-Architecture Interplay in Hybrid Memory Systems. K. P. Arun 0002, Debadatta Mishra. 262-272
- Enhanced System-Level Coherence for Heterogeneous Unified Memory Architectures. Anoop Mysore Nataraja, Ricardo Fernández Pascual, Alberto Ros 0001. 273-283
- Characterizing Emerging Page Replacement Policies for Memory-Intensive Applications. Michael Wu, Sibren Isaacman, Abhishek Bhattacharjee. 284-294
- Characterizing CUDA and OpenMP Synchronization Primitives. Brandon Alexander Burtchell, Martin Burtscher. 295-308
- Evaluating Performance and Energy Efficiency of Parallel Programming Models in Heterogeneous Computing Systems. Demirhan Sevim, Baturalp Bilgin, Ismail Akturk. 309-319
- Performance Impact of Removing Data Races from GPU Graph Analytics Programs. Yiqian Liu, Avery Vanausdal, Martin Burtscher. 320-331