- Understanding Diffusion Model Serving in Production: A Top-Down Analysis of Workload, Scheduling, and Resource Efficiency. Yanying Lin, Shuaipeng Wu, Shutian Luo, Hong Xu 0001, Haiying Shen, Chong Ma, Min Shen, Le Chen, Chengzhong Xu 0001, Lin Qu, Kejiang Ye. 1-15
- DFUSE: Strongly Consistent Write-Back Kernel Caching for Distributed Userspace File Systems. Haoyu Li, Jingkai Fu, Qing Li, Windsor Hsu, Asaf Cidon. 16-28
- Confidential Analytics with Scylla. Shamiek Mangipudi, Pavel Chuprikov, Gerald Prendi, Patrick Eugster. 29-44
- Offloading Cloud-Native Infrastructure with XpuPod. Bicheng Yang, Jingkai He, Dong Du 0003, Yubin Xia, Haibo Chen 0001. 45-58
- ParaLog: Consistent Host-side Logging for Parallel Checkpoints. Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, Stefano Markidis, Michio Honda. 59-73
- Rethinking Tiered Memory Management in Cloud Data Centers. Tong Xing, Jiaxun Yang, Javier Picorel, Antonio Barbalace. 74-87
- Oneiros: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving. Ruihao Li 0002, Shagnik Pal, Vineeth Narayan Pullu, Prasoon Sinha, Jeeho Ryoo, Lizy K. John, Neeraja J. Yadwadkar. 88-101
- Water Footprint of Datacenter Applications: Methodological Implications of Manufacturing, Operational, and Decommissioning Phases. Amit Samanta 0001, Yankai Jiang 0002, Ryan Stutsman, Rohan Basu Roy. 102-110
- DyOrc: Efficient Serving of Dynamic Machine Learning Workflows. Shiwei Zhang 0002, Lansong Diao, Zisheng Meng, Siyu Wang, Wei Lin 0016, Chuan Wu 0001. 111-124
- CPU-Limits kill Performance: Time to rethink Resource Control. Chirag C. Shetty, Sarthak Chakraborty, Hubertus Franke, Larisa Shwartz, Chandra Narayanaswami, Indranil Gupta, Saurabh Jha. 125-133
- Middlebox: Unlocking Datacenter Growth and Grid Decarbonization. Liuzixuan Lin, Andrew A. Chien. 134-148
- Metis: A Non-Clairvoyant, Workflow-Aware OS Scheduler for Serverless Applications. Wenda Tang, Yanan Yang, Jie Wu 0001. 149-162
- Cloud-Native Digital Twin Orchestration for Real-Time Decision Optimization Using Fuzzy Constraints and Reinforcement Learning. David Li, Angela Li. 163-169
- DuoAdmit: Dual-Layer Cache Admission for Load-Balancing Hybrid-Redundancy Block Storage. Xiaojun Guo, Guangjie Xing, Hua Wang 0008, Ke Zhou 0001, Ming Xie, Fenqiang Yang, Min Fu, Bin Xu, Jianying Hu, Guangchao Yang. 170-182
- PnM: Efficient Intra-Datacenter Calls Packing for Large Conferencing Services. Rohan Gandhi, Ankur Mallick. 183-195
- THORN-ML: Transparent Hardware Offloaded Resilient Networks for RDMA based Distributed ML Workloads. Maziyar Nazari, Daniel Noland, Giulio Sidoretti, Erika Hunhoff, Tamara Silbergleit Lehman, Eric Keller. 196-208
- Funky: Cloud-Native FPGA Virtualization and Orchestration. Atsushi Koshiba, Charalampos Mainas, Pramod Bhatotia. 209-224
- Multi-Agent Reinforcement Learning with Serverless Computing. Rui Wei, Hanfei Yu, Xikang Song, Jian Li 0008, Devesh Tiwari, Ying Mao 0001, Hao Wang 0022. 225-239
- ZipBatch: Multi-Tenant GPU Batching with Dual-Resource Regulation. Haoxuan Yu, Sheng Yao, Wei Wang 0030. 240-254
- WDP: Mitigating Interference in CPU Sharing Through Wake-up Delay Driven Preemption for QoS-aware Co-location. Yaoxuan Li, Pu Pang, Yecheng Yang, Quan Chen 0002, Zhengxuan Yan, Guoyao Xu, Guodong Yang, Liping Zhang 0013, Minyi Guo. 255-268
- Cost-Efficient Cloud Infrastructure with Hugepage-aware Memory Deduplication. Ruizhe Huang, Xinyu Wang, Zhida An, Hanwen Lei, Peng Jiang 0007, Ziqi Zhang, Ding Li 0001, Yao Guo 0001, Xiangqun Chen, Yuntao Liu, Kang Zhou, Yuxin Ren 0001, Ning Jia, Xinwei Hu. 269-282
- Snap & Replay: A new way to analyze uarch-scale performance bottlenecks for ML accelerators. Ioannis Zarkadas, Amanda Tomlinson, Asaf Cidon, Baris Kasikci, Ofir Weisse. 283-298
- DRAM Failure Prediction with Correctable Error Spatial Patterns: A Hybrid Learning Approach. Lei Liu, Yinling Zhang. 299-306
- Scalable and Fault-Tolerant Storage and File System Services with Non-Blocking Synchronization for Private Clouds. Mincheol Sung, Ruslan Nikolaev 0001, Binoy Ravindran. 307-319
- 10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training. Sabiha Afroz, Redwan Ibne Seraj Khan, Hadeel Albahar, Jingoo Han, Ali Raza Butt. 320-333
- Accelerating Distributed Filesystem Metadata Service via Decoupling Directory Semantics from Metadata Indexing. Wenhao Lv, Hao Guo, Qing Wang 0031, Youyou Lu, Jiwu Shu. 334-347
- Orcas: A DAG-based Consensus Approach with Linear Communication Overhead. Yi-Hua, Xiulong Liu, Hao Xu, Chenyu Zhang, Gaowei Shi, Keqiu Li, Muhammad Shahzad 0001, Guyue (Grace) Liu. 348-360
- AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving. Kaiyu Huang, Hao Wu, Zhubo Shi, Han Zou, Minchen Yu, Qingjiang Shi. 361-374
- From Bottleneck to Breakthrough: Optimizing Scheduling for Hyperscale Containerized Clusters. Bing Li, Yuquan Ren, Xinyi Song, Zhilei Liu, Cong Xu, Jingyuan Zhang, Caixue Lin, Wu Xiang, Rui Shi. 375-387
- GridGreen: Integrating Serverless Computing in HPC Systems for Performance and Sustainability. Amit Samanta 0001, Ryan Stutsman, Rohan Basu Roy. 388-401
- Defragmentation Scheduling with Deep Reinforcement Learning in Shared GPU Clusters. Qingfu Wu, Pengfei Chen 0002, Yilun Wang 0001. 402-415
- FailLite: Failure-Resilient Model Serving for Resource-Constrained Edge Environments. Li Wu, Walid A. Hanafy, Tarek F. Abdelzaher, David E. Irwin 0001, Jesse Milzman, Prashant J. Shenoy. 416-429
- PerfMon: Performance Monitoring of Host Network Stack. Ranjitha K., Ankit Sharma, Malsawmsanga Sailo, Arun Siddardha, Amrit Kumar 0008, Praveen Tammana, Pravein Govindan Kannan, Priyanka Naik. 430-442
- Serverless Elasticsearch: the Architecture Transformation from Stateful to Stateless. Iraklis Psaroudakis, Pooya Salehi, Jason Bryan, Francisco Fernández Castaño, Brendan Cully, Ankita Kumar, Henning Andersen, Thomas Repantis. 443-455
- Revisiting State Machine Replication in Practice: Lessons from Building an etcd-inspired System. Lucas Lebow, Mason Dunkle, Christopher Siems, Jonathan Zarnstorff, Lewis Tseng. 456-463
- Memory Matters: Load-Time Deduplication for Unikernels. Gaulthier Gain, Benoit Knott, Cyril Soldani, Laurent Mathy. 464-478
- CoRe: Collaborative Replica Scheduling for Large-Scale Cloud Database Services. Hongyu Lei, Shiyu Di, Chunhua Li 0002, Ke Zhou, Ming Xie, Fenqiang Yang, Jianping Zhu, Xiang Li, Kezhou Yan. 479-492
- CoMPI: Coordinated Model Merging and Parallel Inference at Edge. Shuang Zeng, Haitao Zhang, Zezhong Yan. 493-506
- Scheduling Cloud VMs on Variable Capacity Datacenters. Rajini Wijayawardana, Andrew A. Chien. 507-520
- FedDance: Efficient Participant Selection for Federated Learning in Highly Dynamic Environments. Yuanhang Chen, Xiaosong Chen, Wenyan Chen 0001, Huanle Xu. 521-534
- Rethinking Web Cache Design for the AI Era. Yazhuo Zhang, Jinqing Cai, Avani Wildani, Ana Klimovic. 535-542
- Valet: Efficient Data Placement on Modern SSDs. Devashish R. Purandare, Peter Alvaro, Avani Wildani, Darrell D. E. Long, Ethan L. Miller. 543-556
- Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters. Shruti Dongare, Redwan Ibne Seraj Khan, Hadeel Albahar, Nannan Zhao, Diego Meléndez-Maita, Ali Reza Butt. 557-570
- FLASH: Fast Linked AF_XDP Sockets for High Performance Network Function Chains. Debojeet Das, Kevin Prafull Baua, Aditya Kansara, Arghyadip Chakraborty, Dheeraj Kurukunda, Mythili Vutukuru, Purushottam Kulkarni. 571-584
- The case for synchronous distributed protocols in public clouds. Nenad Milosevic, Robert Soulé, Fernando Pedone. 585-599
- CIS: Checkpointed Inference for Data Drift-Resilient Model Serving at Edge Servers. Sudipta Saha Shubha, Haiying Shen, Ganesh Ananthanarayanan. 600-613
- BLAFS: A Bloat-Aware Container File System. Huaifeng Zhang, Mohannad Alhanahnah, Philipp Leitner 0001, Ahmed Ali-Eldin. 614-628
- VLCs: Managing Parallelism with Virtualized Libraries. Yineng Yan, William Ruys, Hochan Lee, Ian Henriksen, Arthur Peters, Sean Stephens, Bozhi You, Henrique Fingler, Martin Burtscher, Milos Gligoric 0001, Keshav Pingali, Mattan Erez, George Biros, Christopher J. Rossbach. 629-643
- Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms. Serhii Ivanenko, Vasyl Lanko, Rudi Horn, Vojin Jovanovic, Rodrigo Bruno. 644-658
- Nano-consensus: Ultra-fast, Quorum-less Coordination on the Wire. Davide Rovelli, Christian Faerber, Graham McKenzie, Ali Pahlevan, Sina Darabi, Patrick Jahnke, Patrick Eugster. 659-672
- REEF: Energy-Efficient, Application-QoS-Aware Thread Processing in Oversubscribed Server Environments. Ning Li, Hong Jiang 0001, Hao Che, Zhijun Wang 0001. 673-686
- Understanding GPU Resource Interference One Level Deeper. Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic. 687-694
- Spatio-Temporal Resource Control for Cloud-Native GPU Provisioning. Hyeon Jun Jang, Sang-Jae Kim, Weikuan Yu, Hyun-Wook Jin. 695-707
- A Fast, Efficient, and Strongly-Consistent Object Store. Shuwen Sun, Isaac Khor, Ji-Yong Shin, Peter Desnoyers. 708-721
- FedLTA: A Federated Long-Tail Alignment Framework via Global Class Anchors. Yuzi Li, Zhigang Wang, Qinghua Zhang, Junfeng Zhao 0005. 722-734
- Multiplexed Heterogeneous LLM Serving via Stage-Aligned Parallelism. Tao Luo, Kelvin K. W. Ng, Zhen Ping Khor, Sidharth Sankhe, Boon Thau Loo, Vincent Liu 0001. 735-747
- SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge. Joel Wolfrath, Daniel Frink, Abhishek Chandra. 748-761
- PowerTrip: Exploiting Federated Heterogeneous Datacenter Power for Distributed ML Training. Talha Mehboob, Luanzheng Guo, Nathan R. Tallent, Michael Zink, David Irwin. 762-775
- Symbiosis: Multi-Adapter Inference and Fine-Tuning. Saransh Gupta, Umesh Deshpande, Travis Janssen, Swaminathan Sundararaman. 776-789
- A Bootstrapping Technique for Reducing the Costs of Machine Learning Models for Predicting Execution Times in IaaS Clouds. Romolo Marotta, Gabriele Russo Russo, Francesco Quaglia, Pierangelo di Sanzo. 790-802
- FaaSGNN: Enabling Memory Efficient and Low Latency GNN Inference Services with Serverless Computing. Yuzhuo Yang, Kaihua Fu, Quan Chen, Deze Zeng, Shuo Quan, Jie Wu 0001, Minyi Guo. 803-816
- ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving. Haoran Qiu, Anish Biswas, Zihan Zhao, Jayashree Mohan, Alind Khare, Esha Choukse, Íñigo Goiri, Zeyu Zhang 0005, Haiying Shen, Chetan Bansal, Ramachandran Ramjee, Rodrigo Fonseca. 817-830
- ALAP: Intent-Based Serverless Computing via Delayed Decision-Making. Prasoon Sinha, Kostis Kaffes, Neeraja J. Yadwadkar. 831-846
- Cuckoo: Deadline-Aware Job Packing on Heterogeneous GPUs for DL Model Training. Yuzheng Zhang, Renyu Yang, Junhong Liu, Weihan Jiang, Tianyu Ye, Yiqiao Liao, Penghao Zhang, Tiezi Zhang, Kun Shang, Tianyu Wo, Chunming Hu, Chengru Song, Jin Ouyang. 847-859
- Towards a Lightweight Sidecar-based Service Mesh for Serverless. Lazar Cvetkovic, Ana Klimovic. 860-866
- Balancing Fairness and Performance in Multi-User Spark Workloads with Dynamic Scheduling. Davis Kazemaks, Laurens Versluis, Burcu Kulahcioglu Ozkan, Jérémie Decouchant. 867-880
- Cauchy: A Cost-Efficient LLM Serving System through Adaptive Heterogeneous Deployment. Yihui Zhang, Han Shen, Renyu Yang, Di Tian, Yuxi Luo, Menghao Zhang 0001, Li Li 0029, Chunming Hu, Tianyu Wo, Chengru Song, Jin Ouyang. 881-893