- Message from the 2025 General Co-chairs. Marco D. Santambrogio, Ananth Kalyanaraman.
- Longer Attention Span: Increasing Transformer Context Length With Sparse Graph Processing Techniques. Nathaniel Tomczak, Sanmukh Kuppannagari. 1-12
- Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-The-Air Computation. Qianpiao Ma, Junlong Zhou, Xiangpeng Hou, JianChun Liu, Hongli Xu, Jianeng Miao, Qingmin Jia. 1-12
- 3 Forecast: Personalized Privacy-Preserving Cloud Workload Prediction Based on Federated Generative Adversarial Networks. Yu Kuang, Li Yan 0004, Zhuozhao Li. 1-11
- Be Aware of Metadata Corruption in Parallel File System: It can be Silent and Catastrophic. Saisha Kamat, Mai Zheng, Bo Fang, Dong Dai 0001. 1-13
- GuardianOMP: A Framework for Highly Productive Fault Tolerance Via OpenMP Task-Level Replication. Adrian Munera, Eduardo Quiñones, Sara Royuela. 1-12
- VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics. Chen Wang, Zhaobin Zhu, Kathryn M. Mohror, Sarah Neuwirth, Marc Snir. 1-12
- For What the Bell Tolls. David E. Keyes. 1
- A Memory-Efficient and Computation-Balanced Lossy Compressor on Wafer-Scale Engine. Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang 0004, Franck Cappello. 1-13
- TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing. Theodore Michailidis, Juno Kim, Linsong Guo, Steven Swanson, Jishen Zhao. 2-14
- Ekko: Fully Decentralized Scheduling for Serverless Edge Computing. Xin Chen, Manoj Prabhakar Paidiparthy, Dilma Da Silva, Liting Hu. 15-27
- It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation. Jing Wu, Lin Wang, Quanfeng Deng, Chen Yu 0003, Dong Zhang, Bingheng Yan, Fangming Liu. 28-41
- Tide: A Distributed Runtime Management Framework for Things-Edge-Cloud Computing Continuum. Xiaohui Peng, Wenkai Yan, Yifan Wang, Shoujian Zheng, Zhiwei Xu. 42-53
- PISA: An Adversarial Approach to Comparing Task Graph Scheduling Algorithms. Jared Ray Coleman, Bhaskar Krishnamachari. 54-66
- Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines. Arnau Cinca, Aleix Roca, Kevin Sala, Raúl Peñacoba Veigas, David Álvarez 0006, Vicenç Beltran 0001. 67-80
- Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems. Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian T. Foster, Ioan Raicu, Kyle Chard. 81-93
- CALock: Multi-Granularity Locking in Dynamic Hierarchies. Ayush Pandey, Julien Sopena, Marc Shapiro 0001, Swan Dubois. 94-105
- Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning. Roberto L. Castro, Diego Andrade, Basilio B. Fraguela. 106-117
- An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration. Xing Peng, Qinglin Wang, Chuhe Hong, Gencheng Liu, Rui Xia, Xinhai Chen, ZhiGang Sun, Jie Liu 0002. 118-129
- Accelerating Tensor-Train Decomposition on Graph Neural Networks. Shenghao Qiu, Chunwei Xia, Zheng Wang 0001. 130-141
- Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures. Lukas Gianinazzi, Tal Ben-Nun, Maciej Besta, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler. 142-153
- AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming. XinYu Piao, Jooyong Shim, Joongheon Kim, Jong-Kook Kim. 154-161
- Distributed Construction of Demand-Aware Datacenter Networks. Aleksander Figiel, Darya Melnyk, Tijana Milentijevic, Stefan Schmid 0001. 162-172
- PivotScale: A Holistic Approach for Scalable Clique Counting. Amogh Lonkar, Scott Beamer. 173-186
- Less is More: Faster Maximum Clique Search by Work-Avoidance. Hans Vandierendonck. 187-198
- ALGAS: A Low-Latency GPU-Based Approximate Nearest Neighbor Search System. Yuanhui Chen, Lixiao Cui, Zebin Yao, Hao Zhou, Gang Wang, Xiaoguang Liu 0001. 199-209
- AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies. JaeHyuk Kwack, Colleen Bertoni, Umesh Unnikrishnan, Riccardo Balin, Khalid Hossain, Yasaman Ghadar, Timothy J. Williams, Abhishek Bagusetty, Mathialakan Thavappiragasam, Väinö Hatanpää, Archit Vasan, John R. Tramm, Scott Parker. 210-222
- Accelerate Coastal Ocean Circulation Model with AI Surrogate. Zelin Xu 0001, Jie Ren, Yupu Zhang, Jose Maria Gonzalez Ondina, Maitane Olabarrieta, Tingsong Xiao, Wenchong He, Zibo Liu, Shigang Chen, Kaleb Smith, Zhe Jiang 0001. 223-235
- FastCHGNet: Training One Universal Interatomic Potential to 1.5 Hours with 32 GPUs. Yuanchang Zhou, Siyu Hu, Chen Wang, Lin-Wang Wang, Guangming Tan, Weile Jia. 236-246
- Compiler, Runtime, and Hardware Parameters Design Space Exploration. Lana Scravaglieri, Ani Anciaux-Sedrakian, Olivier Aumage, Thomas Guignon, Mihail Popov. 247-260
- Performance Projection for Design-Space Exploration on Future HPC Architectures. Clément Gavoille, Hugo Taboada, Jens Domke, Brice Goglin, Emmanuel Jeannot. 261-272
- PALLAS: A Generic Trace Format for Large HPC Trace Analysis. Catherine Guelque, Valentin Honoré, Philippe Swartvagher, Gaël Thomas 0001, François Trahay. 273-284
- Tera-Scale Multilevel Graph Partitioning. Daniel Salwasser, Daniel Seemaier, Lars Gottesbüren, Peter Sanders 0001. 285-296
- A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs. Anju Mongandampulath Akathoott, Martin Burtscher. 297-308
- Edge-Disjoint Spanning Trees on Star Products. Kelly Isham, Laura Monroe, Kartik Lakhotia, Aleyah Dawkins, Daniel Hwang, Ales Kubicek. 309-321
- IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs. Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai. 322-334
- DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers. Bowen Sun, Riccardo Pinciroli, Giuliano Casale, Evgenia Smirni. 335-346
- FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training. Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou. 358-369
- Enabling Efficient Error-Controlled Lossy Compression for Unstructured Scientific Data. Xuan Wu, Sheng Di, Congrong Ren, Pu Jiao, Mingze Xia, Cheng Wang, Hanqi Guo, Xin Liang 0001, Franck Cappello. 370-382
- PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives. Jinman Zhao, Seyed Aryan Vahabpour, Xingyu Yue, Kai-Ting Amy Wang, Tarek S. Abdelrahman. 383-394
- Parallel Scheduling of Task Graphs with Minimal Memory Requirements. Pascal Fradet, Alain Girault, Alexandre Honorat. 395-406
- The Artificial Scientist: In-Transit Machine Learning of Plasma Simulations. Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Vedhas Pandit, Richard Pausch, Norbert Podhorszki, Franz Pöschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, René Widera, Anna Willmann, Sunita Chandrasekaran. 407-418
- The Power of Parallelism: Accelerating Discovery in the Biosciences. Srinivas Aluru. 419
- FlowForecaster: Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Forecasts. Hyungro Lee, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, Mahantesh Halappanavar. 420-432
- Cello: Co-Designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse. Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna. 433-446
- Locality Aware Process Remapping for Distributed-Memory Graph Workloads. Md Nahid Newaz, Sayan Ghosh, Nathan R. Tallent, Guangzhi Qu. 447-459
- A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs. Aranya Banerjee, Daniel Gibney, Helen Xu, Srinivas Aluru. 460-471
- An Asynchronous Distributed-Memory Parallel Algorithm for k-Mer Counting. Souvadra Hati, Akihiro Hayashi, Richard W. Vuduc. 472-483
- Pandemics in Silico: Scaling Agent-Based Simulations on Realistic Social Contact Networks. Joy Kitson, Ian J. Costello, Jiangzhuo Chen, Diego Jiménez, Stefan Hoops, Henning S. Mortveit, Esteban Meneses, Jae-Seung Yeom, Madhav V. Marathe, Abhinav Bhatele. 484-496
- SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning Through Adaptive Aggregation and Selective Training. Md Sirajul Islam, Sanjeev Panta, Fei Xu, Xu Yuan 0001, Li Chen 0019, Nian-Feng Tzeng. 509-519
- IP-FL: Incentive-Driven Personalization in Federated Learning. Ahmad Faraz Khan 0001, Xinran Wang, Qi Le, Zain ul Abdeen, Azal Ahmad Khan, Haider Ali, Ming Jin 0002, Jie Ding 0002, Ali Raza Butt, Ali Anwar 0001. 520-532
- Leveraging Compilation Statistics for Compiler Phase Ordering. Jiayu Zhao, Chunwei Xia, Zheng Wang. 533-545
- PCEBench: A Multi-Dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation. Le Chen, Nesreen K. Ahmed, Mihai Capota, Ted Willke, Niranjan Hasabnis, Ali Jannesari. 546-557
- Gensor: A Graph-Based Construction Tensor Compilation Method for Deep Learning. Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng 0002, Yongjun Xu 0001. 558-569
- 2: Scalable, Parallel, and Real-Time fMRI Data Analysis on Heterogeneous Architectures. Weicong Chen, Sarah J. Carr, Jing Zhang, Curtis Tatsuoka, Xiaoyi Lu. 570-581
- The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use. Leon C. Oostrum, Bram Veenboer, Ronald Rook, Michael Brown, Pieter Kruizinga, John W. Romein. 582-592
- Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations. Shahaf Gargir, Sivan Toledo. 593-604
- FLAME: Federated Learning for Attack Mitigation and Evasion. Diletta Chiaro, Pian Qi, Edoardo Prezioso, Antonella Guzzo, Francesco Piccialli. 605-615
- Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-Based Federated Learning. Mulin Li, Zhaolong Jian, Kaixuan Yang, Xueshuo Xie, Wajdy Othman, Tao Li. 616-628
- Pair-Then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-Party Computation. Xiaoyu Fan, Kun Chen 0004, Guosai Wang, Xiaowei Zhu, Haoqing He, Xie Yong, Xiaofeng Jia, Yidong Li, Wei Xu 0005. 629-640
- LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing. Zhuo-yuan, Haopeng Chen, Yucheng Tao, Zihong Lin. 654-665
- KVACCEL: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration. Kihwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park 0002, Safdar Jamil, Hongsu Byun, Myungcheol Lee, JinChun Choi, Youngjae Kim 0001. 666-677
- Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) Model with OpenACC. Lucas Esclapez, Laurent Soucasse, Caspar Jungbacker, Fredrik Jansson, Stephan R. de Roode, Pedro Costa, Gijs van den Oord, Alessio Sclocco. 678-688
- Automated MPI-X Code Generation for Scalable Finite-Difference Solvers. George Bisbas, Rhodri Nelson, Mathias Louboutin, Fabio Luporini, Paul H. J. Kelly, Gerard Gorman. 689-701
- A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems. Minseok Ryu, Geunyeong Byeon, Kibaek Kim. 702-711
- PredTOP: Latency Predictor Utilizing DAG Transformers for Distributed Deep Learning Training with Operator Parallelism. Dipak Acharya, Tong Shu. 712-724
- Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools. Guangqiang Luan, Pu Pang, Quan Chen 0002, Chen Chen 0067, Guoyao Xu, Chi Zhang, Yanyi Zi, Yinghao Yu, Guodong Yang, Liping Zhang, Minyi Guo. 725-736
- Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management. Lihan Hu, Peng Jiang 0004. 737-748
- A Deep Look into the Temporal I/O Behavior of HPC Applications. Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Lüttgau, Julien Monniot, Ahmad Tarraf, André Ramos Carneiro, Carla Osthoff. 749-762
- AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage. Md Hasanur Rashid, Dong Dai. 775-788
- A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators. Arijus Lengvenis, Holger Dachsel, Laura Morgenstern, Ivo Kabadshow. 789-800
- Large Scale Finite-Temperature Real-Time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems. Rongrong Liu, Zhuoqiang Guo, Qiuchen Sha, Tong Zhao, Haibo Li, Wei Hu 0006, Lijun Liu, Guangming Tan, Weile Jia. 801-812
- Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era. Brian Dandurand, Hans Vandierendonck, Bronis R. de Supinski. 813-823
- Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing. Lorenzo Carpentieri, Antonio De Caro, Majid Salimi Beni, Kaijie Fan, Biagio Cosenza. 824-836
- GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks. Kejie Ma, Hailong Yang, Zizheng Zhang, Xin You, Zhibo Xuan, Qingxiao Sun, Zhongzhi Luan, Yi Liu, Depei Qian. 837-849
- Graph Neural Network-Based Latency Prediction for Stream Processing Task. Zheng Chu, Ren Hang Zhang, Baozhu Li, Changtian Ying, Weiyun Li. 850-859
- Next-gen Infrastructure for Scalable Generative AI: Focus on Advances in Storage, Computing and Orchestration. Robert Haas. 860
- To Compress or Not to Compress: Energy Trade-Offs and Benefits of Lossy Compressed I/O. Grant Wilkins, Sheng Di, Jon C. Calhoun, Robert Underwood, Franck Cappello. 861-873
- Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds. Alex Fallin, Noushin Azami, Sheng Di, Franck Cappello, Martin Burtscher. 874-887
- BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores. Yukang Dong, Wenbin Jiang 0001, Xinhai Shen, Haihong Guo, Zhiyuan Shao, Hai Jin 0001. 901-912
- Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration. Hanan Khan, Deniz Gurevin, Omer Khan. 913-925
- NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU. Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng 0001, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan. 926-937
- Unified Designs of Multi-Rail-Aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems. Chen-Chun Chen, Jinghan Yao, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda 0001. 938-949
- HiCCL: A Hierarchical Collective Communication Library. Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken. 950-961
- NBLFQ: A Lock-Free MPMC Queue Optimized for Low Contention. Alexandre Denis 0001, Charles Goedefroit. 962-973
- Improving the Efficiency of Interpolation-based Scientific Data Compressors with Adaptive Quantization Index Prediction. Pu Jiao, Sheng Di, Mingze Xia, Xuan Wu, Jinyang Liu 0003, Xin Liang 0001, Franck Cappello. 974-986
- An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression. Roberto Nuca, Matteo Parsani, George Turkiyyah. 987-997
- Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression. Fengkui Yang, Bo Mao, Yuhan Liu, Liang Bao, Weipeng Jiang, Dongying Zhang, Chunhua Li 0002, Ke Zhou 0001. 998-1010
- Scalable and Portable LU Factorization with Partial Pivoting on Top of Runtime Systems. Alycia Lisito, Mathieu Faverge, Matthieu Kuhn, Florent Pruvost, Pierre Ramet. 1011-1022
- Accelerating Sparse Linear Solvers on Intelligence Processing Units. Tim Noack, Louis Krüger, Andreas Koch. 1023-1035
- Adaptive s-Step GMRES with Randomized and Truncated Low-Synchronization Orthogonalization. Robert Ernstbrunner, Wilfried N. Gansterer. 1036-1047
- Performance Characterization of CXL Memory and Its Use Cases. Xi Wang, Jie Liu, Jianbo Wu, Shuangyan Yang, Jie Ren, Bhanu Shankar, Dong Li. 1048-1061
- RXT: RefleXive Address Translation for Pointer-Chasing Workloads. Rashid Aligholipour, Pavlos Aimoniotis, Stefanos Kaxiras, Yuan Yao. 1062-1073
- CoRD: Converged RDMA Dataplane. Maksym Planeta, Jan Bierbaum, Michael Roitzsch, Hermann Härtig. 1074-1090
- Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format. João Nuno Ferreira Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo. 1091-1103
- HPDR: High-Performance Portable Scientific Data Reduction Framework. Jieyang Chen, Qian Gong, Yanliang Li, Xin Liang 0001, Lipeng Wan 0001, Qing Liu 0002, Norbert Podhorszki, Scott Klasky. 1104-1116
- Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data. Alexandra Poulos, Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello. 1117-1128
- Taijigraph: An Out-Of-Core Graph Processing System Enhanced with Computational Storage. Xinmiao Zhang 0004, Cheng Liu, Shengwen Liang, Hayden Kwok-Hay So, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li. 1129-1140
- CORD: Parallelizing Query Processing Across Multiple Computational Storage Devices. Wahid Uz Zaman, Cyan Subhra Mishra, Saleh Alsaleh, Abutalib Aghayev, Mahmut Taylan Kandemir. 1141-1153
- Matcha: A Language and Compiler for Backtracking-Based Subgraph Matching. Yihua Wei, Lihan Hu, Peng Jiang. 1154-1165
- FATHOM: Fast Attention Through Optimizing Memory. Elliott Binder, Arvind Sudarsanam, Ravi Sunkavalli, Tze Meng Low. 1166-1178
- Characterizing the Behavior and Impact of KV Caching on Transformer Inferences Under Concurrency. Jie Ye, Jaime Cernuda, Avinash Maurya, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae. 1191-1202
- SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation. Zecheng Li 0001, Shruti Shivakumar, Jiajia Li 0001, Ramakrishnan Kannan. 1203-1214
- Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation. Chiang-Heng Chien, Ahmad Abdelfattah, Benjamin B. Kimia. 1215-1227
- Enhanced JPEG Decoding Using PIM Architectures with Parallel MCU Processing. Jieun Kim, Dukyun Nam. 1228-1237
- An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicators in Production Environments. Xiaobo Zheng, Lisha Qin, Shiyi Li, Wen Xia, Chentao Wu, Yunfei Gu, Qicong Lin, Jun Wan, Huifang Jiao, Rubing Huang. 1238-1248
- Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs. Xin Yi, Hengbiao Yu, Liqian Chen, Xiaoguang Mao, Ji Wang 0001, Chun Huang, Deheng Yang. 1249-1260
- Inkstream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update. Dan Wu, Zhaoying Li, Tulika Mitra. 1273-1285
- GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity. Ruiyang Chen, Xing Li, Xiaoyao Liang, Zhuoran Song. 1286-1297
- MeanCache: User-Centric Semantic Caching for LLM Web Services. Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ammar Ahmed, Ali Anwar 0001, Muhammad Ali Gulzar. 1298-1310