- Uniform Sparsity in Deep Neural Networks. Saurav Muralidharan. [doi]
- Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation. Le Chen, Quazi Ishtiaque Mahmud, Hung Phan, Nesreen K. Ahmed, Ali Jannesari. [doi]
- μ-TWO: 3× Faster Multi-Model Training with Orchestration and Memory Optimization. Sanket Purandare, Abdul Wasay, Animesh Jain, Stratos Idreos. [doi]
- Edge Impulse: An MLOps Platform for Tiny Machine Learning. Colby R. Banbury, Vijay Janapa Reddi, Alexander Elium, Shawn Hymel, David Tischler, Daniel Situnayake, Carl Ward, Louis Moreau, Jenny Plunkett, Matthew Kelcey, Mathijs Baaijens, Alessandro Grande, Dmitry Maslov, Arthur Beavis, Jan Jongboom, Jessica Quaye. [doi]
- RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network. Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis DeCoste. [doi]
- Virtual Machine Allocation with Lifetime Predictions. Hugo Barbalho, Patricia Kovaleski, Beibin Li, Luke Marshall, Marco Molinaro, Abhisek Pan, Eli Cortez, Matheus Leao, Harsh Patwari, Zuzu Tang, Larissa Rozales Gonçalves, David Dion, Thomas Moscibroda, Ishai Menache. [doi]
- On Optimizing the Communication of Model Parallelism. Yonghao Zhuang, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph Gonzalez, Ion Stoica, Hao Zhang, Hexu Zhao. [doi]
- SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs. Yijin Li, Jiacheng Zhao, Qianqi Sun, Haohui Mai, Lei Chen, Wanlu Cao, Yanfan Chen, Zhicheng Li, Ying Liu, Xinyuan Zhang, Xiyu Shi, Jie Zhao, Jingling Xue, Huimin Cui, Xiaobing Feng. [doi]
- PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls. Shen Li, Pritam Damania, Luca Wehrstedt, Rohan Varma, Omkar Salpekar, Pavel Belevich, Howard Huang, Yanli Zhao, Lucas Hosseini, Wanchao Liang, Hongyi Jia, Shihao Xu, Satendra Gera, Alisson G. Azzolini, Guoqiang Jerry Chen, Zachary Devito, Chaoyang He, Amir Ziashahabi, Alban Desmaison, Edward Z. Yang, Gregory Chanan, Brian Vaughan, Manoj Krishnan, Joseph S. Spisak, Salman Avestimehr, Soumith Chintala. [doi]
- Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models. Younghoon Byun, Seungsik Moon, Baeseong Park, Se Jung Kwon, Dongsoo Lee, Gunho Park, Eunji Yoo, Jung Gyu Min, Youngjoo Lee. [doi]
- Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training. Daniel Snider, Fanny Chevalier, Gennady Pekhimenko. [doi]
- Efficiently Scaling Transformer Inference. Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean. [doi]
- HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs. Zhongming Yu, Guohao Dai, Shang Yang, Genghan Zhang, Hengrui Zhang, Feiwen Zhu, June Yang, Jishen Zhao, Yu Wang. [doi]
- FedTree: A Federated Learning System For Trees. Qinbin Li, Zhaomin Wu, Yanzheng Cai, Yuxuan Han, Ching Man Yung, Tianyuan Fu, Bingsheng He. [doi]
- Tutel: Adaptive Mixture-of-Experts at Scale. Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, HoYuen Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong. [doi]
- Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware. Horace He, Shangdi Yu. [doi]
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts. Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia. [doi]
- Cuttlefish: Low-Rank Model Training without All the Tuning. Hongyi Wang, Saurabh Agarwal, Pongsakorn U.-Chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos. [doi]
- Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights. Tianhang Zheng, Hao Lan, Baochun Li. [doi]
- Breadth-First Pipeline Parallelism. Joel Lamy-Poirier. [doi]
- XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse. Hyoukjun Kwon, Krishnakumar Nair, Jamin Seo, Jason Yik, Debabrata Mohapatra, Dongyuan Zhan, Jinook Song, Peter Capak, Peizhao Zhang, Peter Vajda, Colby R. Banbury, Mark Mazumder, Liangzhen Lai, Ashish Sirasao, Tushar Krishna, Harshit Khaitan, Vikas Chandra, Vijay Janapa Reddi. [doi]
- Subgraph Stationary Hardware-Software Inference Co-Design. Payman Behnam, Alexey Tumanov, Tushar Krishna, Pranav Gadikar, Yangyu Chen, Jianming Tong, Yue Pan, Abhimanyu Rajeshkumar Bambhaniya, Alind Khare. [doi]
- Safe Optimized Static Memory Allocation for Parallel Deep Learning. Ioannis Lamprou, Zhen Zhang, Javier de Juan, Hang Yang, Yongqiang Lai, Etienne Filhol, Cédric Bastoul. [doi]
- Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training. Zhuang Wang, Xinyu Crystal Wu, Zhaozhuo Xu, T. S. Eugene Ng. [doi]
- Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching. Tim Kaler, Alexandros-Stavros Iliopoulos, Philip Murzynowski, Tao B. Schardl, Charles E. Leiserson, Jie Chen. [doi]
- RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure. Mark Zhao, Dhruv Choudhary, Devashish Tyagi, Ajay Somani, Max Kaplan, Sung-Han Lin, Sarunya Pumma, JongSoo Park, Aarti Basant, Niket Agarwal, Carole-Jean Wu, Christos Kozyrakis. [doi]
- ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs. Guyue Huang, Yang Bai, Liu Liu, Yuke Wang, Bei Yu, Yufei Ding, Yuan Xie. [doi]
- FLINT: A Platform for Federated Learning Integration. Ewen Wang, Boyi Chen, Mosharaf Chowdhury, Ajay Kannan, Franco Liang. [doi]
- Renee: End-To-End Training of Extreme Classification Models. Vidit Jain, Jatin Prakash, Deepak Saini, Jian Jiao, Ramachandran Ramjee, Manik Varma. [doi]
- Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models. Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu. [doi]
- Unified Convolution Framework: A Compiler-Based Approach to Support Sparse Convolutions. Jaeyeon Won, Changwan Hong, Charith Mendis, Joel S. Emer, Saman P. Amarasinghe. [doi]
- SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency. Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu. [doi]
- AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs. Yaosheng Fu, Evgeny Bolotin, Aamer Jaleel, Gal Dalal, Shie Mannor, Jacob Subag, Noam Korem, Michael Behar, David W. Nellans. [doi]
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training. Borui Wan, Juntao Zhao, Chuan Wu. [doi]
- Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization. Zining Zhang, Bingsheng He, Zhenjie Zhang. [doi]
- PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices. Kazuki Osawa, Shigang Li, Torsten Hoefler. [doi]
- Validating Large Language Models with ReLM. Michael Kuchnik, Virginia Smith, George Amvrosiadis. [doi]
- Building Verified Neural Networks for Computer Systems with Ouroboros. Cheng Tan, Changliu Liu, Zhihao Jia, Tianhao Wei. [doi]
- ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization. Yifan Zhao, Hashim Sharif, Peter Pao-Huang, Vatsin Shah, Arun Narenthiran Sivakumar, Mateus Valverde Gasparino, Abdulrahman Mahmoud, Nathan Zhao, Sarita V. Adve, Girish Chowdhary, Sasa Misailovic, Vikram S. Adve. [doi]
- GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning. Shiqi He, Qifan Yan, Feijie Wu, Lanjun Wang, Mathias Lécuyer, Ivan Beschastnikh. [doi]
- Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning. Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang. [doi]
- X-RLflow: Graph Reinforcement Learning for Neural Network Subgraph Transformation. Guoliang He, Sean Parker, Eiko Yoneki. [doi]
- Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds. Ke Hong, Zhongming Yu, Guohao Dai, Xinhao Yang, Yaoxiu Lian, Zehao Liu, Ningyi Xu, Yuhan Dong, Yu Wang. [doi]
- Reducing Activation Recomputation in Large Transformer Models. Vijay Anand Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro. [doi]
- GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing. Yi Hu, Chaoran Zhang, Edward Andert, Harshul Singh, Aviral Shrivastava, James Laudon, Yanqi Zhou, Bob Iannucci, Carlee Joe-Wong. [doi]
- On Noisy Evaluation in Federated Hyperparameter Tuning. Kevin Kuo, Pratiksha Thaker, Mikhail Khodak, John Nguyen, Daniel Jiang, Ameet Talwalkar, Virginia Smith. [doi]