- URSABench: A System for Comprehensive Benchmarking of Bayesian Deep Neural Network Models and Inference methods. Meet P. Vadera, Jinyang Li 0004, Adam D. Cobb, Brian Jalaian, Tarek F. Abdelzaher, Benjamin M. Marlin.
- Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors. Shurui Li, Puneet Gupta.
- Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. Hesham Mostafa.
- GPU Semiring Primitives for Sparse Neighborhood Methods. Corey J. Nolet, Divye Gala, Edward Raff, Joe Eaton, Brad Rees, Tim Oates.
- HALOS: Hashing Large Output Space for Cheap Inference. Zichang Liu, Zhaozhuo Xu, Alan Baonan Ji, Junyan Zhang, Jonathan Li 0002, Beidi Chen, Anshumali Shrivastava.
- LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning. Jinhyun So, Corey J. Nolet, Chien-Sheng Yang, Songze Li, Qian Yu 0001, Ramy E. Ali, Basak Guler, Salman Avestimehr.
- torch.fx: Practical Program Capture and Transformation for Deep Learning in Python. James K. Reed, Zachary Devito, Horace He, Ansley Ussery, Jason Ansel.
- Learning Compressed Embeddings for On-Device Inference. Niketan Pansare, Jay Katukuri, Aditya Arora, Frank Cipollone, Riyaaz Shaik, Noyan Tokgozoglu, Chandru Venkataraman.
- Collapsible Linear Blocks for Super-Efficient Super Resolution. Kartikeya Bhardwaj, Milos Milosavljevic, Liam O'Neil, Dibakar Gope, Ramon Matas Navarro, Alex Chalfin, Naveen Suda, Lingchuan Meng, Danny Loh.
- Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization. Jie Zhao 0002, Xiong Gao, Ruijie Xia, Zhaochuang Zhang, Deshi Chen, Lei Chen, Renwei Zhang, Zhen Geng, Bin Cheng, Xuefeng Jin.
- The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding. Pratik Fegade, TianQi Chen, Phillip B. Gibbons, Todd C. Mowry.
- AccMPEG: Optimizing Video Encoding for Accurate Video Analytics. Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang.
- PAPAYA: Practical, Private, and Scalable Federated Learning. Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek 0001.
- Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning. Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis.
- MLPerf Mobile Inference Benchmark: An Industry-Standard Open-Source Machine Learning Benchmark for On-Device AI. Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai Nguyen, Ramesh Chukka, Kenneth Shiring, Koan-Sin Tan, Mark Charlebois, William Chou, Mostafa El-Khamy, Jungwook Hong, Tom St. John, Cindy Trinh, Michael Buch, Mark Mazumder, Relja Markovic, Thomas Atta-Fosu, Fatih Çakir, Masoud Charkhabi, Xiaodong Chen, Cheng-Ming Chiang, Dave Dexter, Terry Heo, Guenther Schmuelling, Maryam Shabani, Dylan Zika.
- Towards the Co-design of Neural Networks and Accelerators. Yanqi Zhou, Xuanyi Dong, Tianjian Meng, Mingxing Tan, Berkin Akin, Daiyi Peng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon.
- mmSampler: Efficient Frame Sampler for Multimodal Video Retrieval. Zhiming Hu, Angela Ning Ye, Iqbal Mohomed.
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining. Tim Kaler, Nickolas Stathas, Anne Ouyang, Alexandros-Stavros Iliopoulos, Tao B. Schardl, Charles E. Leiserson, Jie Chen 0007.
- ML-EXray: Visibility into ML Deployment on the Edge. Hang Qiu, Ioanna Vavelidou, Jian Li, Evgenya Pergament, Pete Warden, Sandeep Chinchali, Zain Asgar, Sachin Katti.
- SLA-Driven ML Inference Framework for Clouds with Heterogeneous Accelerators. Junguk Cho, Diman Zad Tootaghaj, Lianjie Cao, Puneet Sharma.
- Sustainable AI: Environmental Implications, Challenges and Opportunities. Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, Jinshi Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks 0001, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood.
- Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective. Hengrui Zhang, Zhongming Yu, Guohao Dai, Guyue Huang, Yufei Ding, Yuan Xie 0001, Yu Wang 0002.
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance. Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu.
- A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules. Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy 0002, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou.
- FROTE: Feedback Rule-Driven Oversampling for Editing Models. Oznur Alkan, Dennis Wei, Massimiliano Mattetti, Rahul Nair, Elizabeth Daly, Diptikalyan Saha.
- Improving Model Training with Multi-fidelity Hyperparameter Evaluation. Yimin Huang, Yujun Li, Hanrong Ye, Zhenguo Li, Zhihua Zhang.
- dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training. Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu 0001, Yibo Zhu, Haibin Lin, Chuanxiong Guo.
- BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling. Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin.
- On the Utility of Gradient Compression in Distributed Training Systems. Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris S. Papailiopoulos.
- DietCode: Automatic Optimization for Dynamic Tensor Programs. Bojian Zheng, Ziheng Jiang, Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, TianQi Chen, Gennady Pekhimenko.
- QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity. Samuel A. Stein, Betis Baheri, Daniel Chen, Ying Mao, Qiang Guan, Ang Li, Shuai Xu, Caiwen Ding.
- A Tale of Two Models: Constructing Evasive Attacks on Edge Models. Wei Hao, Aahil Awatramani, Jiayang Hu, Chengzhi Mao, Pin-Chun Chen, Eyal Cidon, Asaf Cidon, Junfeng Yang.
- Gyro Dropout: Maximizing Ensemble Effect in Neural Network Training. Jiwon Seo.
- TorchSparse: Efficient Point Cloud Inference Engine. Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin 0001, Song Han 0003.
- VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware. Andrew Or, Haoyu Zhang, Michael Freedman.
- TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data. Wasu Piriyakulkij, Cristina Menghini, Ross Briden, Nihal V. Nayak, Jeffrey Zhu, Elaheh Raisi, Stephen H. Bach.
- NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction. Yi Ding 0006, Avinash Rao, Hyebin Song, Rebecca Willett, Henry Hoffmann.
- SRIFTY: Swift and Thrifty Distributed Neural Network Training on the Cloud. Liang Luo, Peter West, Pratyush Patel, Arvind Krishnamurthy, Luis Ceze.
- Revelio: ML-Generated Debugging Queries for Finding Root Causes in Distributed Systems. Pradeep Dogga, Karthik Narasimhan, Anirudh Sivaraman, Shiv Kumar Saini, George Varghese, Ravi Netravali.
- QuadraLib: A Performant Quadratic Neural Network Library for Architecture Optimization and Design Exploration. Zirui Xu, Fuxun Yu, Jinjun Xiong, Xiang Chen 0010.
- ULPPACK: Fast Sub-8-bit Matrix Multiply on Commodity SIMD Hardware. Jaeyeon Won, Jeyeon Si, Sam Son, Tae Jun Ham, Jae W. Lee.
- Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph. Zhiqiang Xie, Minjie Wang, Zihao Ye 0001, Zheng Zhang 0001, Rui Fan.
- Matchmaker: Data Drift Mitigation in Machine Learning for Large-Scale Systems. Ankur Mallick, Kevin Hsieh, Behnaz Arzani, Gauri Joshi.
- Randomness in Neural Network Training: Characterizing the Impact of Tooling. Donglin Zhuang, Xingyao Zhang, Shuaiwen Song, Sara Hooker.
- Pathways: Asynchronous Distributed Dataflow for ML. Paul Barham 0001, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy 0002, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu.
- Efficient Strong Scaling Through Burst Parallel Training. Seo Jin Park, Joshua Fried, Sunghyun Kim, Mohammad Alizadeh, Adam Belay.
- Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems. Aditya Desai, Li Chou, Anshumali Shrivastava.
- REX: Revisiting Budgeted Training with an Improved Schedule. John Chen, Cameron R. Wolfe, Tasos Kyrillidis.
- Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines. Michael Kuchnik, Ana Klimovic, Jiri Simsa, Virginia Smith, George Amvrosiadis.
- TyXe: Pyro-based Bayesian neural nets for Pytorch. Hippolyt Ritter, Theofanis Karaletsos.
- Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers. Runsheng Guo 0003, Victor Guo, Antonio Kim, Josh Hildred, Khuzaima Daudjee.