- Swift for TensorFlow: A portable, flexible platform for deep learning. Brennan Saeta, Denys Shabalin. [doi]
- CODE: Compiler-based Neuron-aware Ensemble training. Ettore M. G. Trainiti, Thanapon Noraset, David Demeter, Doug Downey, Simone Campanoni. [doi]
- Cortex: A Compiler for Recursive Deep Learning Models. Pratik Fegade, TianQi Chen, Phillip B. Gibbons, Todd C. Mowry. [doi]
- Characterizing and Taming Model Instability Across Edge Devices. Eyal Cidon, Evgenya Pergament, Zain Asgar, Asaf Cidon, Sachin Katti. [doi]
- Equality Saturation for Tensor Graph Superoptimization. Yichen Yang, Phitchaya Phothilimthana, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar. [doi]
- TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models. Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu. [doi]
- Exploring the Limits of Concurrency in ML Training on Google TPUs. Sameer Kumar, Yu Emma Wang, Cliff Young, James Bradbury, Naveen Kumar, Dehao Chen, Andy Swing. [doi]
- Scaling Distributed Training with Adaptive Summation. Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum. [doi]
- Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training. Nathalie Rauschmayr, Vikas Kumar, Rahul Huilgol, Andrea Olgiati, Satadal Bhattacharjee, Nihal Harish, Vandana Kannan, Amol Lele, Anirudh Acharya, Jared Nielsen, Lakshmi Ramakrishnan, Ishan Bhatt, Kohen Chia, Neelesh Dodda, Zhihan Li, Jiacheng Gu, Miyoung Choi, Balajee Nagarajan, Jeffrey Geevarghese, Denis Davydenko, Sifei Li, Lu Huang, Edward Kim, Tyler Hill, Krishnaram Kenthapadi. [doi]
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems. Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini. [doi]
- MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions. Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso. [doi]
- Don't Forget to Sign the Gradients! Omid Aramoon, Pin-Yu Chen, Gang Qu. [doi]
- Fluid: Resource-aware Hyperparameter Tuning Engine. Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury. [doi]
- Scaling Polyhedral Neural Network Verification on GPUs. Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin T. Vechev. [doi]
- Wavelet: Efficient DNN Training with Tick-Tock Scheduling. Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica. [doi]
- Learning on Distributed Traces for Data Center Storage Systems. Giulio Zhou, Martin Maas. [doi]
- In-network Aggregation for Shared Machine Learning Clusters. Nadeen Gebara, Manya Ghobadi, Paolo Costa. [doi]
- ModularNAS: Towards Modularized and Reusable Neural Architecture Search. Yunfeng Lin, Guilin Li, Xing Zhang, Weinan Zhang, Bo Chen, Ruiming Tang, Zhenguo Li, Jiashi Feng, Yong Yu. [doi]
- Adaptive Gradient Communication via Critical Learning Regime Identification. Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris S. Papailiopoulos. [doi]
- Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices. Urmish Thakker, Paul N. Whatmough, Zhi-gang Liu, Matthew Mattina, Jesse G. Beu. [doi]
- sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data. Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica. [doi]
- ByzShield: An Efficient and Robust System for Distributed Training. Konstantinos Konstantinidis, Aditya Ramamoorthy. [doi]
- IOS: Inter-Operator Scheduler for CNN Acceleration. Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han. [doi]
- Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks. Tom Bannink, Adam Hillier, Lukas Geiger, Tim de Bruin, Leon Overweel, Jelmer Neeven, Koen Helwegen. [doi]
- Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models. Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko. [doi]
- Value Learning for Throughput Optimization of Deep Learning Workloads. Benoit Steiner, Chris Cummins, Horace He, Hugh Leather. [doi]
- Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More. Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Anshumali Shrivastava. [doi]
- Accounting for Variance in Machine Learning Benchmarks. Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent. [doi]
- Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy. Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus. [doi]
- Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators. Hamzah Abdel-Aziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph Hassoun. [doi]
- Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity. Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara. [doi]
- FLAML: A Fast and Lightweight AutoML Library. Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. [doi]
- VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. Steve Dai, Rangharajan Venkatesan, Mark Ren, Brian Zimmer, William J. Dally, Brucek Khailany. [doi]
- To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks. Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu. [doi]
- TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems. Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, Rocky Rhodes. [doi]
- Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery. Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu. [doi]
- PipeMare: Asynchronous Pipeline Parallel DNN Training. Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa. [doi]
- Bit Error Robustness for Energy-Efficient DNN Accelerators. David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele. [doi]
- A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems. Guixiang Ma, Yao Xiao, Theodore L. Willke, Nesreen K. Ahmed, Shahin Nazarian, Paul Bogdan. [doi]
- Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick. Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos. [doi]
- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference. Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang. [doi]
- Pufferfish: Communication-efficient Models At No Extra Cost. Hongyi Wang, Saurabh Agarwal, Dimitris S. Papailiopoulos. [doi]
- RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads. James Gleeson, Moshe Gabel, Gennady Pekhimenko, Eyal de Lara, Srivatsan Krishnan, Vijay Janapa Reddi. [doi]
- FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation. Bharathan Balaji, Christopher Kakovitch, Balakrishnan Narayanaswamy. [doi]
- MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers. Colby R. Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough. [doi]
- A Learned Performance Model for Tensor Processing Units. Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows. [doi]
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection. Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu. [doi]
- Pipelined Backpropagation at Scale: Training Large Models without Batches. Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster. [doi]
- Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters. Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu. [doi]
- Data Movement Is All You Need: A Case Study on Optimizing Transformers. Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler. [doi]
- A Deep Learning Based Cost Model for Automatic Code Optimization. Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman P. Amarasinghe. [doi]
- Learning Fitness Functions for Machine Programming. Shantanu Mandal, Todd A. Anderson, Javier Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid. [doi]