DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster

Chan-Yi Lin, Ting-An Yeh, Jerry Chou. DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster. In Víctor Méndez Muñoz, Donald Ferguson, Markus Helfert, Claus Pahl, editors, Proceedings of the 9th International Conference on Cloud Computing and Services Science, CLOSER 2019, Heraklion, Crete, Greece, May 2-4, 2019. pages 569-577, SciTePress, 2019.

Abstract

Abstract is missing.