The gLite workload management system

P Andreetto, S Andreozzi, G Avellino, S Beco, A Cavallini, M Cecchi, V Ciaschini, A Dorise, F Giacomini, A Gianelle, U Grandinetti, A Guarise, A Krop, R Lops, A Maraschini, V Martelli, M Marzolla, M Mezzadri, E Molinari, S Monforte, F Pacini, M Pappalardo, A Parrini, G Patania, L Petronzio, R Piro, M Porciani, F Prelz, D Rebatto, Elisabetta Ronchieri, M Sgaravatto, V Venturi, L Zangrando. The gLite workload management system. Journal of Physics: Conference Series, 119(6):62007-62007, 2008.

Abstract

The gLite Workload Management System (WMS) is a collection of components that provide the service responsible for distributing and managing tasks across the computing and storage resources available on a Grid. The WMS receives job execution requests from a client, finds suitable resources, then dispatches the jobs and follows them until completion, handling failures whenever possible. Besides single batch-like jobs, the WMS handles compound job types: Directed Acyclic Graphs (a set of jobs where the input, output, or execution of one or more jobs may depend on one or more other jobs), Parametric Jobs (multiple jobs with one parametrized description), and Collections (multiple jobs with a common description). Jobs are described via a flexible, high-level Job Definition Language (JDL). New functionality was recently added to the system: use of Service Discovery to obtain new service endpoints to be contacted, automatic archival, compression, and sharing of sandbox files, and support for bulk submission and bulk matchmaking. Intensive testing and troubleshooting made it possible to dramatically increase both the job submission rate and service stability. Future developments of the gLite WMS will focus on reducing external software dependencies and improving portability, robustness, and usability.
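
To make two of the compound job types concrete, the following is a minimal Python sketch, not gLite code and not JDL syntax. It assumes a job description can be modelled as a plain dictionary, that a parametric description uses a hypothetical `{param}` placeholder, and that DAG dependencies are given as a mapping from each job to the jobs it depends on. The names `expand_parametric` and `dag_order`, and all job and file names, are illustrative assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def expand_parametric(template, values):
    """Return one concrete job description per parameter value.

    Every string field of the template has the hypothetical '{param}'
    placeholder replaced by the current value.
    """
    return [
        {
            key: field.format(param=value) if isinstance(field, str) else field
            for key, field in template.items()
        }
        for value in values
    ]


def dag_order(dependencies):
    """Return an order in which each job comes after the jobs it depends on.

    'dependencies' maps a job name to the set of jobs that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Parametric job: one parametrized description, several concrete jobs.
template = {
    "executable": "simulate.sh",
    "arguments": "--seed {param}",
    "stdout": "run_{param}.out",
}
jobs = expand_parametric(template, range(3))

# DAG: 'merge' needs both simulations, which both need 'prepare'.
dependencies = {
    "merge": {"simulate_a", "simulate_b"},
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
}
order = dag_order(dependencies)

for job in jobs:
    print(job)
print(order)
```

In this sketch a Collection would simply be a list of independent descriptions submitted together, with no ordering constraint between them. How the actual WMS represents and schedules these job types is not specified in the abstract.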