Collection Processing with Constraints, Monads, and Folds

Ryan Wisnesky. Collection Processing with Constraints, Monads, and Folds. In Florent Bouchez, Sebastian Hack, and Eelco Visser, editors, Proceedings of the Workshop on Intermediate Representations, pages 37-44, 2011.

Abstract

We propose an intermediate form based on monad comprehensions (to represent queries) and folds (to represent computation) suitable for use in collection processing. Such an intermediate form captures, in a uniform way, large fragments of many recent large-scale collection processing languages such as MapReduce, PIG, DryadLINQ, and Data Parallel Haskell. Although we are not the first to propose such an intermediate form, we show how to solve four key problems inherent in the naive approach by applying recent work from both programming language theory and relational database theory. First, we show how fold fusion can be extended to exploit the monadic structure of queries. Second, we show how comprehensions themselves can be extended to allow for aggregation. Third, we show how to embed constraints into our intermediate form and how such constraints can be used, for example, to minimize the number of bind operations in a monad comprehension. Finally, we show how to emit proof obligations from our language, so as to ensure that each program is well-formed.
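
As a rough illustration of the two ingredients named above, the sketch below models a query as a monad comprehension over Python lists and a computation as a fold, then fuses the fold with the outermost bind so that the query result is never materialized. It is a minimal sketch under stated assumptions, not the paper's intermediate form: the helper names (`unit`, `bind`, `fold`, `fold_bind`) and the sample data are hypothetical, and the fusion identity shown is the standard one for lists, which may differ from the rule the paper develops.

```python
from functools import reduce

# List monad (illustrative): unit wraps a value, bind maps and flattens.
def unit(x):
    return [x]

def bind(xs, f):
    return [y for x in xs for y in f(x)]

# Fold (left fold) used to express computation over a collection.
def fold(step, init, xs):
    return reduce(step, xs, init)

# Hypothetical sample data, not taken from the paper.
orders = [("alice", 30), ("bob", 5), ("alice", 12), ("carol", 50)]
vip = ["alice", "carol"]

# Query written with comprehension syntax: amounts of orders by VIP customers.
query_sugar = [amt for (name, amt) in orders for v in vip if v == name]

# The same query desugared into binds; the guard becomes
# "unit on success, empty list on failure".
def inner(order):
    return bind(vip, lambda v: unit(order[1]) if v == order[0] else [])

query_binds = bind(orders, inner)

def add(acc, amt):
    return acc + amt

# Unfused: build the query result first, then fold over it.
total = fold(add, 0, query_binds)

# Fused: fold(step, init, bind(xs, f)) equals a single fold over xs whose
# step folds over f(x), so the list bind(xs, f) is never built.
def fold_bind(step, init, xs, f):
    return fold(lambda acc, x: fold(step, acc, f(x)), init, xs)

fused = fold_bind(add, 0, orders, inner)

print(query_binds, total, fused)  # [30, 12, 50] 92 92
```

Only the outer bind is fused here; the small lists produced by `inner` are still built per order. Applying the same identity again inside `inner` would remove those as well.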