Type-safe diff for families of datatypes

Eelco Lempsink, Sean Leather, Andres Löh. Type-safe diff for families of datatypes. In WGP '09: Proceedings of the 2009 ACM SIGPLAN workshop on Generic programming. ACM, New York, NY, USA, 2009. [doi]

Abstract

The UNIX diff program finds the difference between two text files using a classic algorithm for determining the longest common subsequence; however, when working with structured input (e.g. program code), we often want to find the difference between tree-like data (e.g. the abstract syntax tree). In a functional programming language such as Haskell, we can represent this data with a family of (mutually recursive) datatypes. In this paper, we describe a functional, datatype-generic implementation of diff (and the associated program patch). Our approach requires advanced type system features to preserve type safety; therefore, we present the code in Agda, a dependently-typed language well-suited to datatype-generic programming. In order to establish the usefulness of our work, we show that its efficiency can be improved with memoization and that it can also be defined in Haskell.