Efficient local type inference

Ben Bellamy, Pavel Avgustinov, Oege de Moor, Damien Sereni. Efficient local type inference. In Gail E. Harris, editor, Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2008, October 19-23, 2008, Nashville, TN, USA. pages 475-492, ACM, 2008. [doi]

Abstract

Inference of static types for local variables in Java bytecode is the first step of any serious tool that manipulates bytecode, be it for decompilation, transformation or analysis. It is important, therefore, to perform that step as accurately and efficiently as possible. Previous work has sought to give solutions with good worst-case complexity.

We present a novel algorithm, which is optimised for the common case rather than worst-case performance. It works by first finding a set of minimal typings that are valid for all assignments, and then checking whether these minimal typings satisfy all uses. Unlike previous algorithms, it does not explicitly build a data structure of type constraints, and it is easy to implement efficiently. We prove that the algorithm produces a typing that is both sound (obeying the rules of the language) and as tight as possible.

We then go on to present extensive experiments, comparing the results of the new algorithm against the previously best known method. The experiments include bytecode that is generated in other ways than compilation of Java source. The new algorithm is always faster, typically by a factor 6, but on some real benchmarks the gain is as high as a factor of 92. Furthermore, whereas that previous method is sometimes suboptimal, our algorithm always returns a tightest possible type.

We also discuss in detail how we handle primitive types, which is a difficult issue due to the discrepancy in their treatment between Java bytecode and Java source. For the application to decompilation, however, it is very important to handle this correctly.