Supporting Language Extension and Separate Compilation by Mixing Java and Bytecode

Lennart C. L. Kats. Supporting Language Extension and Separate Compilation by Mixing Java and Bytecode. Master's thesis, Utrecht University, Utrecht, The Netherlands, August 2007.

Abstract

Language extensions, such as embedded domain-specific languages, are often implemented by assimilating (rewriting) the extended language constructs to the host language. The result can then be compiled by a standard compiler. This approach is limited by the host language, which may not have been designed with code generation in mind. Java is an example: it provides insufficient protection against name capture of host language identifiers, and it does not expose the low-level primitives of the underlying Java Virtual Machine. In particular, it has no equivalents for jump or jump-to-subroutine instructions, unbalanced synchronization, stack manipulation, or the specification of debugging information. Code generated from a language that does not match Java’s structure can therefore require inefficient or laborious workarounds.
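As a minimal sketch of such an alternative (an illustration, not code from the thesis), consider a generator that needs arbitrary jumps between basic blocks. Since Java has no jump statement, one common emulation is a dispatch loop in which every "goto" becomes an assignment to a label variable. The class name `GotoEmulation`, the method `sumTo`, and the label constants below are hypothetical names chosen for this example.

```java
// Hypothetical illustration; the names here are assumptions, not from the thesis.
final class GotoEmulation {
    // Basic-block labels of the generated control flow, encoded as integers.
    private static final int ENTRY = 0, TEST = 1, BODY = 2, EXIT = 3;

    // Sums 1..n the way a generator might emit it when it needs arbitrary
    // jumps: each "goto L" is emulated by "label = L; break;", which
    // re-enters the dispatch loop at the chosen block.
    static int sumTo(int n) {
        int label = ENTRY;
        int i = 0;
        int sum = 0;
        while (true) {
            switch (label) {
                case ENTRY:
                    i = 1;
                    label = TEST;                    // goto TEST
                    break;
                case TEST:
                    label = (i <= n) ? BODY : EXIT;  // conditional jump
                    break;
                case BODY:
                    sum += i;
                    i++;
                    label = TEST;                    // goto TEST (back edge)
                    break;
                case EXIT:
                    return sum;
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }
}
```

Calling `GotoEmulation.sumTo(4)` yields 10. Every emulated jump pays for an extra assignment and a pass through the switch dispatch, whereas at the bytecode level the same transfer of control is a single `goto` instruction.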

We propose a new open compiler model that gives code generators direct access to the underlying compiled code. With conventional open compilers, leveraging the bytecode-generating back-end is an intricate process, requiring adaptations tangled throughout the system. The result is hard to develop, understand, and maintain. By providing a mixed source language of Java and the underlying bytecode instruction language, we can provide access to the back-end at the source level. Compiled instructions can be used in place of statements or expressions, which aids not only language extensions but also applications of separate compilation. For example, it can simplify aspect weavers by enabling direct composition of source code aspects with compiled classes, or vice versa. To demonstrate this, we also introduce a Java traits compiler that operates on classes and traits in both source and compiled form.