Component-based LR parsing

Xiaoqing Wu, Barrett R. Bryant, Jeffrey G. Gray, Marjan Mernik. Component-based LR parsing. Computer Languages, Systems \& Structures, 36(1):16-33, 2010. [doi]


A language implementation with proper compositionality enables a compiler developer to divide-and-conquer the complexity of building a large language by constructing a set of smaller languages. Ideally, these small language implementations should be independent of each other such that they can be designed, implemented and debugged individually, and later be reused in different applications (e.g., building domain-specific languages). However, the language composition offered by several existing parser generators resides at the grammar level, which means all the grammar modules need to be composed together and all corresponding ambiguities have to be resolved before generating a single parser for the language. This produces tight coupling between grammar modules, which harms information hiding and affects independent development of language features. To address this problem, we have developed a novel parsing algorithm that we call Component-based LR (CLR) parsing, which provides code-level compositionality for language development by producing a separate parser for each grammar component. In addition to shift and reduce actions, the algorithm extends general LR parsing by introducing switch and return actions to empower the parsing action to jump from one parser to another. Our experimental evaluation demonstrates that CLR increases the comprehensibility, reusability, changeability and independent development ability of the language implementation. Moreover, the loose coupling among parser components enables CLR to describe grammars that contain LR parsing conflicts or require ambiguous token definitions, such as island grammars and embedded languages.