codeQuest: Scalable Source Code Queries with Datalog

Elnar Hajiyev, Mathieu Verbaere, Oege de Moor. codeQuest: Scalable Source Code Queries with Datalog. In Dave Thomas, editor, ECOOP 2006 - Object-Oriented Programming, 20th European Conference, Nantes, France, July 3-7, 2006, Proceedings. Volume 4067 of Lecture Notes in Computer Science, pages 2-27, Springer, 2006.

Abstract

Source code querying tools allow programmers to explore relations between different parts of the code base. This paper describes such a tool, named codeQuest. It combines two previous proposals, namely the use of logic programming and database systems. As the query language we use safe Datalog, which was originally introduced in the theory of databases. That provides just the right level of expressiveness; in particular recursion is indispensable for source code queries. Safe Datalog is like Prolog, but all queries are guaranteed to terminate, and there is no need for extra-logical annotations. Our implementation of Datalog maps queries to a relational database system. We are thus able to capitalise on the query optimiser provided by such a system. For recursive queries we implement our own optimisations in the translation from Datalog to SQL. Experiments confirm that this strategy yields an efficient, scalable code querying system.
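
To make the idea of mapping a recursive Datalog query onto a relational database concrete, here is a minimal sketch. It is not the translation or the optimisations implemented in codeQuest, which the abstract does not spell out. It assumes a hypothetical relation extends(sub, sup) recording direct supertypes, a hypothetical derived predicate hasSubtype, and uses SQLite's recursive common table expressions through Python's standard sqlite3 module as a stand-in for the database back end.

```python
import sqlite3

# Hypothetical Datalog program (illustrative only, not taken from the paper):
#   hasSubtype(T, S) :- extends(S, T).
#   hasSubtype(T, S) :- hasSubtype(T, M), extends(S, M).


def transitive_subtypes(edges, root):
    """Return all direct and indirect subtypes of `root`, sorted by name.

    `edges` is an iterable of (sub, sup) pairs standing in for facts that a
    code querying tool would extract from the source code.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE extends (sub TEXT, sup TEXT)")
        conn.executemany("INSERT INTO extends VALUES (?, ?)", edges)
        rows = conn.execute(
            """
            WITH RECURSIVE has_subtype(sup, sub) AS (
                -- base rule: direct subtypes
                SELECT sup, sub FROM extends
                UNION  -- set semantics, so evaluation stops at a fixpoint
                -- recursive rule: subtypes of subtypes
                SELECT h.sup, e.sub
                FROM has_subtype AS h
                JOIN extends AS e ON e.sup = h.sub
            )
            SELECT sub FROM has_subtype WHERE sup = ? ORDER BY sub
            """,
            (root,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    facts = [
        ("ArrayList", "AbstractList"),
        ("LinkedList", "AbstractList"),
        ("AbstractList", "AbstractCollection"),
    ]
    print(transitive_subtypes(facts, "AbstractCollection"))
```

Because the recursive step uses UNION rather than UNION ALL, duplicate tuples are discarded and the query terminates even if the input relation contains a cycle, which mirrors the termination guarantee the abstract attributes to safe Datalog.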