Large-scale, AST-based API-usage analysis of open-source Java projects

Ralf Lämmel, Ekaterina Pek, Jürgen Starek. Large-scale, AST-based API-usage analysis of open-source Java projects. In SAC '11: Proceedings of the 26th ACM Symposium on Applied Computing. 2011.

Abstract

Research on API migration and language conversion relies on empirical data about API usage to validate mapping rules for API migration with respect to prioritization, applicability, generality, and correctness. We describe an approach to large-scale API-usage analysis of open-source Java projects, and we instantiate it for the SourceForge open-source repository. The approach covers checkout, building, tagging with metadata, fact extraction, analysis, and synthesis, and it is largely automated. Fact extraction relies on resolved (type-checked) ASTs. We describe a few specific forms of API-usage analysis that are motivated by API migration. These forms concern API footprint (such as the number of distinct APIs used in a project), API coverage (such as the percentage of methods of an API used in a corpus), and framework-like vs. class-library-like usage.
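To make the footprint and coverage measures concrete, the following is a minimal sketch of how they could be computed from already-extracted facts. It is an illustration, not the authors' implementation: the class ApiUsageMetrics, the MethodCall fact type, and all project, API, and method names are hypothetical, and the sketch assumes that each fact records which project calls which method of which API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of two API-usage measures over extracted call facts. */
public class ApiUsageMetrics {

    /** One extracted fact (assumed shape): a project calls a method of an API. */
    public static final class MethodCall {
        final String project;
        final String api;
        final String method;

        public MethodCall(String project, String api, String method) {
            this.project = project;
            this.api = api;
            this.method = method;
        }
    }

    /** API footprint: the number of distinct APIs used by the given project. */
    public static int footprint(List<MethodCall> facts, String project) {
        Set<String> apis = new HashSet<>();
        for (MethodCall call : facts) {
            if (call.project.equals(project)) {
                apis.add(call.api);
            }
        }
        return apis.size();
    }

    /**
     * API coverage: the percentage of the API's declared methods that are
     * called at least once anywhere in the corpus of facts.
     */
    public static double coverage(List<MethodCall> facts, String api,
                                  Set<String> declaredMethods) {
        if (declaredMethods.isEmpty()) {
            return 0.0; // avoid division by zero for an API without methods
        }
        Set<String> used = new HashSet<>();
        for (MethodCall call : facts) {
            // Count only calls to this API that match a declared method.
            if (call.api.equals(api) && declaredMethods.contains(call.method)) {
                used.add(call.method);
            }
        }
        return 100.0 * used.size() / declaredMethods.size();
    }

    public static void main(String[] args) {
        // Made-up corpus of two projects and two APIs, for illustration only.
        List<MethodCall> facts = Arrays.asList(
            new MethodCall("projectA", "xmlapi", "parse"),
            new MethodCall("projectA", "xmlapi", "parse"),
            new MethodCall("projectA", "guiapi", "show"),
            new MethodCall("projectB", "xmlapi", "serialize"));
        Set<String> xmlMethods =
            new HashSet<>(Arrays.asList("parse", "serialize", "validate", "transform"));

        System.out.println("footprint(projectA) = " + footprint(facts, "projectA"));
        System.out.println("coverage(xmlapi) = " + coverage(facts, "xmlapi", xmlMethods) + "%");
    }
}
```

In this sketch, coverage is computed over the whole corpus rather than per project, matching the abstract's wording; a per-project variant would only need an additional filter on the project name.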