On Inter-Dataset Code Duplication and Data Leakage in Large Language Models

José Antonio Hernández López, Boqi Chen, Mootez Saad, Tushar Sharma, Dániel Varró. On Inter-Dataset Code Duplication and Data Leakage in Large Language Models. IEEE Trans. Software Eng., 51(1):192-205, January 2025.
