On Inter-Dataset Code Duplication and Data Leakage in Large Language Models

José Antonio Hernández López, Boqi Chen, Mootez Saad, Tushar Sharma 0001, Dániel Varró. On Inter-Dataset Code Duplication and Data Leakage in Large Language Models. IEEE Trans. Software Eng., 51(1):192-205, January 2025.

Authors

José Antonio Hernández López

Boqi Chen

Mootez Saad

Tushar Sharma 0001

Dániel Varró
