The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Dung Nguyen Manh, Le Nam Hai, Anh T. V. Dau, Anh-Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui. The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023. Pages 4763-4788, Association for Computational Linguistics, 2023.

@inproceedings{ManhHDNNGB23,
  title = {The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation},
  author = {Dung Nguyen Manh and Le Nam Hai and Anh T. V. Dau and Anh-Minh Nguyen and Khanh Nghiem and Jin Guo and Nghi D. Q. Bui},
  year = {2023},
  url = {https://aclanthology.org/2023.findings-emnlp.316},
  researchr = {https://researchr.org/publication/ManhHDNNGB23},
  cites = {0},
  citedby = {0},
  pages = {4763--4788},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023},
  editor = {Houda Bouamor and Juan Pino and Kalika Bali},
  publisher = {Association for Computational Linguistics},
  isbn = {979-8-89176-061-5},
}