Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English

Charangan Vasantharajan, Laksika Tharmalingam, Uthayasanker Thayasivam. Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English. In Rong Tong, Yanfeng Lu, Minghui Dong, Wengao Gong, Haizhou Li, editors, International Conference on Asian Language Processing, IALP 2022, Singapore, October 27-28, 2022, pages 143-149. IEEE, 2022.
