Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English

Charangan Vasantharajan, Laksika Tharmalingam, Uthayasanker Thayasivam. Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English. In Rong Tong, Yanfeng Lu, Minghui Dong, Wengao Gong, Haizhou Li, editors, International Conference on Asian Language Processing, IALP 2022, Singapore, October 27-28, 2022, pages 143-149. IEEE, 2022.

Authors

Charangan Vasantharajan

Laksika Tharmalingam

Uthayasanker Thayasivam