Machine Learning for High-Quality Tokenization Replicating Variable Tokenization Schemes

Murhaf Fares, Stephan Oepen, Yi Zhang. Machine Learning for High-Quality Tokenization Replicating Variable Tokenization Schemes. In Alexander F. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing - 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I. Volume 7816 of Lecture Notes in Computer Science, pages 231-244, Springer, 2013.