A word is a unit of language that carries meaning and consists of one or more morphemes. Although English uses spaces to separate words, Myanmar text is written without spaces between words.
Therefore, word boundaries must be detected explicitly when processing Myanmar text. Myanmar is also a tonal and analytic language.
A word can be formed from one or more syllables.
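Because a word is built from syllables, syllable segmentation is a natural first step. The following is a simplified, rule-based syllable breaker written as an illustrative sketch, not the author's method. It assumes Unicode-encoded (not Zawgyi) text. It starts a new syllable at every consonant (U+1000 to U+1021) unless that consonant is stacked under a virama (U+1039), is followed by a virama, or is closed by an asat (U+103A, optionally preceded by a dot below, U+1037). Independent vowels, digits and Myanmar punctuation also start a new unit. The names `SYLLABLE_START` and `syllable_break` are hypothetical. Stacked clusters such as ကမ္ဘာ stay in one unit, and Latin text and spaces are not handled.

```python
import re

# Hypothetical, simplified rule set (assumes Unicode, not Zawgyi, input):
#   (?<!\u1039)[\u1000-\u1021]   a consonant not stacked under a virama ...
#   (?!\u1037?[\u103A\u1039])    ... and not closed by an asat (optionally after
#                                a dot below) or followed by a virama
#   [\u1023-\u102A\u103F\u1040-\u104F]  independent vowels, digits, punctuation
SYLLABLE_START = re.compile(
    r"(?<!\u1039)[\u1000-\u1021](?!\u1037?[\u103A\u1039])"
    r"|[\u1023-\u102A\u103F\u1040-\u104F]"
)


def syllable_break(text: str) -> list[str]:
    """Split Myanmar text into syllable-like units (simplified sketch)."""
    # Every match marks the first character of a new syllable.
    starts = {m.start() for m in SYLLABLE_START.finditer(text)}
    # The start and end of the text are always boundaries.
    bounds = sorted(starts | {0, len(text)})
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(syllable_break("မြန်မာစာအုပ်"))  # "Myanmar" + "book"
```

For example, မြန်မာစာအုပ် splits into မြန်, မာ, စာ and အုပ်: no break is inserted before န or ပ because each is followed by an asat.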
Syllable breaking, which is used for sorting, searching, text-to-speech and transliteration, can also serve as the basis for word-breaking methods.
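Once text has been split into syllables, one common way to group them into words is dictionary-based maximum matching: at each position, take the longest run of syllables that forms a known word. This standard technique is chosen here for illustration and is not necessarily what the author has in mind. The function name `max_match`, the toy lexicon and the default `max_len` of four syllables are all hypothetical; a practical word breaker would need a large lexicon and usually some way to resolve ambiguous splits.

```python
def max_match(syllables: list[str], lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching over syllables (illustrative sketch).

    syllables: output of some syllable breaker, e.g. ["မြန်", "မာ", "စာ", "အုပ်"]
    lexicon:   known words, each stored as a concatenation of its syllables
    max_len:   longest word, in syllables, that is tried (hypothetical default)
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: list[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink it one syllable at a time.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is accepted even if unknown, so the loop always advances.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


if __name__ == "__main__":
    toy_lexicon = {"မြန်မာ", "စာအုပ်"}  # "Myanmar", "book"
    print(max_match(["မြန်", "မာ", "စာ", "အုပ်"], toy_lexicon))
```

With the toy lexicon, the four syllables are grouped into the two words မြန်မာ and စာအုပ်. Greedy forward matching is simple and fast, but it can commit to a long match where a shorter split is correct; backward matching or scoring alternative segmentations are common refinements.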
Word breaking is needed for spell checking, grammar checking, translation, line breaking and similar tasks.
3 comments:
This process is also known in the NLP community as tokenization.
Here is my Word Breaker, for your reference.
http://www.burglish.com/wbreaker.htm
mark@burglish.com
Great point! Did you develop a Burmese word segmentation tool?