Thursday, November 8, 2007
Thursday, October 18, 2007
The fonts that comply with the above-mentioned standard are (as far as I know) Myanmar2, Myanmar3, Parabike and Padauk. There may be many more that comply with the standard. The input (keyboard) should also comply with the above-mentioned standard.
It can sort input in the dictionary order used by the Burmese-Burmese Dictionary and the Burmese-English Dictionary released by the Myanmar Language Commission.
With the help of my colleague, I have developed a sample program and a .dll file for sorting.
Since I bear all the costs myself, I need some money to recover the research cost. I will provide the sample software for free, but I will charge for the .dll file. The .dll file can be used by developers who wish to include Burmese sorting in their own programs.
I will upload the sample software in a few days. (I can't use the internet at the office. I will upload it as soon as I get a chance.)
Developers, please contact me at wunnakoko at gmail dot com. I will provide the details of the cost.
I don't know when they will open again. This slows the advancement of NLP research considerably.
Saturday, September 22, 2007
- Word, line and sentence break
- Search and replace
- Paragraph numbering
- Character classification
- Number Formats
- Locale data
Saturday, September 15, 2007
1. Phonologic segmentation of Burmese Syllables
2. Orthographic segmentation of Burmese Syllables
3. Line Breaking or Word Wrapping
It can handle three types of documents:
1. Text documents (.txt files)
2. XML documents (.xml files)
3. MS Word documents (.doc files)
By handling XML documents, we hope it will also be useful for segmenting other types of documents, such as spreadsheet files and database files.
I hope all of our friends and the Burmese community will help us test it.
Wednesday, September 12, 2007
Syllable segmentation is the process of determining syllable boundaries in a piece of text. Since Burmese is a tonal and analytic language and the Burmese writing system is syllabic, syllables are the fundamental building blocks of the language. Syllable boundaries in Burmese script can be of two types: 1) the phonological boundary of a syllable, and 2) the orthographic boundary of a syllable. Since Burmese script is a phonetic script, phonological segmentation is the basic segmentation. The phonological boundary of a syllable is defined, as the name suggests, by pronunciation, whereas the orthographic boundary is defined by the orthography. An orthographic syllable need not correspond exactly to a phonological syllable; it is a combination of phonological syllables and the non-breaking rules. Example: the word မႏၱေလး has 3 phonological syllables, မန္, တ and ေလး, but only 2 orthographic syllables, မႏၱ and ေလး .
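To make the orthographic case concrete, here is a minimal sketch in Python. It is not the method used in our software; it is a simple, widely used heuristic and it makes several assumptions: the text is stored in Unicode 5.1 order, a new syllable starts at every consonant (U+1000 to U+1021) unless that consonant is stacked under a virama (U+1039) or is itself followed by a virama or an asat (U+103A). The function name `orthographic_syllables` is hypothetical, and the heuristic ignores independent vowels, digits, punctuation and other edge cases.

```python
import re

# Assumption: Unicode 5.1 storage order.
# U+1000..U+1021 = consonants, U+1039 = virama (stacking), U+103A = asat (killer).
# A consonant opens a new syllable unless it is stacked (preceded by virama)
# or is a final/stacking consonant (followed by virama or asat).
_SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u1039\u103A])")


def orthographic_syllables(text):
    """Split text into orthographic syllables using the heuristic above."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own chunk
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s < e]


# "Mandalay" in Unicode order: MA, NA, VIRAMA, TA, LA, vowel sign E, VISARGA
mandalay = "\u1019\u1014\u1039\u1010\u101C\u1031\u1038"
print(orthographic_syllables(mandalay))
```

For the Mandalay example this gives two chunks, the stacked first syllable and the final syllable, which matches the orthographic count of 2 given above. Phonological segmentation would need extra rules to split the stacked cluster.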
Thursday, September 6, 2007
A word is a unit of language that carries meaning and consists of one or more morphemes. Although spaces separate words in English, no spaces need to be added in
So, word boundaries must be detected for the Myanmar language. But the Myanmar language is a tonal and analytic language.
A word can be formed from one or more syllables.
Syllable breaking, which can be used for sorting, searching, text-to-speech and transliteration, can also be used in word-breaking methods. Word breaking, in turn, can be used for spell checking, grammar checking, translation, line breaking, etc.
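One way to see how syllable breaking can feed word breaking is dictionary-based longest matching over syllables. The sketch below is only an illustration of that general technique, not the method of any particular product. The function name `longest_match_words`, the `max_len` limit and the tiny lexicon are all hypothetical, and romanized placeholders stand in for real Burmese syllables.

```python
def longest_match_words(syllables, lexicon, max_len=4):
    """Group syllables into words by greedy longest match against a lexicon.

    At each position, try the longest run of syllables (up to max_len) that
    forms a lexicon entry; if none matches, emit the single syllable as a word.
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_len, len(syllables) - i)
        for n in range(longest, 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical romanized syllables and lexicon, for illustration only.
syllables = ["man", "da", "lay", "myo"]
lexicon = {"mandalay"}
print(longest_match_words(syllables, lexicon))
```

Greedy matching is simple but can choose the wrong split when a long entry overlaps a better pair of shorter ones, so a real word breaker would need more than this.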
Friday, August 31, 2007
What the W3C (World Wide Web Consortium) says: http://www.w3.org/TR/REC-CSS2/fonts.html
What Wikipedia (the free encyclopedia) says: http://en.wikipedia.org/wiki/Font
I would like people to know that a font is not what sorts, breaks, etc. The underlying encoding has to be used for syllable breaking, sorting, word breaking, etc.
In Myanmar, there are many magicians. The magicians use technology to show magic to people, so people think it is true.
Let me give a general argument here.
1 ft = 12 x 1 in
So 12 times 1 in (from Myanmar) has to be the same as 1 ft (from the United States). Am I right?
That is what is called a "STANDARD". If "SORTING" works, all Unicode-encoded text can be sorted. Why does it work only for a specific "FONT" or product? That is like saying 12 times 1 in (from Myanmar) is not equal to 1 ft (from the United States).
The same goes for "BREAK" and other processes.
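The point can be checked with a few lines of Python. This is a small sketch, assuming Unicode-encoded text; the helper names `describe` and `codepoint_sort` are hypothetical. A program sees only code points, never the font, so the same logic applies to any Unicode text whichever compliant font displays it. Note that plain code point order is not the Myanmar Language Commission dictionary order; it only shows that the processing depends on the encoding.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def codepoint_sort(strings):
    """Sort strings by raw code point order (not dictionary order)."""
    return sorted(strings)


# The same code points are stored no matter which font renders them.
for code, name in describe("\u1000\u1019"):
    print(code, name)

# Sorting looks only at the stored code points.
print(codepoint_sort(["\u1019", "\u1000"]) == ["\u1000", "\u1019"])
```

If text sorts or breaks correctly only under one font, the text is most likely not stored in the standard encoding.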
Thursday, August 16, 2007
Myanmar 3, the latest release from our NLP research team, is a Unicode-compliant font. The main reason Myanmar 3 is reliable and Unicode-compliant is that the Unicode encoding for Myanmar was created together with us.
(Point the cursor at one of the following links, right-click, and choose "Save Target As".)
Myanmar 3 Font and Keyboard
Myanmar 3 Font Only
Monday, August 13, 2007
Natural Language Processing is a kind of computer processing specialized for human languages such as Burmese, Karen, English and Japanese. These are referred to as natural languages in computing circles to differentiate them from computer programming languages such as C, C++, C#, Java and VB.
This blog discusses the latest events in Myanmar NLP and people's points of view on Myanmar Unicode and other encodings.
We welcome all participants' positive and negative points of view and discussions.