Digitizing the Minority Language Documents in Vietnam by Using Unicode

© 2021 by IJETT Journal
Volume-69 Issue-10
Year of Publication : 2021
Authors : Hoang Thi My Le
DOI :  10.14445/22315381/IJETT-V69I10P226

How to Cite?

Hoang Thi My Le, "Digitizing the Minority Language Documents in Vietnam by Using Unicode," International Journal of Engineering Trends and Technology, vol. 69, no. 10, pp. 204-210, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I10P226

Abstract
The use of custom, language-specific fonts in documents written in Vietnamese ethnic minority languages is a major obstacle to digitization and to the development of information systems. Such documents are difficult to display, store, process, and exchange over the internet or between computers that do not have the same fonts installed. These difficulties have hindered digitization and the development of information systems in ethnic minority areas of Vietnam. To overcome them, this paper proposes a solution for encoding the character sets of Vietnam's ethnic minority languages in Unicode. The solution is applied to processing the language of the Ede ethnic minority, specifically by using Unicode fonts in documents and by converting documents that use custom fonts to Unicode fonts.
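As an illustration of the conversion step mentioned in the abstract, the sketch below shows one common way to move text from a custom font encoding to Unicode: a character-by-character mapping table followed by NFC normalization. It is not the paper's implementation. The table `LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names, and the slot assignments are invented for the example; a real table would have to be built from the glyph layout of the specific legacy font.

```python
import unicodedata

# HYPOTHETICAL mapping, invented for illustration only. A real converter
# needs one entry per repurposed slot in the actual legacy font.
LEGACY_TO_UNICODE = {
    "\u00a2": "\u010d",        # slot shown as c with caron in the legacy font
    "\u00a7": "\u0180",        # slot shown as b with stroke
    "\u00a9": "e\u0306",       # slot shown as e with breve (base + combining mark)
    "\u00ae": "\u00ea\u0306",  # slot shown as e with circumflex and breve
}

# str.maketrans accepts single-character keys and string values of any length.
_TABLE = str.maketrans(LEGACY_TO_UNICODE)


def legacy_to_unicode(text: str) -> str:
    """Replace legacy-font slots with Unicode text, then normalize to NFC."""
    converted = text.translate(_TABLE)
    # NFC composes base + combining mark where a precomposed character exists,
    # so visually identical strings also compare equal after conversion.
    return unicodedata.normalize("NFC", converted)


if __name__ == "__main__":
    sample = "\u00a2\u00a9 abc"
    print([f"U+{ord(ch):04X}" for ch in legacy_to_unicode(sample)])
```

Characters that are absent from the table pass through unchanged, so plain ASCII text in a mixed document is preserved. Normalizing at the end matters because some letters have a precomposed Unicode form while others can only be written as a base letter plus a combining mark.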

Keywords
Unicode, encoding, natural language processing, minority language processing, Unicode font.