Our research group belongs to the Computational Linguistics Center at the University of Science, VNU-HCM, an interdisciplinary research center bridging linguistics and computer science. The center specializes in natural language processing for Vietnamese and related languages, constructing and leveraging the relevant language resources. In particular, our team specializes in extracting resources from ancient texts written in Han-Nom script through tasks such as optical character recognition (OCR), phonetic transcription from Han-Nom to Latin script, and semantic translation into contemporary Vietnamese. We have published over 200 papers in domestic and international journals and conference proceedings in these research areas. Our software has been deployed in practice and is available at our website: https://tools.clc.hcmus.edu.vn/.

Research areas

  • Language resources
  • Text processing

Members


  • Assoc. Prof. Dr. Dinh Dien
  • Dr. Nguyen Hong Buu Long
  • Dr. Luong An Vinh
  • Dr. Nguyen Thi Nhu Diep
  • Dr. Nguyen Tuyet Nhung
  • MSc. Le Thi Thuy Hang
  • MSc. Thai Hoang Lam
  • Student. Duong Thi An

Representative research topics

  • Automatic conversion of Nom text into Quoc Ngu script (Ho Chi Minh City-level project)
  • Korean - Vietnamese machine translation (in cooperation with SYSTRAN)
  • OALD English - English - Vietnamese Dictionary (in partnership with Oxford University Press (OUP))
  • Kim Tu Dien electronic dictionary (in cooperation with Kim Tu Dien company)

Possible collaborative activities

Building and exploiting dictionaries, Vietnamese corpora, and bilingual or multilingual parallel corpora; text processing tools (text classification, text similarity, spell checking, grammar checking, text difficulty assessment, text style analysis, psychological analysis of text, machine translation, text summarization, text/opinion mining, plagiarism detection, and others); applications for teaching Vietnamese to foreigners and foreign languages to Vietnamese learners; software for the blind; and similar work.


  • Research information: www.clc.hcmus.edu.vn
  • Translation website: https://tools.clc.hcmus.edu.vn/

Selected scientific publications

  1. Duc Huu Trinh, Trinh Le-Phuong Ngo, Long H.B. Nguyen and Dien Dinh

    “Applying Cross-view Training for Dependency Parsing in Vietnamese”, ICIC Express Letters, Part B: Applications, Volume 13, Number 3, March 2022 (SCOPUS)

  2. Binh Le, Binh Nguyen, Long Nguyen and Dien Dinh

    “PhraseAttn: Dynamic Slot Capsule Networks for Phrase Representation in Neural Machine Translation”, Journal of Intelligent & Fuzzy Systems, vol. 42, no. 4, pp. 3871-3878, 2022 (SCI-E)

  3. Long Nguyen, Nghi Pham, Duc Le, Duy Vu, Dien Dinh

    “Moment Matching Training for Neural Machine Translation: An Empirical Study”, Journal of Intelligent and Fuzzy Systems, vol. 43, no. 3, pp. 2633-2645, 2022 (SCI-E)

  4. Dien Dinh & Nguyen Le Thanh

    “Vietnamese Sentence Paraphrase Identification using Pre-trained Model and Linguistic Knowledge”, International Journal of Advanced Computer Science and Applications (IJACSA), Volume 12 Issue 8. http://dx.doi.org/10.14569/IJACSA.2021.0120891 (ESCI)

  5. Long Hong Buu Nguyen, Viet H. Pham, Dien Dinh

    “Improving Neural Machine Translation with AMR Semantic Graphs”, Mathematical Problems in Engineering, vol. 2021, Article ID 9939389, 12 pages, 2021. https://doi.org/10.1155/2021/9939389 (SCI-E)

  6. Tu Dinh Tran, Minh Nhat Ha, Long Hong Buu Nguyen & Dien Dinh

    “Improving Multi-Grained Named Entity Recognition with BERT and Focal Loss”, ICIC Express Letters, Part B: Applications, Volume 12, Number 3, March 2021, DOI: 10.24507/icicelb.12.01.92 (SCOPUS)

  7. Dien Dinh, Phuong Nguyen & Long Hong Buu Nguyen

    “Transliterating Nôm Scripts into Vietnamese National Scripts using Statistical Machine Translation”, International Journal of Advanced Computer Science and Applications (IJACSA), 12(2), 2021. (ESCI)

  8. Long Hong Buu Nguyen, Hung Duong Minh, Dien Dinh & Thanh Le Manh

    “Improving Neural Machine Translation with POS Tags”, ICIC Express Letters, Part B: Applications, Volume 12, Number 1, January 2021, DOI: 10.24507/icicelb.12.01.91 (SCOPUS)

  9. Long Nguyen, Viet Pham, Hung Minh, Dien Dinh & Thanh Manh

    “Integrating AMR Semantic Graphs to Convolutional Neural Machine Translation”, ICIC Express Letters, Part B: Applications, Volume 12, Number 2, January 2021, DOI: 10.24507/icicelb.12.02.133 (SCOPUS)

  10. Nhi-Thao Tran, Minh-Quoc Nghiem, Nhung Thi Hong Nguyen, Ngan Luu-Thuy Nguyen, Nam Van Chi & Dien Dinh

    “ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization”, Language Resources and Evaluation, vol. 54, no. 4, pp. 893–920, Dec. 2020, doi: 10.1007/s10579-020-09495-4 (SCI-E)

  11. An-Vinh Luong, Diep Nguyen & Dien Dinh

    “Building a Corpus for Vietnamese Text Readability Assessment in The Literature Domain”, Universal Journal of Educational Research, 8(10), 4996 - 5004. DOI: 10.13189/ujer.2020.081073 (SCOPUS)

  12. An-Vinh Luong, Diep Nguyen, Dien Dinh & Thuy Bui

    “Assessing Vietnamese Text Readability using Multi-Level Linguistic Features”, International Journal of Advanced Computer Science and Applications (IJACSA), 11(8), 100–111. http://dx.doi.org/10.14569/IJACSA.2020.0110814 (ESCI)

  13. An-Vinh Luong, Diep Nguyen & Dien Dinh

    “Examining the Part-of-speech Features in Assessing the Readability of Vietnamese Texts”, Acta Linguistica Asiatica, 10(2), 127–142. https://doi.org/10.4312/ala.10.2.127-142 (SCOPUS)

  14. Long Ly, Quang Nguyen, Long Hong Buu Nguyen & Dinh Dien

    “Integrating Structural Dependencies in Neural Machine Translation Using Graph Convolutional Networks”, ICIC Express Letters, Part B: Applications, 10(12), 1067–1075. https://doi.org/10.24507/icicelb.10.12.1067 (SCOPUS)

  15. Le Thanh Nguyen & Dinh Dien

    “English–Vietnamese cross-language paraphrase identification using hybrid feature classes”, Journal of Heuristics. doi:10.1007/s10732-019-09411-2 (SCI-E)

  16. Le Ngoc Tan, Sadat Fatiha, Menard Lucie & Dinh Dien

    “Low-Resource Machine Transliteration Using Recurrent Neural Networks”, ACM Transactions on Asian and Low-Resource Language Information Processing, 18(2), 1–14. doi:10.1145/3265752 (SCI-E)

  17. Điệp Nguyễn, An-Vinh Lương & Điền Đinh

    “Affection of the part of speech elements in Vietnamese text readability”, Acta Linguistica Asiatica, 9(1), 105-118. https://doi.org/10.4312/ala.9.1.105-118 (SCOPUS)

  18. Phuoc Tran, Dien Dinh, Tấn Lê & Long Hong Buu Nguyen

    “Linguistic-Relationships-Based Approach for Improving Word Alignment”, ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2017. DOI: 10.1145/3133323 (SCI-E)

  19. Phuoc Tran, Dien Dinh & Hien Thanh Nguyen

    “Improving Word Alignment Based on Named Entity”, International Journal of Innovative Computing, Information and Control – ICIC Express Letters, Part B: Applications, Volume 8, Issue 7, July 2017. DOI: 10.24507/icicelb.08.07.1121 (SCOPUS)

  20. Phuoc Tran, Dien Dinh & Long Hong Buu Nguyen

    “Word Re-Segmentation in Chinese-Vietnamese Machine Translation”, ACM Trans. Asian Low-Resour. Lang. Inf. Process. 16, 2, Article 12 (November 2016), 22 pages. DOI: https://doi.org/10.1145/2988237 (SCI-E)

  21. Long Hong Buu Nguyen, Dien Dinh & Phuoc Tran

    “An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus”, ACM Trans. Asian Low-Resour. Lang. Inf. Process. 16, 2, Article 9 (October 2016), 17 pages. DOI: https://doi.org/10.1145/2990191 (SCI-E)

  22. Phuoc Tran, Dien Dinh & Hien Nguyen

    “A Character-Level-Based and Word-Level-Based Approach for Chinese-Vietnamese Machine Translation”, Computational Intelligence and Neuroscience, Volume 2016 (2016), Article ID 9821608, DOI: 10.1155/2016/9821608 (SCI-E)