Computational Linguistics: Low-Resource Language Tone Classification

Research Project Overview and Description
This project works at the intersection of linguistics and deep learning to address the challenges of processing low-resource languages. Specifically, the research investigates tone classification for languages with limited digital and annotated data, a scarcity that often hinders the development of effective natural language processing (NLP) models. Using GPU-accelerated computational modeling, the team aims to develop methodologies for language identification and phonological analysis that improve the preservation and digital usability of underrepresented languages, as well as the accuracy of NLP models built for them. The work contributes to maintaining global linguistic diversity by applying machine learning strategies to overcome data sparsity and to improve the accuracy of linguistic knowledge transfer and documentation.

Research Outcome
The project is expected to deliver advanced computational methodologies for tone classification in low-resource languages by integrating linguistics with GPU-accelerated deep learning. Key outcomes include the development of robust language identification and phonological analysis tools that improve the accuracy and digital usability of natural language processing (NLP) models for underrepresented languages. These deliverables will provide a technological framework to overcome data scarcity, helping to preserve global linguistic diversity and facilitate the inclusion of these languages in modern speech technology. The research is anticipated to result in peer-reviewed publications and serve as a foundation for future advancements in inclusive AI and speech recognition.

About the researchers

Qisheng Liao is a PhD student in the Department of Linguistics at the University of Hong Kong (2025–2029). With a background in computer science and deep learning from New York University and the University of California, Santa Cruz, his research focuses on the intersection of linguistics and deep learning, specifically in phonology and natural language processing tasks such as language identification and text-to-speech.

Dr. Youngah Do is an Associate Professor in the Department of Linguistics at HKU, holding a PhD from MIT. Her research examines language learning and learnability through experimental and computational modeling, with recent work focusing on the preservation and inclusivity of Hong Kong Sign Language.

Fund Source
N/A

For enquiries

Please contact us at atlabhku.hk.