Low-resource Tibetan Multi-dialect Speech Recognition

Posted on:2024-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Dan

Full Text:PDF

GTID:2555306926984819

Subject:Computer Science and Technology

Abstract/Summary:

Tibetan is a minority language in China with a complex phonetic and grammatical system. It is widely spoken in western China, and its dialects, which differ significantly from one another, are important components of Tibetan culture. Research on Tibetan dialect speech recognition can help preserve and pass on Tibetan culture and support the development of education, culture, and the economy in Tibetan areas. Traditional Tibetan speech recognition systems recognize only one dialect, whereas a multi-dialect system recognizes several dialects with a single model, providing a speech input method that is closer to the local dialect. This study focuses on the U-Tsang, Kham and Amdo dialects of Tibetan and explores multi-dialect speech recognition methods based on multi-task learning. The main research content and contributions are as follows:

1. A new soft parameter sharing multi-task model is constructed for two tasks: Tibetan multi-dialect speech recognition as the primary task and dialect ID recognition as the auxiliary task. The model consists of a primary task stream and an auxiliary task stream. Each module of the primary task stream is the same as in speech-transformer, while the auxiliary task stream keeps only the convolution module and encoder of speech-transformer, and its back-end module consists of fully connected layers. During training and testing, the auxiliary task stream provides high-dimensional dialect information to the primary task stream. The primary task stream passes this auxiliary information to an auxiliary cross-attention layer, so that it can use the dialect information together with the speech and text features, which strengthens its ability to discriminate between dialects. The experimental results show that the proposed model achieves syllable error rates of 7.86%, 3.07% and 29.40% for the U-Tsang, Amdo and Kham dialects, respectively. Compared with the single-task single-dialect model and the hard parameter sharing multi-task model, the average syllable error rate over the three dialects is reduced by 30.42% and 4.82%, respectively.

2. A method for dynamically adjusting the multi-task learning weights is also investigated. An adaptive cross-entropy loss function based on the task loss ratio is proposed, which adjusts the weight of each task in the total loss according to the relationship between the cross-entropy loss value of each task and the complexity of the task. To verify its effectiveness, the method is compared with manual weight adjustment and with dynamic adjustment methods such as uncertainty weighting and loss change rate weighting. The experimental results show that the proposed adaptive multi-task model obtains syllable error rates of 5.67%, 2.70% and 26.48% on the test sets of the U-Tsang, Amdo and Kham dialects, respectively. Compared with manual adjustment, this is an average absolute improvement of 1.82% over the three dialects; compared with uncertainty weighting and loss change rate weighting, the average absolute improvements are 9.6% and 9.36%, respectively.

Finally, the proposed model is tested on a Chinese multi-dialect dataset. The experimental results show that the model is also effective there, achieving pinyin error rates of 24.06%, 11.22% and 51.36% on Mandarin, Cantonese and Hokkien, respectively.
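The abstract does not give the formula for the adaptive loss, so the following is only a minimal sketch of one way a loss-ratio weighting could work, not the thesis's actual method. It assumes (hypothetically) that task complexity can be approximated by the logarithm of the number of output classes, which is the cross-entropy of a uniform guess, and that each task's weight is proportional to its current loss divided by that value. The function names and the example numbers are illustrative.

```python
import math


def loss_ratio_weights(losses, num_classes):
    """Return one weight per task; the weights sum to 1.

    Assumption (not taken from the thesis): complexity of a task is
    log(num_classes), and weight is proportional to loss / complexity.
    Every task must have more than one class so that log(n) > 0.
    """
    ratios = [loss / math.log(n) for loss, n in zip(losses, num_classes)]
    total = sum(ratios)
    return [r / total for r in ratios]


def weighted_total_loss(losses, num_classes):
    """Combine per-task cross-entropy losses using the ratio-based weights."""
    weights = loss_ratio_weights(losses, num_classes)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical per-batch losses: syllable recognition (2000 classes)
# and dialect ID (3 classes).
weights = loss_ratio_weights([2.0, 0.5], [2000, 3])
total = weighted_total_loss([2.0, 0.5], [2000, 3])
```

Under this assumption, a task whose loss is still high relative to its chance-level loss receives a larger share of the total. In a real training loop the weights would probably be treated as constants for the backward pass, so that gradients flow only through the task losses themselves.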

Keywords/Search Tags:

multi-dialect speech recognition, Tibetan multi-dialect, multi-task learning, adaptive cross-entropy loss

Related items

1	Tibetan Multi-task And Multi-dialect Speech Recognition
2	Research On Tibetan Multi-task Learning Acoustic Model Based On DNN-HMM
3	Research On Language Recognition Based On Multi-task Neural Network
4	Research On Tibetan Speech Emotion Recognition Method Based On Multi-feature Fusion
5	Cross-Layer Interaction Network For Chinese Calligraphy Style Classification
6	The Multi-functionality Study Of "Ke" In The Cross-dialect Perspective
7	Research On Uyghur Speech Recognition Based On End-to-End Modeling
8	Research On Multi-Line Text Recognition Technology For Ancient Tibetan Documents
9	Multi-loss Siamese Convolutional Neural Network For Chinese Calligraphy Font And Style Classification
10	Multi-functional Semantics Of Onomatopoeia In Panjin Dialect Of Liaoning Province