
Diarisation And Recognition For Multilingual

Posted on: 2022-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Su | Full Text: PDF
GTID: 2518306494486804 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Artificial intelligence is developing rapidly. As speech finds more and more applications in daily life, international communication, and the maintenance of social security, intelligent speech technology is attracting growing attention. Extracting the information carried by speech has long been a central topic in speech research. In practice a speech stream usually contains multiple languages, so the first step is to recognize the language being spoken. Once the language is known, a recognition system can be built to transcribe the speech into text.

This project uses original broadcast voice data to build a broadcast database according to international standards, and conducts language recognition research on this database. To compare results, we also test the language recognition system on a public database, the Oriental language database. We reproduce the traditional i-vector language recognition system and then replace the GMM with a DNN. We also analyze and compare different classification models and explore a classification model based on the Siamese Network in order to improve system performance.

After that, we build an automatic speech recognition (ASR) system for minor languages, focusing on a low-resource minor language: Uyghur. We first implement the ASR system with GMM-HMM, and then use neural networks such as DNN and TDNN to optimize the acoustic model. To address the sparseness of the language model, this project uses an end-to-end speech recognition method to build a low-resource Uyghur ASR system.

Major contributions of the project include:

(1) Following international standards for database construction, we specify the construction standards of the broadcast voice database. We first determine the labels according to the task, and then perform manual labeling and manual review. Finally, the audio is unified to 16 kHz mono WAV, and the labeled audio is divided into sentences of 6-10 s according to speaking habits.

(2) Starting from the traditional i-vector language recognition model, we explore the effect of the TDNN-based x-vector model on multilingual language classification. After analyzing and comparing the conventional back-end classification models, we explore a classification model based on the Siamese Network. We compare the performance of the back-end classifiers and choose the best model for broadcast audio; the best model achieves a multilingual language recognition error rate of 8.9%.

(3) Uyghur is an agglutinative language, which causes sparseness in the language model. This project proposes an end-to-end speech recognition framework and uses a small amount of data to build a speech recognition system. Its recognition rate is comparable to that of an ASR system that uses a TDNN and a language model trained on a large corpus.
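
The audio standardisation step described in the abstract (16 kHz, mono, 6-10 s segments) can be illustrated with a minimal Python sketch. It is not the thesis's actual pipeline: the function names, the linear-interpolation resampler, and the greedy cutting rule are all assumptions made for illustration, and the candidate pause times are assumed to come from the manual labels.

```python
from typing import List, Sequence, Tuple

TARGET_RATE = 16_000  # Hz, the sample rate stated in the abstract


def to_mono(frames: Sequence[Tuple[float, ...]]) -> List[float]:
    """Average the channels of each frame into one mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation.

    Illustration only: a production resampler would low-pass filter
    before downsampling to avoid aliasing.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def split_segments(pauses: Sequence[float], total: float,
                   min_len: float = 6.0,
                   max_len: float = 10.0) -> List[Tuple[float, float]]:
    """Greedily cut a recording of `total` seconds at candidate pause times.

    Each cut is the latest pause that keeps the segment within
    [min_len, max_len]; if no pause qualifies, the cut is forced at
    max_len. The final segment may be shorter than min_len.
    """
    segments = []
    start = 0.0
    while total - start > max_len:
        candidates = [p for p in pauses
                      if start + min_len <= p <= start + max_len]
        end = max(candidates) if candidates else start + max_len
        segments.append((start, end))
        start = end
    if total > start:
        segments.append((start, total))
    return segments
```

For example, `split_segments([3.0, 7.5, 9.0, 15.0, 17.0, 24.0], 26.0)` cuts at 9.0 s and 17.0 s, so every segment falls in the 6-10 s range. Cutting at pauses rather than at fixed intervals is meant to approximate the "according to speaking habits" criterion in the abstract.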
Keywords/Search Tags: Multilingual, Language Recognition, Uyghur, Speech Recognition