
Short-Duration Language Identification Based on Uyghur-Chinese Speech

Posted on: 2021-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: X C Guo
GTID: 2518306128476604    Subject: Master of Engineering
Abstract/Summary:
Speech is the most efficient means of human communication, and it links different countries and ethnic groups, making it easier for people to communicate. In recent years, spoken language identification has been widely used in many fields, for example as the front-end module of hybrid speech recognition systems and speech machine translation systems, and in multilingual information services. Meanwhile, since the Belt and Road Initiative was proposed, Xinjiang has attracted increasing attention, yet there is little research on spoken language identification for Xinjiang's minority languages. The objective of this work is therefore to build a spoken language identification model that performs well on Uyghur and Chinese speech under short-duration conditions.

First, because there is no public dataset for Uyghur-Chinese language identification, we construct one and describe its basic properties as well as its data cleaning and preprocessing procedures. In particular, we propose a voice activity detection (VAD) processing method designed to retain as much language-discriminative information as possible. We then propose a data augmentation method based on changing the frequency of the voice, which offsets the imbalance between male and female speakers in the dataset. Finally, we also apply noise-addition data augmentation to the dataset to improve the generalization ability of the model.

Second, extracting more discriminative features from acoustic information is a central difficulty in language identification. We propose a Uyghur-Chinese language identification system based on GMM-ivector and determine its parameters and implementation details experimentally. Using this model, we verify the effect of noise compensation on language identification. Finally, we experimentally compare mainstream back-end classification methods, such as cosine distance scoring (CDS), SVM and LDA, under short-duration conditions.

Third, because the GMM-ivector system performs poorly on short utterances, we propose a deep-learning language identification system based on ResNet-50; experiments show that it outperforms the GMM-ivector system under short-duration conditions. To address the shortcomings of the ResNet-50 baseline, we then propose two improved models, ResNet-LSTM and ResNet-Attention, and evaluate them on test utterances of different lengths. The experimental results show that these models improve classification performance on short-duration Uyghur-Chinese language identification.

Finally, exploiting different acoustic features of speech, we develop a combined model based on MFCC and pitch features. We first train the ResNet model on pitch features concatenated directly with MFCC features, which demonstrates the usefulness of pitch features for language identification. We then model the MFCC and pitch features separately and use a fusion classification network to combine the two models at the back end, yielding a multi-feature combination model. The experimental results show that the multi-feature combination model substantially improves classification performance on short-duration Uyghur-Chinese language identification.
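The abstract does not say how the proposed VAD method works. Purely as a point of reference, here is a minimal energy-based VAD sketch in Python (NumPy), assuming 16 kHz mono audio. The function names `energy_vad` and `drop_silence`, the 25 ms / 10 ms framing and the -40 dB relative threshold are illustrative assumptions, not the thesis's method.

```python
import numpy as np

# Hypothetical energy-based VAD; NOT the VAD method proposed in the thesis.
def energy_vad(signal, sample_rate=16000, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Return one boolean per frame: True where the frame is treated as speech.

    A frame counts as speech when its log energy lies within `threshold_db`
    of the loudest frame in the utterance.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        return np.zeros(0, dtype=bool)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    energy = np.sum(signal[idx] ** 2, axis=1)
    log_energy = 10.0 * np.log10(energy + 1e-12)
    return log_energy > log_energy.max() + threshold_db


def drop_silence(signal, sample_rate=16000, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Keep only the hop-length chunks whose frames were labelled as speech."""
    signal = np.asarray(signal, dtype=np.float64)
    hop = int(sample_rate * hop_ms / 1000)
    mask = energy_vad(signal, sample_rate, frame_ms, hop_ms, threshold_db)
    chunks = [signal[i * hop:(i + 1) * hop] for i in np.flatnonzero(mask)]
    return np.concatenate(chunks) if chunks else np.zeros(0)
```

A threshold relative to the loudest frame keeps the rule independent of recording gain; a learned or model-based VAD would be needed to target language-discriminative content specifically.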
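The abstract mentions a frequency-change augmentation used to balance male and female speech but gives no algorithm. One simple way to shift voice frequencies, similar in spirit to speed perturbation, is to resample the waveform by a factor and keep the original sample rate; this scales every frequency by that factor and changes the duration by its inverse. The sketch below assumes that approach and uses linear interpolation for brevity; `change_frequency` and the example factor are hypothetical.

```python
import numpy as np

# Hypothetical resampling-based frequency change (alters pitch AND duration).
def change_frequency(signal, factor):
    """Scale every frequency in `signal` by `factor` at an unchanged sample rate.

    factor > 1 raises the pitch and shortens the signal;
    factor < 1 lowers the pitch and lengthens it.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_out = int(round(len(signal) / factor))
    # Read the original waveform at positions spaced `factor` samples apart.
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)
```

For example, `change_frequency(x, 1.2)` turns a 200 Hz tone into a 240 Hz tone. A production pipeline would normally use a band-limited resampler instead of linear interpolation to avoid aliasing.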
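The noise-addition augmentation is likewise described only at a high level. A common recipe, assumed here rather than taken from the thesis, is to mix a noise recording into each clean utterance at a chosen signal-to-noise ratio (SNR). The hypothetical helper `add_noise_at_snr` scales the noise so that 10*log10(P_speech / P_noise) equals the requested value in dB.

```python
import numpy as np

# Hypothetical SNR-controlled noise mixing for data augmentation.
def add_noise_at_snr(clean, noise, snr_db, rng=None):
    """Return `clean` plus a scaled segment of `noise` at the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile short noise so it covers the whole utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    # Take a random segment of the same length as the clean signal.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Choose scale so that 10*log10(p_clean / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Drawing the SNR at random from a range (for example 0 to 20 dB) for each utterance is a typical way to expose the model to varied noise levels.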
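Among the back-ends compared, cosine distance scoring (CDS) is the simplest: each language is represented by a model i-vector, and a test i-vector is scored by its cosine similarity to each model. This sketch assumes the model i-vector is the mean of the length-normalized training i-vectors for that language and omits the LDA/WCCN projection often applied first; `cds_scores` and the toy data are hypothetical.

```python
import numpy as np

def length_normalize(x):
    """Scale each row of `x` to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Hypothetical CDS back-end for i-vector language identification.
def cds_scores(test_ivectors, train_ivectors, train_labels):
    """Return (languages, scores) where scores[i, j] = cos(test i, model of language j)."""
    languages = sorted(set(train_labels))
    labels = np.asarray(train_labels)
    # One model i-vector per language: mean of its normalized training i-vectors.
    models = np.stack([
        length_normalize(train_ivectors[labels == lang]).mean(axis=0)
        for lang in languages
    ])
    scores = length_normalize(test_ivectors) @ length_normalize(models).T
    return languages, scores
```

The predicted language for each test utterance is the column with the highest score. No training is needed beyond averaging, which is one reason CDS is a common baseline against SVM and LDA back-ends.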
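The abstract does not detail the ResNet-Attention architecture. One widely used way to add attention on top of a convolutional front end is attentive pooling: frame-level outputs are weighted by softmax-normalized relevance scores and summed into a single utterance-level vector, so informative frames count more than in a plain average. The NumPy sketch assumes a single scoring vector `w` (hypothetical); in a real model it would be learned jointly with the ResNet.

```python
import numpy as np

# Hypothetical single-head attentive pooling over frame-level features.
def attentive_pooling(frames, w):
    """Pool a (T, D) matrix of frame features into one D-dim utterance vector.

    Returns the pooled vector and the (T,) attention weights, which sum to 1.
    """
    frames = np.asarray(frames, dtype=np.float64)
    logits = frames @ w                 # one relevance score per frame
    logits = logits - logits.max()      # subtract the max for numerical stability
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum()
    return alpha @ frames, alpha
```

Setting `w` to zero gives uniform weights, which reduces this to plain average pooling over time.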
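For the multi-feature model, the abstract says only that a fusion classification network combines the MFCC-based and pitch-based models at the back end. A minimal stand-in, assumed here for illustration, is a single linear layer with softmax applied to the concatenated utterance-level embeddings of the two branches. The names `fuse_and_classify`, `W` and `b` are hypothetical, and in practice the weights would be trained.

```python
import numpy as np

# Hypothetical linear fusion layer over two branch embeddings.
def fuse_and_classify(emb_mfcc, emb_pitch, W, b):
    """Return class posteriors of shape (N, C) from (N, D1) and (N, D2) embeddings.

    W has shape (D1 + D2, C) and b has shape (C,).
    """
    z = np.concatenate([emb_mfcc, emb_pitch], axis=-1) @ W + b
    z = z - z.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)
```

Fusing at this level lets each branch be trained on its own feature stream first, in contrast to the direct-concatenation experiment, where MFCC and pitch vectors are joined frame by frame before a single network.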
Keywords/Search Tags:Spoken Language Identification, I-vector, Deep learning, Resnet