
Research On Speech Recognition Technology In Low Resource Environment

Posted on: 2018-05-14 | Degree: Master | Type: Thesis
Country: China | Candidate: F Shu
GTID: 2348330563451200 | Subject: Military Intelligence
Abstract/Summary:
Current speech recognition technology relies on large amounts of data, and system performance degrades significantly in low-resource environments. Of the roughly 6,900 languages in the world, only a few (such as English and Mandarin) have sufficient data resources; most are low-resource. With economic globalization, speech recognition is no longer limited to high-resource languages such as English and Mandarin Chinese. How to build a high-performance speech recognition system in a low-resource environment has therefore become both an international research hotspot and a difficult problem. This thesis focuses on speech recognition technology for low-resource environments. The main contributions are as follows.

(1) An LSTM-RNN-based acoustic modeling method for low-resource speech recognition. In a low-resource environment, it is essential to make full use of the information contained in the speech signal. GMM, SGMM and DNN models are limited by a fixed window length and can only model the data within the time span of that window. This thesis therefore applies an LSTM-RNN, which can model long-term information, to acoustic modeling for low-resource speech recognition. On this basis, sequence-discriminative training is added, timing information is used to assist model training, and the parameters are tuned. The method was evaluated on the OpenKWS16 evaluation corpus. The results show that the LSTM-RNN-based acoustic model outperforms the traditional methods in the low-resource environment: it reduces the word error rate (WER) by 4.4 percentage points on the continuous speech recognition task and increases the overall actual term-weighted value (ATWV) by 0.0241.

(2) A low-resource speech recognition method based on representation sharing and transfer and on training data augmentation. In a low-resource environment it is very difficult to obtain a large amount of transcribed training audio, so the training data can only be supplemented by borrowing data from other languages or by mining usable training data in the target language. Based on the idea of representation sharing and transfer in DNNs, this thesis trains a shared-hidden-layer multilingual DNN (SHL-MDNN) on data from several other languages and uses it to extract multilingual bottleneck (MBN) features for low-resource speech recognition. In addition, two strategies are proposed for mining usable training data in the target language to achieve data augmentation. Audio data perturbation perturbs the audio in the existing data set; the perturbed audio remains semantically consistent with the original transcript and can be added to the training set as new data. Semi-supervised training uses the ASR system to recognize untranscribed data, which is easy to obtain, and adds that data to the training set with the recognition results as its transcripts. Experiments verify the effectiveness of these methods: with representation sharing and transfer and training data augmentation applied, the WER of the low-resource system is 3.8 percentage points lower than that of the baseline system, and the overall ATWV improves by 0.0323. The LSTM-RNN acoustic modeling method is then combined with these methods and the contribution of each is analyzed. The system achieves its best performance when all methods are integrated: the WER is reduced by 7.2 percentage points relative to the baseline and the ATWV improves by 0.0582.

(3) A pronunciation lexicon expansion method based on the complement of a finite state transducer (FST). The lexicon is a key component of an ASR system; a lack of lexicon resources and vocabulary leads to high out-of-vocabulary (OOV) rates and degrades recognition performance. This thesis proposes a novel method that expands the lexicon automatically, recovering OOV words from pronunciations without requiring a large text corpus to discover new words. First, the complement of the FST representation of the lexicon is combined with phoneme-to-grapheme (P2G) conversion to obtain new word-pronunciation pairs. Then a two-stage verification strategy, pronunciation verification followed by word verification, filters out errors. Finally, the learned words are incorporated into the language model (LM) by linearly interpolating the base LM with a new LM trained on crawled text. Lexicon expansion significantly reduces OOV rates. Compared with the baseline systems, WER improves by about 9% relative for English and 2.3% relative for Czech, and ATWV improves by 9.7% for English and 10.0% for Czech.
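Point (1) contrasts fixed-window models (GMM, SGMM, DNN) with an LSTM-RNN that carries information across time. The pure-Python sketch below shows one time step of a basic LSTM cell (no peepholes or projection layer) and a loop that threads the hidden and cell state through an utterance. It is a generic illustration of the LSTM recurrence, not the thesis's network; the weight layout (`GateParams`) is an assumption made for readability.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Assumed layout: each gate ("i", "f", "o", "g") has input weights W, recurrent weights U, bias b.
GateParams = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: GateParams) -> Tuple[Vector, Vector]:
    """One time step of a basic LSTM cell."""
    pre = {
        gate: [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h_prev), b)]
        for gate, (w, u, b) in params.items()
    }
    i = [sigmoid(z) for z in pre["i"]]      # input gate
    f = [sigmoid(z) for z in pre["f"]]      # forget gate
    o = [sigmoid(z) for z in pre["o"]]      # output gate
    g = [math.tanh(z) for z in pre["g"]]    # candidate cell update
    # The cell state keeps old memory (scaled by f) and adds new input (scaled by i).
    c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(frames: List[Vector], params: GateParams, hidden_size: int) -> List[Vector]:
    """Run the cell over a whole utterance, carrying (h, c) from frame to frame."""
    h: Vector = [0.0] * hidden_size
    c: Vector = [0.0] * hidden_size
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

With toy weights such as `{gate: ([[0.5]], [[0.5]], [0.0]) for gate in "ifog"}`, the input `[[1.0], [0.0], [0.0]]` still produces a nonzero output at the third frame, while an all-zero input produces zeros: the first frame's influence survives in the cell state without any fixed window length.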
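The results above are reported as WER, sometimes as an absolute change in percentage points (points (1) and (2)) and sometimes as a relative gain (point (3)). A minimal sketch of the standard word-level WER, (substitutions + deletions + insertions) / reference length, computed with Levenshtein distance, plus a helper that separates absolute from relative reduction. Function names are illustrative, not from the thesis.

```python
from typing import Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before row i is processed, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from the hypothesis
            insertion = cur[j - 1] + 1    # extra word in the hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer_pct: float, system_wer_pct: float) -> Tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction as a fraction)."""
    absolute_points = baseline_wer_pct - system_wer_pct
    return absolute_points, absolute_points / baseline_wer_pct
```

The same absolute reduction in points corresponds to a larger relative reduction when the baseline WER is lower, so the two kinds of figures in the abstract are not directly comparable.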
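ATWV is the keyword-search metric used alongside WER throughout the abstract. A sketch of how it is computed, assuming the NIST OpenKWS definition: ATWV = 1 - mean over keywords of (P_miss + beta * P_FA), with beta = 999.9, P_FA approximated as false alarms divided by (seconds of speech - true occurrences), and keywords with no reference occurrences excluded. The per-keyword counts are taken at the system's actual decision threshold; the `KeywordCounts` structure is a hypothetical container, not part of any official scoring tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KeywordCounts:
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # detections that hit a reference occurrence
    n_false_alarm: int  # detections with no matching reference occurrence


def actual_term_weighted_value(counts: Sequence[KeywordCounts],
                               speech_seconds: float,
                               beta: float = 999.9) -> float:
    """ATWV = 1 - mean over keywords of (P_miss + beta * P_FA)."""
    scored = [k for k in counts if k.n_true > 0]  # keywords absent from the reference are skipped
    if not scored:
        raise ValueError("no keyword has reference occurrences")
    total_loss = 0.0
    for k in scored:
        p_miss = 1.0 - k.n_correct / k.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = k.n_false_alarm / (speech_seconds - k.n_true)
        total_loss += p_miss + beta * p_fa
    return 1.0 - total_loss / len(scored)
```

Because beta is large, a single false alarm in one hour of speech adds about 999.9 / 3600, roughly 0.28, to that keyword's loss, so ATWV rewards precision heavily.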
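Point (2) describes audio data perturbation without naming the perturbation. The sketch below assumes speed perturbation, a common choice, implemented as linear-interpolation resampling; the factors 0.9 and 1.1 are also assumptions. It shows the key property the abstract relies on: each perturbed copy reuses the original transcript.

```python
from typing import List, Sequence, Tuple


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Resample a waveform by `factor` using linear interpolation.

    Played back at the original sample rate, the result is `factor` times faster
    (factor > 1 shortens the audio and raises pitch; factor < 1 does the opposite).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for k in range(n_out):
        pos = k * factor               # position of output sample k in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


def augment(dataset: Sequence[Tuple[List[float], str]],
            factors: Sequence[float] = (0.9, 1.1)) -> List[Tuple[List[float], str]]:
    """Return the original utterances plus one perturbed copy per factor.

    Each copy keeps the original transcript, since perturbation does not change what is said.
    """
    augmented = list(dataset)
    for samples, transcript in dataset:
        for f in factors:
            augmented.append((speed_perturb(samples, f), transcript))
    return augmented
```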
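The semi-supervised strategy in point (2) decodes untranscribed audio and adds the recognition results as transcripts. A minimal sketch of that data flow, assuming a hypothetical `decode` callable that returns a transcript and an utterance-level confidence; the confidence filter and its 0.7 threshold are assumptions, since the abstract does not say how (or whether) hypotheses are filtered.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical decoder interface: audio -> (recognized text, confidence in [0, 1]).
Decoder = Callable[[object], Tuple[str, float]]


def pseudo_label(untranscribed: Iterable[object], decode: Decoder,
                 min_confidence: float = 0.7) -> List[Tuple[object, str]]:
    """Decode untranscribed audio and keep confident results as (audio, transcript) pairs."""
    selected = []
    for audio in untranscribed:
        text, confidence = decode(audio)
        if text and confidence >= min_confidence:
            selected.append((audio, text))
    return selected


def build_training_set(transcribed: Sequence[Tuple[object, str]],
                       untranscribed: Iterable[object], decode: Decoder,
                       min_confidence: float = 0.7) -> List[Tuple[object, str]]:
    """Manually transcribed data plus automatically transcribed (pseudo-labeled) data."""
    return list(transcribed) + pseudo_label(untranscribed, decode, min_confidence)
```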
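Point (3) measures success partly by OOV rate. The sketch below computes the OOV rate of a token sequence against a lexicon and merges new word-pronunciation pairs into a copy of it. It assumes the pairs have already passed the thesis's two-stage verification; the FST-complement and P2G steps that produce them are not reproduced, and the phone strings are made up for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Lexicon = Dict[str, List[str]]  # word -> pronunciations (space-separated phones)


def oov_rate(tokens: Sequence[str], lexicon: Lexicon) -> float:
    """Fraction of running tokens whose word is not in the lexicon."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return sum(1 for t in tokens if t not in lexicon) / len(tokens)


def expand_lexicon(lexicon: Lexicon, new_pairs: Iterable[Tuple[str, str]]) -> Lexicon:
    """Return a copy of the lexicon with (word, pronunciation) pairs added, skipping duplicates."""
    expanded = {word: list(prons) for word, prons in lexicon.items()}
    for word, pron in new_pairs:
        prons = expanded.setdefault(word, [])
        if pron not in prons:
            prons.append(pron)
    return expanded
```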
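The last step of point (3) linearly interpolates the base LM with an LM trained on crawled text: P(w | h) = lambda * P_base(w | h) + (1 - lambda) * P_new(w | h). For brevity the sketch uses unigram models, whereas real systems interpolate n-gram models; choosing lambda by perplexity on held-out text is a typical practice and an assumption here, as the abstract does not say how lambda was set.

```python
import math
from typing import Dict, List, Optional, Sequence

Unigram = Dict[str, float]


def interpolate(p_base: Unigram, p_new: Unigram, lam: float) -> Unigram:
    """P(w) = lam * P_base(w) + (1 - lam) * P_new(w); a word missing from one model gets 0 there."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    vocab = set(p_base) | set(p_new)
    return {w: lam * p_base.get(w, 0.0) + (1.0 - lam) * p_new.get(w, 0.0) for w in vocab}


def perplexity(lm: Unigram, tokens: Sequence[str]) -> float:
    """exp(-mean log P(w)); infinite if any token has zero probability."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    log_prob = 0.0
    for t in tokens:
        p = lm.get(t, 0.0)
        if p <= 0.0:
            return math.inf
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def tune_lambda(p_base: Unigram, p_new: Unigram, dev_tokens: Sequence[str],
                grid: Optional[List[float]] = None) -> float:
    """Pick the lambda from a grid that minimizes dev-set perplexity."""
    grid = grid or [i / 10 for i in range(1, 10)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_base, p_new, lam), dev_tokens))
```

Words that exist only in the crawled-text model (such as newly learned lexicon entries) receive nonzero probability after interpolation, which is what lets the recognizer output them.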
Keywords: low-resource speech recognition, long short-term memory, representation sharing and transferring, lexicon expansion