Research On Tibetan Speech Recognition Based On CNN Multi-feature Fusion

Posted on: 2022-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: M M Hou
GTID: 2518306500456474
Subject: Master of Engineering
Abstract/Summary:
With advances in science and technology and the growth of human-computer interaction, speech recognition has become an active research topic in China and abroad. Recognition of Modern Standard Chinese, the mainstream language in China, has achieved strong results. For non-mainstream languages such as Tibetan, however, earlier studies exist but their results have not been ideal. This thesis therefore studies Tibetan speech recognition, focusing on recognition with fused features. The main work and contributions of this thesis are as follows:

1. A speech endpoint detection algorithm based on Savitzky-Golay filtering and an improved sub-band energy entropy is implemented. It is compared experimentally with the short-time energy combined with sub-band variance algorithm, the spectral subtraction sub-band energy entropy algorithm, and the improved MFCC cosine algorithm. The results show that the proposed algorithm outperforms the others under Gaussian white noise, factory noise and pink noise at different signal-to-noise ratios (SNRs), especially at low SNR. For Gaussian white noise at an SNR of -10 dB, its detection accuracy is 12.38% and 9.13% higher than that of the short-time energy combined with sub-band variance algorithm and the spectral subtraction sub-band energy entropy ratio algorithm, respectively.

2. A Tibetan speech recognition system based on a CNN acoustic model is built. The acoustic model is trained on a Tibetan speech corpus using 200-dimensional spectrogram features and a CNN, and a 3-gram language model is trained on a Tibetan text corpus. The corpus is randomly split in a fixed proportion to design cross-validation recognition experiments. The word error rates of the CNN-based Tibetan speech recognition in the three cross-validation experiments are 26.90%, 27.19% and 26.58%, respectively, which indicates that the model is reasonably reliable for Tibetan speech recognition. An experiment on the effect of adding Dropout to the model is also carried out. The results show that, on the same data, the model with Dropout performs better, with the recognition rate increased by 1.8%.

3. Multi-feature Tibetan speech recognition based on CNN is implemented. FBank, MFCC and spectrogram features are combined through feature fusion, and four comparative schemes are designed: recognition based on the FBank feature; on FBank and MFCC features; on FBank and spectrogram features; and on FBank, MFCC and spectrogram features. The results show that the scheme based on FBank, MFCC and spectrogram features performs best, with a word error rate 1.28%, 0.87% and 0.42% lower than those of the other three schemes, respectively.
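The endpoint detection idea in item 1 can be illustrated with a minimal sketch. The abstract does not give the details of the thesis's improved algorithm, so the code below is only a generic energy-to-entropy detector with Savitzky-Golay smoothing, not the author's method. The function names, the frame sizes (400-sample frames, 160-sample hop), the 16 equal-width sub-bands, the use of the first 10 frames as a noise estimate, and the threshold factor of 2 are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def energy_entropy_ratio(frames, n_subbands=16, eps=1e-10):
    """Per-frame log energy divided by sub-band spectral entropy.

    Noise spreads power evenly over sub-bands (high entropy, low ratio);
    speech-like frames concentrate power (low entropy, high ratio).
    """
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Group FFT bins into equal-width sub-bands (an assumed layout).
    usable = (power.shape[1] // n_subbands) * n_subbands
    bands = power[:, :usable].reshape(len(frames), n_subbands, -1).sum(axis=2)
    prob = bands / (bands.sum(axis=1, keepdims=True) + eps)
    entropy = -(prob * np.log(prob + eps)).sum(axis=1)
    log_energy = np.log1p(bands.sum(axis=1))
    return log_energy / (entropy + eps)


def detect_speech(x, frame_len=400, hop=160, sg_window=11, sg_order=3,
                  noise_frames=10, threshold_factor=2.0):
    """Return a boolean mask with True for frames judged to contain speech."""
    x = np.asarray(x, dtype=float)
    feature = energy_entropy_ratio(frame_signal(x, frame_len, hop))
    # Savitzky-Golay smoothing suppresses frame-to-frame jitter in the feature.
    smooth = savgol_filter(feature, window_length=sg_window, polyorder=sg_order)
    # Assumption: the first few frames contain only background noise.
    threshold = threshold_factor * smooth[:noise_frames].mean()
    return smooth > threshold


if __name__ == "__main__":
    # Synthetic check: 1 s of noise, 1 s of a 440 Hz tone in noise, 1 s of noise.
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)
    quiet = 0.01 * rng.standard_normal(sr)
    mask = detect_speech(np.concatenate([quiet, tone, quiet]))
    print("frames flagged as speech:", int(mask.sum()), "of", len(mask))
```

A fixed threshold relative to the leading frames is the simplest choice here; a detector intended for low-SNR conditions such as those reported above would likely need adaptive or dual thresholds, which this sketch does not attempt.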
Keywords/Search Tags: Tibetan speech recognition, Endpoint detection, Convolutional neural network, Feature fusion