The Research On Children's Speech Acoustic Modeling Based On Deep Learning

Posted on: 2020-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: G P Xu
Full Text: PDF
GTID: 2428330596977930
Subject: Circuits and Systems
Abstract/Summary:
Children's speech recognition technology has a large potential market, but current speech recognition for children is still very immature. On the one hand, most speech recognition research focuses on adults and pays little attention to children's speech. On the other hand, the difficulty is inseparable from children's pronunciation characteristics and the particularities of their language expression. To address these problems, this thesis approaches Chinese children's acoustic modeling from the perspective of deep learning and optimizes existing acoustic models so that they can be applied to children's speech recognition systems. New children's acoustic models are then explored to improve the accuracy and decoding efficiency of Chinese children's speech recognition.

Firstly, this thesis focuses on deep learning acoustic modeling techniques. After studying the structures of the TDNN and LSTM networks, we combine the two to build a Chinese children's speech recognition system based on TDNN-LSTM. To address the long decoding time of the LSTM model, we replace LSTM with OPGRU. Experiments show that the TDNN-OPGRU acoustic model not only achieves better recognition performance than TDNN-LSTM, but also decodes 30% faster. To further account for the physiological characteristics of children, a CNN is added at the front of the network to capture acoustic feature information that is more conducive to children's speech recognition. Experiments show that the CNN-TDNN-OPGRU children's acoustic model outperforms TDNN-OPGRU, which verifies that adding CNN layers is effective for extracting richer acoustic features from children's speech.

Secondly, this thesis analyzes the structure of the feedforward sequential memory network (FSMN), applies it to acoustic modeling for Chinese children's speech recognition, and designs comparative experiments to verify that FSMNs with different structures are effective for this task. Building on the FSMN model, cFSMN and DFSMN (a deep-structured model based on cFSMN) are used to further improve the performance of the Chinese children's speech recognition system. Analysis of the experimental results shows that the error rate of the DFSMN-based children's acoustic model is 25.76%, a relative improvement of 1.7% over the TDNN-LSTM-based acoustic model.

Finally, because training corpora for children's speech recognition are currently scarce, which results in poor robustness of the recognition system, multi-task learning (MTL) and DFSMN are combined to propose a children's acoustic model based on MTL-DFSMN. The final experimental results show that the MTL-DFSMN-based Chinese children's speech recognition system decodes more than 2 times faster than the LSTM model and also achieves better recognition performance.
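The abstract does not spell out how an FSMN layer works, so the following is only a minimal illustrative sketch of the general idea behind an FSMN memory block, not the thesis's implementation. It assumes the scalar-coefficient form, in which each frame's memory output is a weighted sum of the current, past, and a few future hidden activations, and frames outside the utterance are treated as zeros. The function name `fsmn_memory` and its parameters `look_back` and `look_ahead` are hypothetical names chosen for this example.

```python
from typing import List, Sequence


def fsmn_memory(hidden: Sequence[Sequence[float]],
                look_back: Sequence[float],
                look_ahead: Sequence[float]) -> List[List[float]]:
    """Scalar FSMN-style memory block (illustrative sketch).

    hidden[t]       : hidden activation vector at frame t
    look_back[i]    : weight for frame t - i, i = 0 .. len(look_back) - 1
    look_ahead[j-1] : weight for frame t + j, j = 1 .. len(look_ahead)
    Frames outside the utterance contribute nothing (zero padding).
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    memory: List[List[float]] = []
    for t in range(num_frames):
        out = [0.0] * dim
        # Current and past frames (no recurrence, just a finite tap filter).
        for i, a in enumerate(look_back):
            if t - i >= 0:
                for d in range(dim):
                    out[d] += a * hidden[t - i][d]
        # A limited number of future frames (look-ahead context).
        for j, c in enumerate(look_ahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    out[d] += c * hidden[t + j][d]
        memory.append(out)
    return memory


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0]]
    print(fsmn_memory(frames, look_back=[1.0, 0.5], look_ahead=[0.25]))
```

Because the memory is a finite filter over neighbouring frames rather than a recurrent state, every frame can be computed independently of the others, which is one commonly cited reason feedforward memory models can decode faster than LSTM-based ones.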
Keywords/Search Tags:Speech Recognition, Acoustic Model, Deep Learning, Multi-Task Learning