The Research On Children's Speech Acoustic Modeling Based On Deep Learning

Posted on:2020-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:G P Xu

Full Text:PDF

GTID:2428330596977930

Subject:Circuits and Systems

Abstract/Summary:

Children's speech recognition has a large potential market, but the technology is still immature. On the one hand, most speech recognition research focuses on adults and pays little attention to children's speech. On the other hand, the difficulty is inseparable from the characteristics of children's pronunciation and the particularities of their language expression. To address these problems, this thesis approaches Chinese children's acoustic modeling from the perspective of deep learning, optimizes existing acoustic models for use in children's speech recognition systems, and explores new children's acoustic models to improve the accuracy and decoding efficiency of Chinese children's speech recognition.

First, the thesis focuses on deep-learning acoustic modeling. After studying the structures of the TDNN and LSTM networks, it combines the two to build a Chinese children's speech recognition system based on TDNN-LSTM, and then addresses the decoding-time problem of the LSTM model by replacing LSTM with OPGRU. Experiments show that the TDNN-OPGRU acoustic model not only achieves better recognition performance than TDNN-LSTM but also decodes 30% faster. To account for the physiological characteristics of children, CNN layers are then added at the front of the network to capture acoustic features that are more conducive to children's speech recognition. Experiments show that the CNN-TDNN-OPGRU acoustic model outperforms TDNN-OPGRU, which verifies that adding CNN layers is effective for extracting richer acoustic features from children's speech.

Second, the thesis analyzes the structure of the feedforward sequential memory network (FSMN), applies it to acoustic modeling for Chinese children's speech recognition, and designs comparative experiments to verify that FSMNs with different structures are effective for this task. Building on FSMN, cFSMN and DFSMN (a deep structure based on cFSMN) further improve the performance of the Chinese children's speech recognition system. Analysis of the experimental results shows that the DFSMN-based children's acoustic model reaches an error rate of 25.76%, a relative improvement of 1.7% over the TDNN-LSTM acoustic model.

Finally, because training corpora for children's speech recognition are currently scarce, which leads to poor robustness of the recognition system, multi-task learning (MTL) is combined with DFSMN to propose a children's speech acoustic model based on MTL-DFSMN. The final experimental results show that the MTL-DFSMN Chinese children's speech recognition system decodes more than 2 times faster than the LSTM model and also performs better.
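To make the FSMN idea mentioned in the abstract more concrete, the following is a minimal sketch of a vectorized FSMN memory block as it is commonly described: each frame's memory is a learned, element-wise weighted sum of the current, past, and (optionally) future hidden activations. It is an illustration only, not the thesis's implementation. The function name `fsmn_memory_block`, the filter orders, and the coefficient values are assumptions chosen for the example, and the projection layers of cFSMN and the skip connections of DFSMN are left out.

```python
from typing import List, Sequence

Vector = List[float]


def fsmn_memory_block(
    hidden: Sequence[Sequence[float]],
    lookback: Sequence[Sequence[float]],
    lookahead: Sequence[Sequence[float]] = (),
) -> List[Vector]:
    """Compute m_t = sum_i a_i * h_{t-i} + sum_j c_j * h_{t+j} (element-wise).

    hidden:    T frames, each a D-dimensional hidden activation h_t.
    lookback:  coefficient vectors a_0..a_N (a_0 weights the current frame).
    lookahead: coefficient vectors c_1..c_M for future frames (may be empty).
    Frames outside the utterance are treated as zeros.
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    memory: List[Vector] = []
    for t in range(num_frames):
        m_t = [0.0] * dim
        # Current and past context: tap i reaches back i frames.
        for i, a_i in enumerate(lookback):
            if t - i >= 0:
                for d in range(dim):
                    m_t[d] += a_i[d] * hidden[t - i][d]
        # Future context: tap j looks ahead j frames.
        for j, c_j in enumerate(lookahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    m_t[d] += c_j[d] * hidden[t + j][d]
        memory.append(m_t)
    return memory


# Hypothetical toy input: 3 frames of 2-dimensional activations.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
past_taps = [[1.0, 1.0], [0.5, 0.5]]   # a_0, a_1 (illustrative values)
future_taps = [[0.25, 0.25]]           # c_1 (illustrative values)
memory = fsmn_memory_block(frames, past_taps, future_taps)
```

Because the memory is a fixed-length filter over neighboring frames rather than a recurrent state, every frame can be computed independently, which is the usual explanation for why FSMN-style models decode faster than LSTM-based ones.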

Keywords/Search Tags:

Speech Recognition, Acoustic Model, Deep Learning, Multi-Task Learning

Related items

1	Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning
2	Research On Continuous Speech Recognition Based On Deep Learning
3	Research On Adaptation Methods In Deep Learning Based Speech Recognition Systems
4	Research On Speech Recognition Based On Deep Learning
5	Research On Chinese Speech Recognition System Based On Deep Learning
6	Research On Uyghur Speech Recognition Based On Deep Learning
7	Low-resource Speech Representation Learning And Its Applications
8	Air Traffic Control Speech Recognition Based On Deep Learning
9	Research On Acoustic Modeling In Low Resource Speech Recognition Based On Transfer Learning
10	Development Of Offline Speech Recognition System Based On Deep Learning