Research On Speech Phoneme Recognition Based On Deep Learning

Posted on: 2021-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: C J Qin
GTID: 2518306038486894
Subject: Signal and Information Processing
Abstract/Summary:
Speech recognition is the process of converting speech into text. It draws on several disciplines, including acoustics, signal processing, and pattern recognition. In speech recognition research, extracting deep features that represent the signal well, building a powerful recognition model, and using accurate and efficient decoding algorithms are all important for improving the recognition rate. The traditional acoustic model combines a Gaussian mixture model with a hidden Markov model. As deep learning has matured and computing hardware has improved, acoustic models have evolved into discriminative models based on neural networks, and speech recognition systems have gradually developed into hybrid systems that combine neural networks with hidden Markov models. End-to-end speech recognition uses only neural networks to train the acoustic model, language model, and decoding module jointly, which simplifies the modular structure of traditional speech recognition systems.

This thesis aims to reduce the error rate of speech recognition. First, from the perspective of feature optimization, it studies traditional optimization methods and concatenation-based feature fusion, and proposes a deep feature fusion method based on a sub-network model. Finally, it studies phoneme recognition on the TIMIT database using the connectionist temporal classification (CTC) algorithm, with a model built from a convolutional neural network and a recurrent neural network. The specific research content of the thesis is as follows:

(1) A bottleneck feature extraction network based on a deep neural network is trained and used to extract bottleneck features from speech. Linear discriminant analysis is used to enhance the phoneme category information in the bottleneck features. The features are then linearly transformed under a maximum likelihood criterion so that the model adapts to speaker information and becomes more robust to speaker noise. The thesis also studies how bottleneck features obtained from different network structures affect system performance.

(2) To increase the diversity of the input information, the thesis studies feature fusion in the phoneme recognition task. It first studies the tandem feature fusion method widely used in pattern recognition. It then draws on the ability of neural networks to extract signal features and integrate information, and proposes a sub-network-based feature fusion method: sub-networks extract deep features from traditional features, and a feature fusion network learns the connections between these deep features and fuses them.

(3) The thesis studies phoneme recognition on the TIMIT database using the CTC algorithm. A hybrid network consisting of a convolutional neural network and a recurrent neural network serves as the feature learning structure of the model. The convolutional neural network extracts locally stable characteristics of the speech signal, and its output is fed to the recurrent neural network, which models the temporal correlations in the speech signal. The network structure is designed carefully to avoid the over-fitting that a small data set and a complex model would otherwise cause.
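
The abstract does not say how the output of the connectionist temporal classification model is decoded into a phoneme sequence. As an illustration only, the sketch below shows best-path (greedy) decoding, a common approximation: take the top-scoring label in each frame, merge consecutive repeats, then drop the blank label. The function name, the blank index of 0, and the toy scores are assumptions made for this example and are not taken from the thesis.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove the blank label, which only separates repeated phonemes.
    return [label for label in collapsed if label != blank]


# Toy per-frame scores over the labels {blank, 1, 2}; the values are made up.
frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two occurrences of label 1)
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
    [0.10, 0.10, 0.80],  # label 2 (repeat)
]
decoded = ctc_greedy_decode(frames)
print(decoded)  # [1, 1, 2]
```

The blank between the second and fourth frames is what lets the same label appear twice in the output. Without it, the repeated frames would collapse into a single occurrence.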
Keywords/Search Tags:speech recognition, phoneme recognition, end-to-end speech recognition system, neural network models, feature fusion