
Research On Phone Feature Recognition Based On Deep Learning

Posted on: 2020-06-17  Degree: Master  Type: Thesis
Country: China  Candidate: B B Jia  Full Text: PDF
GTID: 2438330578459499  Subject: Engineering
Abstract/Summary:
As an important means of daily communication, voice plays an irreplaceable role in human development. Since the start of the 21st century, the revival of neural networks and the rapid development of the Internet have pushed speech recognition to a new stage. With the progress of pattern recognition, speech recognition, an important component of human-computer interaction, has become a hot research topic. Speech recognition consists of three steps: feature extraction, acoustic model recognition, and decoding. Feature extraction is especially important because it comes first. With the successful application of deep learning to speech recognition, the deep structure of a neural network can compute complex functions and learn high-dimensional representations of speech data, and it extracts phoneme features more effectively than shallow classification structures. In recent years, many researchers have proposed methods that use neural networks to improve the recognition rate of speech features. Although these methods have achieved good results to a certain extent, there is still room for improvement.

To reduce the recognition error rate of speech features, traditional features are first extracted from the speech signal, and phoneme features are then extracted from these traditional features. Second, a tandem system is established, and computational complexity is reduced through a structure shared among related states. At the same time, the mapping method and learning ability of the generative network in the deep learning framework are used to extract the feature parameters. Finally, the acoustic recognition error rate is obtained after the acoustic model has been trained and decoding has been performed, and this error rate is used to judge the model.

The main research contents of this paper are as follows:

(1) When the states of a subspace Gaussian mixture model share the same structure in the state space, the scale of computation can be reduced by changing the means and weights of the full parameter space. A global mapping from the vector space to the parameter space is then applied. Training uses a two-step E-M algorithm on the Viterbi state alignments of the baseline system with the data, and each speech state is adapted in a relatively independent space to obtain the trained speech phoneme features. The traditional features are extracted from the TIMIT speech corpus on an open-source platform. To account for the robustness of the signal during extraction, the dimensionality of the traditional features is reduced before they are input to the model. The experiments show that, for recognizing speech phoneme features, the trained subspace Gaussian mixture model performs better than the untrained traditional features.

(2) A tandem system of a subspace Gaussian mixture model and a deep neural network is established to extract phoneme features. The traditional features first undergo dimensionality reduction and are then input to the first-stage subspace Gaussian mixture model. Sharing the space reduces the number of parameters to be estimated, and the output features are obtained after training. These output features serve as the input features of the second model: the deep neural network is first trained without supervision and then with the backpropagation algorithm to obtain deep features. Finally, acoustic model recognition and decoding are performed. These experiments were also carried out on the TIMIT speech corpus on an open-source platform. A comparison of phoneme recognition error rates shows that the features extracted by the tandem system are significantly better than the traditional features.
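
The parameter sharing described in item (1) can be illustrated with a minimal sketch. It assumes the commonly used subspace Gaussian mixture model formulation, in which each state j is described by a low-dimensional state vector v_j, and the mean and weight of Gaussian i in that state are derived from globally shared projections: mu_ji = M_i v_j and w_ji = softmax_i(w_i . v_j). The abstract does not state which exact variant the thesis uses, so the function name `sgmm_state_params` and all argument names are hypothetical and serve only as illustration.

```python
import math


def sgmm_state_params(v_j, M, w):
    """Derive per-state Gaussian means and mixture weights from one state vector.

    Assumed (hypothetical) layout, not taken from the thesis:
      v_j : state vector, a list of S floats
      M   : I shared mean-projection matrices, each a list of D rows of S floats
      w   : I shared weight-projection vectors, each a list of S floats
    """
    # Mean of Gaussian i in state j: mu_ji = M_i v_j (matrix-vector product).
    means = [
        [sum(m * v for m, v in zip(row, v_j)) for row in M_i]
        for M_i in M
    ]

    # Weight of Gaussian i in state j: softmax over the scores w_i . v_j.
    scores = [sum(a * v for a, v in zip(w_i, v_j)) for w_i in w]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    return means, weights


if __name__ == "__main__":
    # Toy example: I = 2 Gaussians, feature dimension D = 2, subspace dimension S = 2.
    M = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    w = [[0.0, 0.0], [0.0, 0.0]]
    means, weights = sgmm_state_params([1.0, 2.0], M, w)
    print(means, weights)
```

Under this assumption, only the S-dimensional vector v_j is specific to a state, while M and w are shared by all states, which is why the number of parameters estimated per state is small.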
Keywords/Search Tags: speech recognition, subspace Gaussian mixture model, acoustic feature, phoneme recognition, deep neural network