
Research On Optimization Of English Speech Recognition Algorithm Based On Deep Neural Network

Posted on: 2019-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Tao
Full Text: PDF
GTID: 2428330593450557
Subject: Software engineering
Abstract/Summary:
Research on speech recognition continues to deepen, and the approaches taken are increasingly diverse. Most current speech signal analysis relies on spectral features: features are extracted through a series of transformations and then used to train a recognition model. Viewed from another angle, however, the spectrogram is the most intuitive representation of a speech signal. It contains not only spectral information but also the fundamental frequency, the formants, and their trajectories over time. The textures formed by these trajectories reflect pronunciation characteristics such as pitch and stress, and an experienced phonetician can even estimate the underlying text from a spectrogram alone. On this basis, the spectrogram of a speech signal can serve as the input for feature extraction. By applying mature feature-extraction and model-training methods from image processing to the spectrogram, a new direction combining speech signal processing and image processing can be explored.

This thesis first studies the classification of pronunciation quality based on a Pulse Coupled Neural Network (PCNN) and examines the feasibility of using the spectrogram as a feature for a speech recognition model. A total of 600 positive and negative samples of standard and non-standard pronunciation were collected, and isolated-word spectrogram images generated by the short-time Fourier transform were used as the feature input. MFCC features were then fused at both the feature level and the decision level, and the combined features were fed into a support vector machine classifier to distinguish good from poor pronunciation. Experimental results show that the PCNN achieves more than 85% recognition accuracy when the features include the spectrogram, which is higher than using spectral features alone. Fusing the image features with the spectral features of the speech further improves recognition accuracy, and fusion based on voting over the recognition results outperforms fusion at the feature-input level. In short, using the spectrogram as a starting point for extracting model features is feasible.

The thesis then studies speaker-independent pronunciation evaluation combining the spectrogram with a convolutional neural network. A feature preprocessing method that combines wideband and narrowband spectrograms is proposed. The narrowband spectrograms are used for fundamental-frequency and harmonic analysis to perform endpoint detection and remove invalid non-speech segments, while the wideband spectrograms are used to separate different textures, achieving phoneme-level segmentation and producing labeled data in phoneme units. The segmentation accuracy of this strategy is about 88%. The processed two-dimensional feature matrices are then fed into a seven-layer convolutional neural network for training. Experiments show that the convolutional neural network recognizes phoneme spectrograms with generally good accuracy, and the actual accuracy is positively correlated with the quality of the segmentation preprocessing. Different phonemes achieve different recognition results owing to their own pronunciation characteristics; the overall recognition accuracy across all phonemes is close to 83%.
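For readers who want a concrete picture of the feature pipeline summarized above, the following is a minimal Python sketch of how an isolated-word recording might be turned into a log-magnitude spectrogram via the short-time Fourier transform, fused with MFCC features at the feature level, and classified with a support vector machine. The library choices (librosa, scikit-learn), window sizes, feature dimensions, and the pooling over time are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Sketch of the feature-level fusion described in the abstract: spectrogram
# statistics concatenated with MFCCs, classified by an SVM. All parameters
# below are illustrative assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC


def spectrogram_features(y, sr, n_fft=512, hop_length=128):
    """Log-magnitude spectrogram summarized into a fixed-length vector."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    S_db = librosa.amplitude_to_db(S, ref=np.max)
    # Crude fixed-size summary: per-frequency-bin mean and std over time.
    return np.concatenate([S_db.mean(axis=1), S_db.std(axis=1)])


def mfcc_features(y, sr, n_mfcc=13):
    """Mean MFCC vector over the utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def fused_features(y, sr):
    """Feature-level fusion: spectrogram statistics + MFCCs."""
    return np.concatenate([spectrogram_features(y, sr), mfcc_features(y, sr)])


def train_classifier(wav_paths, labels):
    """Train an SVM on labelled isolated-word recordings.

    `wav_paths` and `labels` (1 = standard pronunciation, 0 = non-standard)
    are hypothetical placeholders for the collected samples.
    """
    X = []
    for path in wav_paths:
        y, sr = librosa.load(path, sr=16000)
        X.append(fused_features(y, sr))
    clf = SVC(kernel="rbf")
    clf.fit(np.array(X), np.array(labels))
    return clf
```

The decision-level (voting) fusion reported to perform best would instead train separate classifiers on the spectrogram and MFCC features and combine their individual predictions, rather than concatenating the feature vectors as shown here.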
Keywords/Search Tags: Speech Recognition, Spectrogram, Deep Neural Network, Convolutional Neural Network