
Research And System Design Of Speech Recognition Based On Improved CNN

Posted on: 2020-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: L F Xu
GTID: 2428330596498267  Subject: Electronics and Communications Engineering
Abstract/Summary:
At present, deep learning technology is advancing rapidly. The convolutional neural network (CNN) is one of the research hotspots in deep learning and has undergone long-term development. Speech is the most natural means of communication between people and plays an important role in human-computer interaction. Driven by CNN technology, speech recognition is widely used in many areas of people's daily work and life. This thesis analyzes and studies speech recognition and neural networks in depth and proposes a max similarity loss function. Based on the correlation characteristics of speech signals, the predicted data and the actual data are treated as two sequences. Drawing on the theory of the longest common subsequence (LCS), the length of the LCS of the predicted and actual data is obtained by dynamic programming and a graphical method, the specific elements of the LCS are recovered by backtracking, and the Euclidean distance between the two points is used to construct the max similarity loss function. When the network parameters are updated, the max similarity loss function serves as the model's loss function in place of losses such as mean squared error.

The thesis also proposes an adaptive convolution kernel size algorithm based on the CNN's parameter update process, the gradients of the convolution kernel parameters, the mean value of those parameters, and the Minkowski distance. During parameter updates, the algorithm judges whether the model has extracted enough effective information and changes the kernel size dynamically: it enlarges the kernels to prevent underfitting while effective information has not yet been extracted, and shrinks them to prevent overfitting once it judges that effective information has been extracted.

A system test platform is built with the TensorFlow framework, Librosa, and an MX150 GPU. The max similarity loss function and the adaptive convolution kernel algorithm are implemented in TensorFlow. The max similarity loss function is used as the loss function of a CNN, yielding a CNN structure based on this loss; the adaptive convolution kernel algorithm is applied in the backpropagation process, yielding a second CNN design. The Librosa audio processing library is used to preprocess the speech signals of the TIMIT dataset, analyze their time- and frequency-domain information, and extract frequency-domain features, while the MX150 GPU accelerates model computation. System tests show that, with GPU acceleration enabled, the CNN using the max similarity loss function needs fewer iterations to reach a steady state and reduces running time by about 20% compared with a long short-term memory (LSTM) network. The CNN using the adaptive convolution kernel size algorithm reduces the recognition error rate by about 15% compared with the LSTM network, at the cost of longer computation time. A hybrid model combining the max similarity loss function and the adaptive convolution kernel algorithm improves the recognition rate by about 8% over the model that uses only the max similarity loss function. The thesis concludes that the max similarity loss function reduces model computation time, while the adaptive convolution kernel algorithm improves the recognition rate.
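The LCS step behind the max similarity loss can be illustrated with a minimal sketch in plain Python. It assumes the predicted and actual data have already been discretized into label sequences (for example, per-frame phone IDs), and it adopts one hypothetical reading of "the Euclidean distance between the two points": the distance between the point (len(pred), len(target)) and the point (L, L), where L is the LCS length, so the loss is zero only when both sequences equal their LCS. The names `lcs_table`, `lcs_backtrack`, and `max_similarity_loss` are illustrative, not the thesis's TensorFlow code, and unlike a training loss this version is not differentiable.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def lcs_table(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[List[int]]:
    """Dynamic-programming table: dp[i][j] is the LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def lcs_backtrack(dp: List[List[int]], a: Sequence[Hashable],
                  b: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Walk the table back from the bottom-right corner to recover matched index pairs."""
    i, j = len(a), len(b)
    pairs: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))  # this element belongs to the LCS
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def max_similarity_loss(pred: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Hypothetical loss: Euclidean distance from (len(pred), len(target)) to (L, L)."""
    dp = lcs_table(pred, target)
    pairs = lcs_backtrack(dp, pred, target)
    lcs_len = len(pairs)  # equals dp[-1][-1]
    return math.hypot(len(pred) - lcs_len, len(target) - lcs_len)
```

For the textbook pair `"ABCBDAB"` and `"BDCABA"`, the LCS has length 4, so this sketch returns the distance from (7, 6) to (4, 4), namely sqrt(13).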
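The abstract names the ingredients of the adaptive convolution kernel size algorithm (kernel gradients, the mean of the kernel parameters, and the Minkowski distance) but not its exact decision rule. The plain-Python sketch below shows one hypothetical way to combine them: measure the Minkowski distance between a kernel's weights before and after a gradient step, normalize it by the mean absolute weight, and grow the kernel while the relative change is large (treated here as "effective information not yet extracted") or shrink it once the change is small. The function `next_kernel_size`, the learning rate, the order `p`, both thresholds, the step of 2, and the size bounds are all assumed values, not taken from the thesis.

```python
from typing import Sequence


def minkowski(u: Sequence[float], v: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)


def next_kernel_size(weights: Sequence[float], grads: Sequence[float], size: int,
                     lr: float = 0.01, p: float = 3.0,
                     grow_thr: float = 0.05, shrink_thr: float = 0.005,
                     min_size: int = 3, max_size: int = 11) -> int:
    """Hypothetical rule: pick the next odd kernel size from the relative update size."""
    # Weights after one plain gradient-descent step.
    updated = [w - lr * g for w, g in zip(weights, grads)]
    # How far the kernel moved, relative to its mean absolute parameter value.
    step = minkowski(weights, updated, p)
    mean_mag = sum(abs(w) for w in weights) / len(weights)
    ratio = step / (mean_mag + 1e-12)
    if ratio > grow_thr:
        # Large change: assume information is still missing, enlarge (underfitting guard).
        return min(size + 2, max_size)
    if ratio < shrink_thr:
        # Small change: assume information is captured, shrink (overfitting guard).
        return max(size - 2, min_size)
    return size
```

Stepping by 2 keeps kernel sizes odd so each kernel stays centered, and the clamp keeps the size within a bounded range. Actually resizing a TensorFlow convolution layer would also require re-initializing or padding/cropping its weights, which this sketch does not attempt.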
Keywords/Search Tags: Convolutional Neural Network, Speech Recognition, Loss Function, Convolution Kernel