
Research Of Speech Recognition Based On Convolution Neural Network

Posted on: 2019-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2428330596462652
Subject: Computer Science and Technology
Abstract/Summary:
The terms 'Artificial Intelligence' and 'Deep Learning' entered public view after Google's computer program AlphaGo unexpectedly defeated Go champion Lee Sedol in the first game of their five-game match in March 2016. Learning means drawing on past experience to make informed judgments about unknown things, and people now want computers to be able to identify and judge their surroundings. Deep Learning explores the structure of biological neural networks in order to mimic how the human brain perceives external audio and video stimuli. As an important human-computer interaction technology, speech recognition and control has become one of the major research topics in Artificial Intelligence. From Hidden Markov Models to neural networks, speech recognition systems have steadily improved their recognition accuracy. RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks) and CNNs (Convolutional Neural Networks) are the representative networks in the speech recognition field. Compared with RNNs and LSTMs, the structure of a CNN is closer to that of a biological neural network, and its sparse connectivity and shared weights have made it widely used in image recognition. This thesis applies the convolutional neural network to speech recognition and improves the network's performance by selecting an appropriate activation function and loss function and by adjusting the stride and depth of the convolutional layers. In addition, it proposes a Random sparsity Dropout strategy that replaces the original Dropout strategy to improve recognition accuracy. The main work of this thesis is as follows:
1. Used a convolutional neural network to recognize speech signals, processing the features with convolutional layers to strengthen the representation of the speech signal.
2. Constructed two network models based on LeNet-5. The first consists of an input layer, two convolutional layers, two pooling layers, a fully connected layer, and an output layer using softmax with a cross-entropy loss function. The second model adds one more convolutional layer and one more pooling layer to the first. The performance of the two models was then compared on the same speech sample set.
3. Proposed the Random sparsity Dropout strategy. Its core idea is to drop neurons randomly during network training, rather than at the constant proportion used by the standard Dropout strategy. Experimental results show that the Random sparsity Dropout strategy gives the network model better generalization ability than the standard Dropout strategy.
4. Proposed an improvement to the pooling stage before the fully connected layer. This thesis argues that adding an average pooling layer after the last max pooling layer smooths the features and reduces the number of input features to the fully connected layer. A comparison of the experimental data shows that adding this average pooling layer before the fully connected layer effectively improves the accuracy of the network models.
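How depth and pooling change the size of the fully connected layer's input can be checked with the standard output-size formula, out = (in + 2·padding − kernel) // stride + 1. The sketch below is illustrative only: the 32x32 input (the classic LeNet-5 input size), 5x5 'same'-padded convolutions, 2x2 pooling with stride 2, and the channel counts 32, 64 and 128 are all assumptions, since the abstract does not give the actual layer settings.

```python
# Illustrative arithmetic only: input size, kernel sizes, padding and channel
# counts are hypothetical, not the settings used in the thesis.

def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, stages, extra_avg_pool=False):
    """Count the features reaching the fully connected layer.

    Each entry in `stages` is the channel count of one block made of a
    5x5 'same' convolution (padding=2, stride=1) followed by 2x2 max
    pooling with stride 2. `extra_avg_pool` appends one 2x2, stride-2
    average pooling layer after the last max pooling layer.
    """
    size = input_size
    channels = 1
    for out_channels in stages:
        size = conv_out(size, kernel=5, stride=1, padding=2)  # convolution keeps the size
        size = conv_out(size, kernel=2, stride=2)             # max pooling halves it
        channels = out_channels
    if extra_avg_pool:
        size = conv_out(size, kernel=2, stride=2)             # average pooling halves it again
    return channels * size * size


model_1 = flattened_features(32, stages=[32, 64])                           # 2 conv + 2 pool
model_2 = flattened_features(32, stages=[32, 64, 128])                      # 3 conv + 3 pool
model_1_avg = flattened_features(32, stages=[32, 64], extra_avg_pool=True)  # + average pooling
print(model_1, model_2, model_1_avg)  # 4096 2048 1024
```

Under these assumptions, each extra 2x2 pooling layer quarters the spatial area, so the added average pooling layer cuts the fully connected layer's input fourfold without adding any trainable weights.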
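The abstract contrasts dropping neurons at a constant proportion with dropping them randomly. One possible reading, shown here as a hypothetical sketch rather than the author's method, is to draw a fresh drop rate for every training step from an assumed range (0.2 to 0.6 below) and then apply ordinary inverted dropout with that rate. The function names and the range are illustrative assumptions.

```python
import random


def dropout(activations, rate, rng):
    """Standard inverted dropout: zero each unit with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def random_sparsity_dropout(activations, rng, min_rate=0.2, max_rate=0.6):
    """Hypothetical reading of 'Random sparsity Dropout': instead of a fixed
    rate, draw a new drop rate uniformly from [min_rate, max_rate] on every
    training step, then apply inverted dropout with it.
    At inference time no dropout is applied (inverted dropout needs no rescaling)."""
    rate = rng.uniform(min_rate, max_rate)
    return dropout(activations, rate, rng), rate


rng = random.Random(0)
hidden = [1.0] * 10  # toy hidden-layer activations
for step in range(3):
    out, rate = random_sparsity_dropout(hidden, rng)
    kept = sum(1 for v in out if v)
    print(f"step {step}: rate={rate:.2f}, kept={kept} of {len(hidden)}")
```

With a fixed-rate Dropout layer the network sees roughly the same sparsity every step; under this reading, the sparsity itself also varies, which acts as an additional source of regularizing noise.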
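The proposed change in item 4, an average pooling layer placed after the last max pooling layer, can be illustrated on a toy feature map. This minimal sketch uses non-overlapping 2x2 windows (window size equal to stride); the 4x4 map and its values are made up for illustration. Max pooling keeps the strongest response in each window, and the following average pooling blends those responses into a smoother, smaller summary for the fully connected layer.

```python
def pool2d(feature_map, size, reduce):
    """Non-overlapping pooling (window = stride = size) over a 2-D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reduce([feature_map[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


# Toy 4x4 map standing in for the output of the last convolutional layer.
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 1, 3, 7],
]
after_max = pool2d(fmap, 2, max)        # last max pooling layer: 16 values -> 4
after_avg = pool2d(after_max, 2, mean)  # extra average pooling layer: 4 values -> 1
print(after_max)  # [[4, 2], [2, 7]]
print(after_avg)  # [[3.75]]
```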
Keywords/Search Tags: speech recognition, CNN, Dropout, Average Pooling