
Speech Recognition Method Research Based On Convolution Neural Network

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: X G Zhang
GTID: 2308330491454688
Subject: Computer system architecture
Abstract/Summary:
Deep learning, a machine learning approach based on multi-layer artificial neural networks, has risen rapidly in recent years. Deep learning models have gradually been applied to speech recognition and have achieved striking results. Training deep network models, however, requires substantial computing resources. The high-performance POWER8 processing platform provides strong support for massive data processing in the big data era: in particular, its powerful floating-point units and parallel multi-threading technology closely match the requirements of neural network models that process speech and image data. On a POWER8-based computing platform, this thesis uses a deep convolutional neural network (CNN) to process speech feature data, replacing the traditional acoustic model, the Gaussian mixture model (GMM), with a CNN model. Experiments show that this approach achieves better speech recognition results.

To make the CNN model better suited to acoustic modeling of speech, this thesis optimizes it in two respects:

(1) Existing pooling algorithms ignore the local correlation of speech data, which makes the extraction of key speech features inefficient. To address this, a dynamic adaptive pooling (DA-Pooling) algorithm based on the POWER architecture is proposed, and it replaces the original pooling algorithm in the pooling layer of the CNN model. DA-Pooling extracts locally adjacent acoustic feature data, computes the Spearman correlation coefficient of the extracted data to measure their correlation, and then, using a weight derived from that correlation, applies an appropriate pooling strategy to data with different degrees of correlation.
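The abstract does not give DA-Pooling's window layout, weighting rule, or the pooling strategies it chooses between, so the Python sketch below is only one possible reading under stated assumptions. It assumes pooling runs along the frequency axis of a time-by-frequency feature map in non-overlapping windows of width `p`; each window is compared with the same window in the adjacent frame using the Spearman rank correlation (the Pearson correlation of the ranks, with tied values sharing their average rank); and the absolute correlation `w` is a hypothetical weight that blends average pooling (favored when neighboring data are strongly correlated) with max pooling (favored when they are not). The names `ranks`, `spearman`, and `da_pool` and the blending rule are illustrative, not the thesis's implementation, and nothing here is POWER8-specific.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a group of ties).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant window carries no rank information
    return cov / (vx * vy) ** 0.5


def da_pool(fmap, p):
    """Hypothetical DA-Pooling over a T x F feature map (a list of frames).

    Pools each frame along frequency in non-overlapping windows of width p.
    The weight w = |Spearman(window, same window in the adjacent frame)|
    blends average pooling (w -> 1) with max pooling (w -> 0). This
    blending rule is an assumption, not the thesis's published formula.
    """
    T = len(fmap)
    pooled = []
    for t in range(T):
        # Adjacent frame: the next one, or the previous one for the last
        # frame (assumes the map has at least two frames).
        neighbor = fmap[t + 1] if t + 1 < T else fmap[t - 1]
        row = []
        for s in range(0, len(fmap[t]) - p + 1, p):
            window, adjacent = fmap[t][s:s + p], neighbor[s:s + p]
            w = abs(spearman(window, adjacent))
            row.append(w * mean(window) + (1 - w) * max(window))
        pooled.append(row)
    return pooled
```

For example, `da_pool([[1, 2, 3, 4], [2, 3, 4, 5]], 2)` returns `[[1.5, 3.5], [2.5, 4.5]]`: every window is perfectly rank-correlated with its neighbor, so each output is the window mean. A window whose neighbor is constant gets `w = 0` and falls back to max pooling.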
The DA-Pooling algorithm thereby improves the pooling layer's ability to adapt to data with different degrees of correlation.

(2) Existing CNNs generalize poorly on speech data sets, and traditional Dropout, which randomly hides neuron nodes, can lose the weight information of key nodes. To address this, a sparsity-based Dropout strategy is added to the fully connected layers of the CNN model. It adds a sparsity determination mechanism at the output stage of each neuron node: the output of the node's activation function is passed to a sparsity discriminant function, which returns the node's current sparsity level (that is, the probability that the node is hidden), and whether the node is hidden then follows a Bernoulli distribution with that probability as its parameter. Through sparsity, this method reduces the influence of nodes with small outputs on the model's results, thereby improving the model's generalization ability.
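The abstract fixes the structure of the sparsity-based Dropout (activation, then a sparsity discriminant, then a per-node hiding probability, then a Bernoulli draw) but not the discriminant function itself. The Python sketch below assumes a hypothetical discriminant, `hide_probabilities`, that maps each node's activation magnitude, relative to the largest magnitude in the layer, to a hiding probability between 0 and `p_max`, so weak nodes are hidden more often and the strongest node is always kept. It also assumes standard inverted-dropout rescaling of the kept nodes by `1 / (1 - p)`. The function names, the `p_max` parameter, and the rescaling choice are illustrative assumptions, not the thesis's method.

```python
import random


def hide_probabilities(acts, p_max=0.5):
    """Hypothetical sparsity discriminant: the smaller a node's |activation|
    relative to the layer's largest one, the higher its hiding probability.
    Returns values in [0, p_max]; the strongest node is never hidden."""
    if not 0.0 <= p_max < 1.0:
        raise ValueError("p_max must be in [0, 1)")
    peak = max((abs(a) for a in acts), default=0.0)
    if peak == 0.0:
        return [p_max] * len(acts)
    return [p_max * (1.0 - abs(a) / peak) for a in acts]


def sparse_dropout(acts, p_max=0.5, training=True, rng=random):
    """Sparsity-based Dropout on one fully connected layer's activations.

    Each node is hidden with its own Bernoulli probability p_i; kept nodes
    are rescaled by 1 / (1 - p_i) (inverted dropout), so each node's
    expected output equals its input. At inference the layer is unchanged."""
    if not training:
        return list(acts)
    out = []
    for a, p in zip(acts, hide_probabilities(acts, p_max)):
        hidden = rng.random() < p  # Bernoulli(p) draw: True means hide
        out.append(0.0 if hidden else a / (1.0 - p))
    return out
```

The design choice here is that hiding concentrates on low-activation nodes instead of falling uniformly on every node with one fixed probability as in standard Dropout, so high-activation (key) nodes keep their information while the layer is still regularized. Passing a seeded `random.Random` as `rng` makes the draws reproducible.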
Keywords/Search Tags: Convolutional neural network, POWER8, Speech recognition, Overfitting