
Research On Multi-Source Acoustic Recognition Oriented To The Deaf Via Deep Learning

Posted on: 2018-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Han
Full Text: PDF
GTID: 2348330521950906
Subject: Computer software and theory
Abstract/Summary:
With the development of big data, handling massive multi-modal data has become a central focus for researchers. Image and voice data are among the most notable applications; they describe many aspects of people's lives and are increasingly relevant to their needs. At the same time, in recent years deep learning has opened up new possibilities for data processing in image processing, natural language processing, and other fields. Based on deep learning, this thesis processes multi-environment acoustic data for the deaf and proposes an environment-independent acoustic recognition model.

The thesis first describes a voice feature-extraction method based on an acoustic processing algorithm, the Mel Frequency Cepstrum Coefficient (MFCC). This algorithm serves as the data-preprocessing stage of the recognition model, and the influence of each of its parameters on recognition accuracy is analyzed in detail.

Secondly, a supervised recognition model (a convolutional neural network) and a semi-supervised recognition model (restricted Boltzmann machine with support vector machine) are proposed for different types of collected voice data. When a large amount of labeled voice data is available, a deep convolutional neural network is used to extract location-independent features of the acoustic data, achieving an average accuracy above 85%. At the same time, considering the limited computing resources of the cloud platform, the thesis also proposes a model-compression algorithm for the convolutional neural network. Experiments show that the algorithm compresses the model to one percent of its original size without reducing recognition accuracy.

When only a small number of manual annotations are available for the voice data, the thesis proposes a restricted Boltzmann machine-support vector machine (RBM-SVM) framework. Unlabeled sound data are first used to train restricted Boltzmann machines to extract location-independent features; these features and the labeled data are then used to train the support vector machine, completing the semi-supervised recognition framework. Within this framework, the accuracy of acoustic event recognition exceeds 80%, beyond the 75% baseline.

In the experimental part, the recognition accuracy of the two models is compared with that of other existing recognition algorithms. Their accuracy outperforms shallow-learning approaches such as AdaBoost, Random Forest, and multi-layer neural networks, which achieve 70%, 65%, and 68% respectively. The experimental data show that the recognition accuracy of the two proposed algorithms far exceeds that of basic machine learning algorithms. Moreover, the thesis analyzes the event misjudgment rate, noise robustness, convergence behavior, and parameter tuning of the two models.

Finally, the original data and the features extracted by the convolutional neural network are visualized. The visualization confirms that the convolutional neural network can indeed extract location-independent features across different environments, and it also illustrates, from another angle, the shortcomings of existing machine learning algorithms.
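The MFCC preprocessing stage described in the abstract follows a standard pipeline: frame the signal, window it, take the power spectrum, apply a mel filterbank, and decorrelate the log energies with a DCT. The sketch below is a minimal NumPy/SciPy implementation of that generic pipeline; the frame length, hop size, and filter counts are typical defaults, not parameter values taken from the thesis.

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Compute MFCC features for a 1-D signal (generic textbook recipe)."""
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)

    # 2. Power spectrum of each frame (rfft zero-pads to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Build a triangular mel filterbank spanning 0 .. sr/2.
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # 4. Log mel energies, then a type-II DCT to get cepstral coefficients.
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Each row of the returned matrix is one frame's cepstral vector; the thesis analyzes how choices such as the number of mel filters and retained coefficients affect recognition accuracy.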
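The two-stage RBM-SVM training scheme (unsupervised RBM pretraining on unlabeled data, then a supervised SVM on the learned representations) can be sketched with scikit-learn's `BernoulliRBM` and `SVC`. The random arrays below are placeholders for normalized acoustic feature vectors, and the dimensions and hyperparameters are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Stand-ins for acoustic feature vectors scaled to [0, 1]
# (BernoulliRBM expects inputs in that range); real inputs
# would be normalized MFCC frames.
X_unlabeled = rng.rand(500, 64)      # abundant unlabeled recordings
X_labeled = rng.rand(100, 64)        # small annotated subset
y_labeled = rng.randint(0, 2, 100)   # two hypothetical event classes

# Stage 1: unsupervised feature learning on the unlabeled data.
rbm = BernoulliRBM(n_components=32, learning_rate=0.05,
                   n_iter=20, random_state=0)
rbm.fit(X_unlabeled)

# Stage 2: supervised SVM trained on the RBM's hidden representations
# of the labeled subset.
svm = SVC(kernel='rbf')
svm.fit(rbm.transform(X_labeled), y_labeled)

preds = svm.predict(rbm.transform(X_labeled))
```

The design point is that the RBM never sees labels, so the expensive annotation effort is spent only on the small set the SVM consumes, matching the semi-supervised setting the abstract describes.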
Keywords/Search Tags:Location-independent, Convolutional Neural Network, Restricted Boltzmann Machines