Study On Parkinson Speech Data Mining Method Based On Sample And Feature Learning

Posted on: 2019-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yang
GTID: 2428330566976583
Subject: Master of Engineering

Abstract/Summary:
Parkinson's disease (PD) is a degenerative disease of the human central nervous system. In most PD patients, disease progression can be effectively delayed or even halted if the disease is diagnosed and treated at an early stage, so early non-invasive diagnosis of PD has great clinical significance. PD classification based on speech data mining has attracted growing attention in recent years. Because it is non-invasive, fast, remote, cost-effective and convenient, it has become an active and challenging research topic worldwide.

Sample learning and feature learning are key components of PD classification algorithms, but existing research still leaves the following key problems unsolved:

1) PD speech samples are obtained by recording a variety of speech fragments. Some of these samples do not represent the essential differences between patients and healthy people well, and random noise is introduced during sample collection. Both effects degrade classifier performance. Selecting the optimal samples to achieve satisfactory classification accuracy and stability is therefore a key problem. Existing methods seldom consider how sample selection influences PD classification, even though it has a significant impact on classification performance.

2) The features of PD speech samples are derived from pathologists' prior knowledge and have clear physical meaning. However, the features are highly redundant and their ability to characterize PD is unsatisfactory, so an efficient feature transformation method is needed to obtain high-level features with strong discriminative power. Most current studies do not consider the nonlinear relationships between features, which limits their ability to characterize the classes.

To address these problems, this thesis studies sample learning based on the Classification and Regression Tree (CART) and feature learning based on the Deep Belief Network (DBN). The goal is to propose new PD speech data mining methods and improve classification accuracy. The main contributions are as follows:

(1) A PD speech data classification method based on CART sample optimization. First, starting from the PD speech data samples, the Gini index is used as the splitting criterion to find the best splitting feature and split value, chosen so that the resulting subsets are less uncertain. The left and right subtrees are then built in turn. Second, to prevent overfitting, model complexity is controlled by tuning the number of CART leaf nodes so that the model performs optimally. Finally, the majority class among the samples in each leaf node is taken as that leaf's class, and the samples of other classes in the leaf are excluded. The remaining samples form the optimal sample set.

(2) Building on the first method, a PD speech data classification method based on CART and ensemble learning. First, the CART algorithm selects the optimal sample set, which serves as the basis for subsequent model training. Second, Random Forests (RF), Support Vector Machine (SVM) and Extreme Learning Machine (ELM) are used as base classifiers. Each is trained on the new sample set and predicts labels for the test set. Finally, majority voting fuses the three base classifiers at the decision level to produce the final classification result.

(3) A PD speech data classification method based on DBN feature learning. First, a DBN feature extraction network is constructed, in which the output of each Restricted Boltzmann Machine (RBM) serves as a new, reconstructed feature representation. Second, the original feature set is fed into the DBN, and each RBM is trained with the contrastive divergence algorithm. Training fits the network parameters and combines the input features nonlinearly, and the resulting RBM output is taken as the optimal feature set. Finally, the RF algorithm classifies the new feature set.

This study helps to reveal the relationship between sample and feature learning and the classification of PD speech data. It provides a theoretical and methodological basis for CART-based sample learning and for DBN-based classification of PD speech data. It also has theoretical significance and practical value for advancing speech-based diagnosis of PD and for sparing high-risk groups inconvenience and safety hazards.
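The three methods above can be sketched in Python with scikit-learn and NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation:

- The data are synthetic (`make_classification`), not PD speech recordings.
- The helper names (`cart_select_samples`, `SimpleELM`, `vote`, `cart_ensemble_predict`, `dbn_rf_pipeline`) are hypothetical.
- All hyperparameters are illustrative guesses: 16 leaf nodes, 50 ELM hidden units, and two RBM layers of 32 and 16 units.
- The ELM is hand-written because scikit-learn does not provide one.
- scikit-learn's `BernoulliRBM` is trained with persistent contrastive divergence, a variant of the contrastive divergence named above.
- The stacked RBMs receive no supervised fine-tuning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def cart_select_samples(X, y, max_leaf_nodes=16, random_state=0):
    """(1) Keep only samples whose label matches the majority class of their CART leaf."""
    # Splits are chosen by Gini impurity; max_leaf_nodes (assumed value) caps complexity.
    tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=max_leaf_nodes,
                                  random_state=random_state).fit(X, y)
    # predict() returns the majority training class of the leaf each sample lands in,
    # so a mismatch marks a minority-class sample in that leaf, which is excluded.
    keep = tree.predict(X) == y
    return X[keep], y[keep]


class SimpleELM:
    """Hypothetical minimal ELM: fixed random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=50, random_state=0):
        self.n_hidden = n_hidden
        self.random_state = random_state

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # output weights via pseudoinverse
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


def vote(predictions):
    """Hard majority vote; each row holds one classifier's non-negative integer labels."""
    P = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in P.T])


def cart_ensemble_predict(X_train, y_train, X_test):
    """(2) CART sample selection, then RF, SVM and ELM fused by decision-level voting."""
    Xs, ys = cart_select_samples(X_train, y_train)
    scaler = StandardScaler().fit(Xs)  # SVM and ELM are scale-sensitive; RF is not
    Xs_std, Xt_std = scaler.transform(Xs), scaler.transform(X_test)
    preds = [
        RandomForestClassifier(n_estimators=100, random_state=0).fit(Xs, ys).predict(X_test),
        SVC(kernel="rbf").fit(Xs_std, ys).predict(Xt_std),
        SimpleELM().fit(Xs_std, ys).predict(Xt_std),
    ]
    return vote(preds)


def dbn_rf_pipeline():
    """(3) Two stacked RBMs as a greedy layer-wise feature learner, then an RF classifier."""
    return make_pipeline(
        MinMaxScaler(),  # BernoulliRBM expects inputs in [0, 1]
        BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0),
        BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )


if __name__ == "__main__":
    # Synthetic stand-in data, NOT PD speech recordings.
    X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    print("CART + voting accuracy:", np.mean(cart_ensemble_predict(X_tr, y_tr, X_te) == y_te))
    print("DBN + RF accuracy:", dbn_rf_pipeline().fit(X_tr, y_tr).score(X_te, y_te))
```

A fitted `DecisionTreeClassifier` predicts the majority training class of each leaf (ties go to the first class). Comparing `tree.predict(X)` with `y` therefore reproduces the leaf-majority exclusion rule of method (1) without walking the leaves explicitly. In practice `max_leaf_nodes` would be tuned, for example by cross-validation, rather than fixed. The `Pipeline` fits each `BernoulliRBM` on the output of the previous one, which matches the greedy layer-wise training of a DBN.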
Keywords/Search Tags: Parkinson's disease (PD), sample learning, feature learning, ensemble learning, deep belief network (DBN)