Pathological Speech Recognition Based On Ensemble Learning And Fusion Features

Posted on: 2019-04-29 | Degree: Master | Type: Thesis
Country: China | Candidate: D Li | Full Text: PDF
GTID: 2348330569979530 | Subject: Information and Communication Engineering
Abstract/Summary:
Dysarthria caused by certain diseases can seriously impair a patient's ability to speak, making it difficult to produce clear, fluent speech in daily communication and thereby affecting the patient's life. Researchers in related fields have therefore begun to focus on the detection and recognition of pathological speech, but for a long time diagnosis could rely only on a doctor's subjective judgment and on invasive examination. Pathological speech recognition based on objective, non-invasive means is therefore currently the most desirable solution. However, most existing pathological speech recognition systems use only a single type of feature and a single classifier. In actual use, their recognition performance is often unsatisfactory and their generalization is poor, which greatly increases the probability of misdiagnosis. To address these problems, this thesis carries out the following research based on ensemble learning and fusion features:

(1) A new pathological speech recognition system is proposed. In constructing the recognition network, bagging sampling and the decision tree algorithm are combined to form a random forest, with the aim of improving the speed and accuracy of pathological speech recognition for dysarthria. At the same time, a new fusion feature, FFPM, based on prosodic features and MFCC, is introduced. This feature not only expresses the fluency, tone, and rhythm of speech well, but also retains the advantage of MFCC in representing human auditory characteristics. Combining the new fusion feature with the random forest algorithm yields a recognition system that significantly improves the recognition accuracy of pathological speech, providing a new method for applying machine-learning-based pathological speech recognition in actual medical testing.

(2) Based on the TORGO pathological speech database, the thesis combines acoustic features of speech with machine learning algorithms in experiments that explore the optimal combination of features and classifiers for pathological speech recognition. To this end, several sets of comparative experiments were conducted across subject gender, feature type, and recognition network. First, the male and female data were tested separately to measure the recognition rate of each combination and the differences between male and female speech. The experiments showed that, whichever recognition network was chosen, recognition accuracy with the FFPM feature was higher than with either of the two single-type features, and that the recognition rate for female speech was slightly lower than that for male speech. Then the gender factor was removed and experiments were conducted only on the two different types of corpus. The results showed that the pathological speech recognition system combining the FFPM feature with the random forest achieved the highest recognition accuracy: the classification accuracy reached 99.21% for male speech and 98.97% for female speech, and the overall classification accuracy reached 98.00%. The study also found that patients pronounce short words more accurately than restricted sentences.

(3) Still within the ensemble learning framework, PCA is used to apply a feature transformation to the extracted sample subsets, which increases the diversity among the base classifiers and yields a rotation forest algorithm. On this basis, the cost-sensitive idea is introduced into the rotation forest to handle the imbalanced-data problem often encountered in pathological speech research. That is, when a decision tree is constructed, an information cost function is used as the attribute splitting criterion, and both the misclassification cost and the test cost are taken into account. This constitutes a cost-sensitive rotation forest algorithm, whose advantage is then verified through designed experiments. Mild patient versus normal and mild versus severe patient data sets were extracted from the TORGO database. The feature parameters were a mixture of FFPM and nonlinear features, and multiple sets of experiments were performed with the cost-sensitive rotation forest algorithm. The experimental results show that the cost-sensitive rotation forest can significantly improve the classification accuracy of the minority class while maintaining the overall recognition rate. In practical medical testing, the proposed method helps reduce the cost of misdiagnosis and improve diagnostic accuracy.
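
The ensemble idea in part (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how FFPM combines prosodic features with MFCC, so plain concatenation is assumed here, and one-split trees on a randomly chosen feature stand in for the full decision trees of a random forest. All function names and the toy feature values are hypothetical and purely synthetic.

```python
import random
from collections import Counter


def fuse_features(prosodic, mfcc):
    """Feature-level fusion by concatenation (an assumption, not the
    documented FFPM construction)."""
    return list(prosodic) + list(mfcc)


def train_stump(X, y, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    feature = rng.randrange(len(X[0]))
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # Feature is constant in this sample: always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])


def train_bagged_stumps(X, y, n_trees=25, seed=0):
    """Bagging: each tree is trained on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, x):
    """Majority vote over all trees."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Synthetic toy data: (prosodic vector, MFCC vector) per utterance.
control = [([0.1, 0.2], [0.0, 0.1, 0.2]), ([0.2, 0.1], [0.1, 0.0, 0.1]),
           ([0.0, 0.3], [0.2, 0.2, 0.0]), ([0.3, 0.0], [0.1, 0.3, 0.3])]
dysarthric = [([1.1, 1.2], [1.0, 1.3, 1.1]), ([1.2, 1.0], [1.4, 1.1, 1.2]),
              ([1.3, 1.4], [1.2, 1.0, 1.5]), ([1.0, 1.1], [1.1, 1.2, 1.0])]
X = [fuse_features(p, m) for p, m in control + dysarthric]
y = ["control"] * len(control) + ["dysarthric"] * len(dysarthric)
forest = train_bagged_stumps(X, y)
```

Calling `predict(forest, fuse_features(prosodic, mfcc))` on a new utterance returns the majority label. A real system would extract the prosodic and MFCC vectors from audio and use deeper trees; the sketch only shows how fusion, bootstrap sampling, and voting fit together.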
Keywords/Search Tags: pathological speech recognition, random forest, fusion feature, rotation forest, cost-sensitive