Research On Application Of Imbalanced Learning Technology In Medical Data

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Deng
GTID: 2504306308975529
Subject: Control Science and Engineering
Abstract/Summary:
As machine learning techniques have matured, many fields have begun applying them in their production and operation processes. In most real-world data, however, samples are unevenly distributed across classes, and a standard machine learning algorithm trained on such an imbalanced data set produces classification results biased toward the majority class. A range of imbalanced learning techniques has been proposed to address this. Building on those techniques, this thesis proposes two algorithms aimed at small data sets: KS-BRAF, a biased random forest based on sample synthesis, and KS-BRAF(PSO), a variant based on particle swarm optimization. Both are then applied to the classification of imbalanced medical diagnostic data.

The thesis first improves how samples are synthesized in the key region of the original biased random forest (BRAF). In the proposed KS-BRAF algorithm, K-means SMOTE synthesizes the samples, which are then used to build the local model. This avoids the loss of information and also effectively limits the generation of noise, which improves the model's classification performance. Compared with related algorithms on public data sets from KEEL, KS-BRAF improves the classification of the minority class while maintaining high accuracy.

To make the classification performance of KS-BRAF more stable, the thesis then proposes KS-BRAF(PSO). It extends KS-BRAF by combining K-means SMOTE with random undersampling and by using particle swarm optimization to tune the oversampling and undersampling proportions separately. This optimizes the data distribution of the region and improves the algorithm's classification performance. Comparative experiments show that the optimized algorithm classifies better than the original.

Finally, KS-BRAF(PSO) is applied to classification and prediction on a public cervical cancer data set and compared with the algorithms used in existing studies of that data set. The results show that KS-BRAF(PSO) performs better on classification and prediction tasks for this imbalanced medical data set and can play a role in assisting diagnosis.
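The resampling idea described above (synthesizing minority samples and randomly undersampling the majority class, each with its own proportion) can be illustrated with a small sketch. This is not the thesis's implementation. It uses plain SMOTE-style interpolation rather than K-means SMOTE, omits the random forest and the particle swarm search, and every function and parameter name (`smote_like_oversample`, `random_undersample`, `resample`, `over_ratio`, `under_ratio`) is hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE idea,
    NOT the K-means SMOTE variant named in the abstract)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


def resample(majority, minority, over_ratio, under_ratio, rng=None):
    """Apply both steps with separate proportions (hypothetical parameters).

    over_ratio:  synthetic samples to add, as a fraction of the minority size.
    under_ratio: fraction of the majority class to keep.
    """
    rng = rng or random.Random(0)
    n_new = round(over_ratio * len(minority))
    n_keep = max(1, round(under_ratio * len(majority)))
    new_minority = minority + smote_like_oversample(minority, n_new, rng=rng)
    new_majority = random_undersample(majority, n_keep, rng=rng)
    return new_majority, new_minority
```

In a setup like the one the abstract describes, an optimizer such as particle swarm optimization would search over the two proportions, scoring each candidate pair by the performance of a classifier trained on the resampled data. That search is left out here.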
Keywords/Search Tags: imbalanced learning, biased random forest, particle swarm optimization, synthetic samples