Research And Application Of Imbalanced Data Processing Algorithm

Posted on: 2022-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: D Yang
GTID: 2518306557976799
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of modern information science and technology, the scale of data is growing explosively, and the demand for data processing in various industries is becoming increasingly urgent. However, in practical application scenarios such as medical diagnosis, information retrieval systems, fraudulent telephone call detection and helicopter fault detection, the data to be processed is typically imbalanced. To solve the problem of imbalanced data classification, research on related processing methods has become a hot topic. This research falls into two main categories: data-level methods and algorithm-level methods. Aiming at the shortcomings of existing imbalanced data processing methods, this thesis proposes improvements at both the algorithm level and the data level. The related research work and innovations are as follows:

(1) At the algorithm level, the thesis proposes an ensemble processing method based on the K-Medoids algorithm and the En Maj Distance rule. The method is motivated by the fact that existing ensemble algorithms do not handle class imbalance in the sample data well, and it also takes the majority class into account. It combines the spatial distribution characteristics of the sample data with the spatial distance between samples, constructs multiple balanced data subsets, trains an SVC base classifier on each subset to obtain multiple base learners, and then combines them using the proposed ensemble strategy. In this way, the two-class imbalanced classification problem is transformed into a balanced one. Experiments on the same imbalanced data sets show that the proposed ensemble method achieves better classification performance than several current ensemble methods for imbalanced data.

(2) The thesis also proposes an under-sampling method that combines feature weights with the K-Medoids algorithm. This method addresses two-class imbalanced classification at the data level. It takes the feature weights of all sample data into account, so that, when combined with the K-Medoids algorithm, the majority-class samples retained by sampling are better suited to the classification decision. Specifically, the method constructs the classification model by increasing the weights of features that play a major role in the classification result and reducing the weights of features that play a secondary role in the classification decision. Experiments on highly imbalanced KEEL data sets show that this method achieves better classification performance than the existing random under-sampling method and significantly improves the classification accuracy of each class in the data set.

(3) Finally, the two proposed imbalanced data processing methods are applied to a real-life problem: wine data classification, tested on the wine data set from the UCI database. The experimental results indicate that, compared with the other algorithms, the proposed methods improve classification performance and show good potential for wine quality classification.
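The under-sampling idea in (2) can be illustrated with a minimal sketch. The abstract does not say how the feature weights are computed or how K-Medoids is configured, so everything below is an assumption for illustration only: the weights are passed in by the caller, the distance is a weighted Euclidean distance, and the majority class is reduced to as many medoids as there are minority samples. The function names (`weighted_distance`, `k_medoids`, `undersample_majority`) are hypothetical and not taken from the thesis.

```python
import random


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: features with larger weights count more."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def k_medoids(points, k, weights, n_iter=50, seed=0):
    """Plain alternating k-medoids; returns the sorted indices of k medoids."""
    if k == 0:
        return []
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        # Assignment step: each medoid starts its own cluster, and every
        # other point joins the cluster of its nearest medoid.
        clusters = {m: [m] for m in medoids}
        for i, p in enumerate(points):
            if i in clusters:
                continue
            nearest = min(
                medoids,
                key=lambda m: weighted_distance(p, points[m], weights),
            )
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = [
            min(
                members,
                key=lambda c: sum(
                    weighted_distance(points[c], points[j], weights)
                    for j in members
                ),
            )
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def undersample_majority(X, y, weights, majority_label, seed=0):
    """Keep all minority samples and replace the majority class by medoids.

    Assumption: the majority class is reduced to the minority class size.
    """
    maj = [x for x, label in zip(X, y) if label == majority_label]
    minority = [(x, label) for x, label in zip(X, y) if label != majority_label]
    k = min(len(minority), len(maj))
    medoid_idx = k_medoids(maj, k, weights, seed=seed)
    X_bal = [maj[i] for i in medoid_idx] + [x for x, _ in minority]
    y_bal = [majority_label] * k + [label for _, label in minority]
    return X_bal, y_bal
```

Because medoids are actual samples rather than averaged points, the retained majority-class data are real observations, and raising a feature's weight makes the clustering, and hence the selection, more sensitive to that feature.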
Keywords/Search Tags:Imbalanced Data, Imbalance Classes, Ensemble Method, Sampling Strategy, K-Medoids, Feature Weight
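The general structure of the ensemble method in (1), several balanced subsets with one base learner each and a combined decision, can be sketched as follows. This is not the thesis's algorithm: the abstract does not define the En Maj Distance rule or the proposed ensemble strategy, so the sketch swaps in random under-sampling for subset construction, a nearest-centroid classifier in place of the SVC base learner (to stay dependency-free), and simple majority voting. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_nearest_centroid(X, y):
    """Stand-in base learner (NOT an SVC): one mean vector per class."""
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict_nearest_centroid(model, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(model, key=lambda label: sq_dist(model[label]))


def fit_balanced_ensemble(X, y, majority_label, n_learners=5, seed=0):
    """Train one base learner per balanced subset.

    Assumption: each subset holds all minority samples plus an equally
    sized random draw from the majority class.
    """
    rng = random.Random(seed)
    maj = [x for x, label in zip(X, y) if label == majority_label]
    minority = [(x, label) for x, label in zip(X, y) if label != majority_label]
    size = min(len(minority), len(maj))
    models = []
    for _ in range(n_learners):
        subset = rng.sample(maj, size)
        X_sub = subset + [x for x, _ in minority]
        y_sub = [majority_label] * size + [label for _, label in minority]
        models.append(fit_nearest_centroid(X_sub, y_sub))
    return models


def predict_ensemble(models, x):
    """Combine the base learners by simple majority vote."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

An odd `n_learners` avoids tied votes in the two-class case. Each base learner sees a balanced training set, which is the sense in which an imbalanced problem is turned into several balanced ones.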