Classification Optimization Method For Unbalanced Sample

Posted on: 2019-06-09    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y You    Full Text: PDF
GTID: 2417330563493062    Subject: Applied Statistics
Abstract/Summary:
With the rapid development of machine learning, data mining, deep learning, and related fields, researchers continue to optimize the learning performance of various models. Classification, a core task of data mining, has received wide attention. Traditional classification algorithms mostly assume balanced data and are insensitive to class imbalance. They tend to optimize the overall accuracy of the classifier, which makes them poorly suited to unbalanced sample data. How to improve classification strategies for unbalanced samples is therefore worth studying. This paper aims to improve a classifier's ability to classify minority-class samples while maintaining its overall performance.

At present there are two main research directions. The first, at the data level, studies how to balance the number of samples per class without losing the information contained in the original sample, while introducing as little noise as possible. The second, at the algorithm level, improves the classifier itself, for example by training a one-class classifier, introducing a cost-sensitive factor, or adopting ensemble learning.

Before the optimization method is presented, a new compound evaluation criterion, IIBA?, is proposed. Theory and experiment show that it focuses on the classification of minority-class samples while still reflecting the overall performance of the classifier. It offers higher robustness and noise resistance, and it is the criterion used to compare optimization effects in this paper.

The paper then addresses unbalanced data classification from two aspects, data and algorithm strategy, with sample optimization as its core. It presents a framework of combinatorial data-driven methods for balancing class distributions based on clustering, which is compatible with various existing sample-balancing techniques. Clustering performance evaluation is introduced, and the number of samples drawn from each cluster is set differently according to the clustering quality, so that the clusters produced by the clustering step can be weighted differently. Samples near the classification boundary are also given higher weight. Finally, a sampling strategy based on cluster centers is proposed.

Experiments on an ideal data set and on real data sets, with IIBA? as the evaluation criterion and in combination with several existing algorithm strategies, show that the proposed clustering-based combinatorial data-driven balancing method improves sample balance and optimizes the classification performance of the classifier.
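The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a cluster-center-based oversampling step could look, not the thesis's actual method. The function names `kmeans` and `oversample_minority`, the use of plain k-means, the equal round-robin allocation across clusters, and the interpolation toward the cluster center are all assumptions made for illustration. The quality-dependent per-cluster counts and boundary weighting described above are not implemented.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (illustrative helper)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[j].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers, clusters


def oversample_minority(X, y, minority_label, k=2, seed=0):
    """Add synthetic minority points until the minority matches the largest other class.

    Assumption: each synthetic point is a random interpolation between a
    minority sample and the center of its cluster, and clusters are served
    in round-robin order (a placeholder for a quality-based allocation).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    counts = Counter(y)
    target = max(c for label, c in counts.items() if label != minority_label)
    deficit = target - len(minority)
    if deficit <= 0:
        return list(X), list(y)

    centers, clusters = kmeans(minority, k, seed=seed)
    nonempty = [(c, m) for c, m in zip(centers, clusters) if m]

    synthetic = []
    for i in range(deficit):
        center, members = nonempty[i % len(nonempty)]
        p = rng.choice(members)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, center)))

    return list(X) + synthetic, list(y) + [minority_label] * deficit
```

Calling `oversample_minority(X, y, minority_label=1)` on a list of feature tuples `X` and labels `y` returns the original data followed by the synthetic minority points. Because each synthetic point lies between an existing minority sample and its own cluster center, it stays inside the region the minority class already occupies, which is one way to limit the noise that balancing can introduce.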
Keywords/Search Tags:Unbalanced sample, Classification optimization, Clustering, Performance measures, Balanced data