Research On Classification Technology For Imbalanced Data Sets

Posted on:2021-03-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z Z Wang

Full Text:PDF

GTID:2428330647967274

Subject:Intelligent perception and control

Abstract/Summary:

This thesis studies why classification models struggle to predict sample categories efficiently and accurately when the data distribution is imbalanced. First, classic classification algorithms for imbalanced data sets are analyzed and summarized, and the background knowledge and model evaluation metrics used in the thesis are described in detail. Then, from the perspective of noise samples, the k-nearest-neighbor idea is applied to noise recognition, and a KNN noise sample filtering algorithm is proposed. From the perspective of oversampling, the SMOTE algorithm is improved to address its shortcomings, and an imbalanced data set classification algorithm based on improved SMOTE is proposed. Next, to reduce running time and improve prediction accuracy, a clustering algorithm is combined with the SVM algorithm, giving an imbalanced data classification algorithm based on clustering and SVM. Finally, the proposed algorithms are applied to the practical problem of human pose classification: a human pose classification algorithm based on imbalanced data classification is proposed, and comparative experiments verify its performance.

The main work of this study is as follows.

First, to improve the quality of synthesized samples, an imbalanced data set classification algorithm based on improved SMOTE is proposed, combining the ideas of k nearest neighbors and clustering. On the one hand, the algorithm proposes a noise sample recognition model based on the k-nearest-neighbor idea; on the other hand, it balances the sample information and guarantees the quality of the synthesized samples during oversampling. The algorithm introduces clustering to correct the synthesized samples promptly. Finally, the AdaBoost algorithm is used to train the model on the balanced sample set. Compared with several classic imbalanced classification algorithms, the experimental results show that the algorithm achieves better classification performance and stronger generalization.

Second, to improve classification accuracy and reduce running time, an imbalanced data classification algorithm based on the combination of clustering and SVM is proposed. The central idea is to under-sample the majority class according to the distribution characteristics of the minority class. Clusters are categorized according to the distribution characteristics of the minority samples, and a definition of cluster boundaries is proposed that takes the interference of noise samples into account. Then, when constructing a balanced sample set for each cluster, the algorithm proposes three principles for sampling the majority class based on the characteristics of the samples in the cluster. Finally, an SVM with a mixed kernel function is trained on each balanced cluster sample set, and the final classification model is obtained as a linear combination of these models. Experiments show that the algorithm effectively improves prediction accuracy over the whole sample set while also shortening the overall running time.

Third, building on this work, the proposed algorithm is applied to human pose classification, and a human pose classification algorithm based on imbalanced data classification is proposed. A comparison with four classification algorithms on the AReM human pose data set shows that the proposed algorithm effectively addresses the problem of low prediction accuracy under the real human pose distribution.
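
The abstract does not give the exact noise rule or the specific SMOTE improvement, so the following is only a minimal sketch of the general idea: filter minority samples that look like noise using their k nearest neighbors, then oversample the remaining minority samples by SMOTE-style interpolation. The function names (`knn_indices`, `filter_noise`, `smote_like`), the "all k neighbors belong to the other class" rule, and the toy data are assumptions made for illustration, not the thesis's method.

```python
import math
import random

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def filter_noise(points, labels, minority, k=3):
    """Drop minority samples whose k nearest neighbours are all majority samples.

    This rule is an assumed stand-in for the thesis's noise recognition model.
    """
    keep = []
    for i, label in enumerate(labels):
        if label == minority:
            neighbours = knn_indices(points, i, k)
            if all(labels[j] != minority for j in neighbours):
                continue  # treated as noise under the assumed rule
        keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]

def smote_like(minority_points, n_new, k=2, seed=0):
    """Plain SMOTE-style interpolation between a minority sample and one of
    its k nearest minority neighbours (no clustering-based correction)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_points))
        j = rng.choice(knn_indices(minority_points, i, k))
        gap = rng.random()  # position along the segment between the two samples
        a, b = minority_points[i], minority_points[j]
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic

# Toy data: four majority samples (label 0), three clustered minority samples
# (label 1), and one minority sample sitting inside the majority region.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

clean_points, clean_labels = filter_noise(points, labels, minority=1, k=3)
minority_clean = [p for p, l in zip(clean_points, clean_labels) if l == 1]
n_missing = clean_labels.count(0) - clean_labels.count(1)
synthetic = smote_like(minority_clean, n_new=n_missing, k=2)
```

Filtering before oversampling matters because SMOTE interpolates between existing minority samples: a noisy minority sample left inside the majority region would seed further synthetic points there. A balanced set built this way could then be passed to a boosting classifier such as AdaBoost, as the abstract describes.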

Keywords/Search Tags:

K nearest neighbor algorithm, SMOTE algorithm, clustering algorithm, AdaBoost algorithm, mixed kernel function

Related items

1	Research On K Nearest Neighbor Algorithm Based On Class Division And Neighbor Selection
2	Nearest Neighbor Classification Improved Algorithm
3	Research On Clustering By Fast Search And Find Of Density Peaks Algorithm Base On K Nearest Neighbor Approach
4	Research On Spectral Clustering Algorithm Based On Nearest Neighbor Graph Analysis
5	Research On An Improved K-means Algorithm
6	Research On Trojan Horse Behavior Detection Technology Based On Speed-up K Nearest Neighbor Algorithm
7	Code Clone Restructuring Of C Programs Via K-Nearest Neighbor Algorithm
8	Research On Subarea Clustering Indoor Location Algorithm Based On Improved Affinity Propagation Clustering Algorithm
9	Clustering Analysis Study Based On Kernel Function
10	Improved K-nearest Neighbor Algorithm And Its Application In Text Analysis