
Research On Oversampling Algorithm Based On Angle And Direction Clustering

Posted on: 2022-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: S J Qin
Full Text: PDF
GTID: 2518306731987919
Subject: Computer Science and Technology
Abstract/Summary:
Oversampling is a data-level method for imbalanced learning. SMOTE is one of the classic oversampling algorithms and is widely used in real-world imbalanced classification problems. However, SMOTE selects the nearest minority samples blindly, and combining it with a clustering algorithm is one way to address this. Existing clustering-based oversampling algorithms still have bottlenecks: boundary samples are insufficiently considered, the quality of the synthetic samples cannot be guaranteed, and the weights assigned to clusters are inappropriate. To find clusters of different forms in imbalanced datasets more accurately, we propose a clustering algorithm based on angle and direction, and we design a new oversampling algorithm by combining it with SMOTE, which can effectively improve the performance of imbalanced classification. The main work and innovations of this thesis are as follows:

(1) Angle and Direction Based Clustering (ADBC) algorithm. The DBC algorithm has difficulty detecting adjacent clusters with different densities, it does not account for outliers, and its maximum deviation angle is difficult to set. The ADBC algorithm therefore filters out outliers by calculating the angular variance of the points and selects the most trusted and densest neighbors of each point. These trusted and densest neighbors are also reverse nearest neighbors of the point. Fisher's optimal segmentation method is used to partition the neighborhood and obtain the set of trusted neighbors adaptively. In this way a reliable direction for label transfer is found, and clustering is completed by label transfer. The ADBC algorithm is evaluated experimentally in terms of clustering quality and parameter sensitivity. The results show that it can effectively detect clusters with different densities, that its parameters are simple to set, and that the clustering results are not easily affected by the parameter values.

(2) Angle and Direction Based SMOTE (ADSMOTE) algorithm. Existing clustering-based oversampling algorithms neglect boundary samples, produce low-quality synthetic samples, and assign unreasonable weights to clusters. To address these problems, the ADSMOTE algorithm clusters the dataset with the ADBC algorithm and uses the ABOF value to investigate how the points are distributed near the boundary. It then introduces the concepts of sorting neighbors, root interpolation weight, auxiliary interpolation weight, and best interpolation neighborhood. The integrated weight of each cluster is then assigned by combining the weights of some of the samples in that cluster. On real imbalanced datasets, classifiers are trained on training sets preprocessed with ADSMOTE. The results show that preprocessing the training set with ADSMOTE improves the F1 score and AUC compared with other algorithms.
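
The abstract builds on two standard ingredients: the SMOTE interpolation step and an angle-based variance score for spotting outliers and boundary points. The sketch below illustrates only these baseline ideas and is not the ADBC or ADSMOTE algorithm from the thesis. The function names `smote_interpolate` and `angle_variance` and the toy data are hypothetical, and the angle score is a simplified, unweighted variant of an angle-based outlier factor, which may differ from the ABOF definition the thesis uses.

```python
import numpy as np


def smote_interpolate(x, neighbor, rng):
    """Baseline SMOTE step: return a random point on the segment x -> neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return x + gap * (neighbor - x)


def angle_variance(point, others):
    """Simplified, unweighted angle-based outlier factor of `point` (assumption).

    For every pair (b, c) in `others`, compute <b-p, c-p> / (|b-p|^2 |c-p|^2)
    and return the variance of these values. `others` must not contain `point`.
    """
    diffs = np.asarray(others, dtype=float) - np.asarray(point, dtype=float)
    norms_sq = np.einsum("ij,ij->i", diffs, diffs)  # squared distances to point
    values = [
        diffs[i] @ diffs[j] / (norms_sq[i] * norms_sq[j])
        for i in range(len(diffs))
        for j in range(i + 1, len(diffs))
    ]
    return float(np.var(values))


# Toy data (hypothetical): a 5x5 grid cluster plus one far-away point.
axis = np.linspace(-1.0, 1.0, 5)
grid = np.array([[a, b] for a in axis for b in axis])
center = np.array([0.0, 0.0])
far_point = np.array([10.0, 10.0])

inner_neighbors = grid[np.any(grid != center, axis=1)]  # drop the center itself
score_center = angle_variance(center, inner_neighbors)
score_far = angle_variance(far_point, grid)

# One synthetic sample between two neighboring grid points.
rng = np.random.default_rng(0)
synthetic = smote_interpolate(grid[0], grid[1], rng)
```

On this toy data, `score_center` comes out larger than `score_far`: a point inside a cluster sees its neighbors in many directions, while a far-away point sees them all within a narrow angle, so a low score suggests an outlier or boundary point.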
Keywords/Search Tags: Clustering, Oversampling, Angle, Direction, Imbalanced learning