Prototype Selection Algorithm Based on Improved CURE Clustering and Its Application

Posted on: 2020-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Sun
GTID: 2428330596479599
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In the era of big data, extracting useful knowledge from massive data sets has become an important issue in all fields. The K-nearest-neighbor (KNN) classifier has excessive time and space complexity on large-scale data sets. To address this problem, this paper adopts the CURE clustering method for prototype selection: the CURE clustering algorithm is used to select representative samples for KNN classification without reducing classification accuracy, and the method is finally applied to unbalanced data sets. The specific research contents and results of this paper are as follows:

1. An improved CURE clustering algorithm based on shared nearest neighbor density and maximum-minimum distance. The CURE clustering algorithm has two disadvantages. First, noise points are difficult to identify, so a new denoising method based on shared nearest neighbor density is proposed. The method uses the shared nearest neighbor algorithm to calculate the similarity between samples, then obtains the density value of each sample, and determines a density threshold adaptively to eliminate noise points. Second, the representative points are poorly dispersed, so the maximum-minimum distance algorithm is used to improve the representative point selection of the original algorithm. The proposed algorithm is compared with the traditional CURE algorithm, the algorithm of reference [72], and the RTCURE algorithm on two synthetic data sets and six UCI data sets. The results show that the proposed algorithm achieves some improvement in average accuracy and running efficiency.

2. A prototype selection algorithm (PSCURE) based on the improved CURE clustering. The original data set is clustered with the improved CURE clustering algorithm from the first part, and the more representative samples are selected from each cluster and added to the final prototype subset used for classification. First, the PSCURE algorithm is tested on the synthetic data sets Pathbased and Flame. The results show that the PSCURE algorithm can select the more representative boundary points together with some internal points. Second, the PSCURE algorithm is compared with traditional KNN and with the PSC, CNN, ENN, TRKNN, BNNT, and 2NMST algorithms on 10 UCI data sets. The results show that the PSCURE algorithm achieves classification accuracy equal to or even higher than that of traditional KNN while retaining fewer samples. Compared with the more recent algorithms, it not only improves average accuracy but also reduces the number of retained samples.

3. Processing unbalanced data sets with the PSCURE algorithm. First, the PSCURE algorithm is used to under-sample the majority class of the unbalanced data set so that the number of samples extracted equals that of the minority class. This yields a balanced prototype set, which is then classified with the KNN algorithm. Next, on five UCI data sets, the PSCURE algorithm is compared with the KNN, EDSVM, and ND-SVM algorithms. The experimental results show that the PSCURE algorithm outperforms the other algorithms in F-measure and G-mean. Finally, the PSCURE algorithm is applied to a thief-user data set from a city and compared with the traditional KNN algorithm. The experimental results show that the PSCURE algorithm has certain advantages on the thief-user data set.
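
The maximum-minimum distance rule mentioned in part 1 of the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so this is only the generic max-min (farthest-first) selection: each new representative is the point whose distance to its nearest already-chosen representative is largest. The function name `max_min_representatives`, the parameter `c`, and the choice of seeding with the point farthest from the cluster centroid are assumptions made for illustration, not the thesis's implementation.

```python
import math


def max_min_representatives(points, c):
    """Pick up to c well-scattered points from one cluster (hypothetical helper).

    Generic max-min distance selection; the seed rule is an assumption.
    """
    if not points or c <= 0:
        return []
    n, dim = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    # Assumed seed: the point farthest from the cluster centroid.
    first = max(range(n), key=lambda i: math.dist(points[i], centroid))
    chosen = [first]
    # min_dist[i] = distance from point i to its nearest chosen representative.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < min(c, n):
        # Max-min step: take the point that is farthest from all chosen ones.
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]


cluster = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (0.5, 0.0)]
reps = max_min_representatives(cluster, 3)
print(reps)
```

Because every step maximizes the distance to the nearest existing representative, the selected points spread over the cluster instead of bunching together, which is the dispersion property the abstract says the original selection lacks.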
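
The F-measure and G-mean used in part 3 to evaluate classifiers on unbalanced data can be computed from the confusion matrix. The sketch below uses the standard definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of minority-class recall and majority-class specificity). The function name and the toy labels are illustrative assumptions and are not taken from the thesis; the thesis may use a weighted variant of the F-measure.

```python
import math


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-measure, G-mean), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class accuracy
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class accuracy
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Toy labels (made up): 2 minority samples (1) and 4 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
f, g = f_measure_and_g_mean(y_true, y_pred)
print(f, g)
```

Both measures drop sharply when the minority class is misclassified, which is why they are preferred over plain accuracy on unbalanced data sets.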
Keywords/Search Tags:K nearest neighbor classifier, prototype selection, CURE clustering algorithm, representative point, unbalanced data set