An Improved KNN Method For Reducing The Amount Of Training Samples Based On Clustering And Density

Posted on: 2019-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2428330566989966
Subject: Software engineering
Abstract/Summary:
All kinds of information are gathered into a vast ocean of data, and classifying that data quickly and accurately to extract useful information has become a major problem. Solving this problem is the aim of data mining, which has gradually become a hot research topic. Data classification is an important part of data mining. Its main goal is to predict the classes of test samples as accurately as possible using efficient classification algorithms. The KNN algorithm is a classic classification algorithm that is simple, effective, and highly accurate. However, the classical KNN method is computationally expensive on data sets that contain many samples, so the classification process consumes too much time. This thesis proposes an improved algorithm. In the training stage, the training set is first reduced based on sample density and then clustered. Each cluster is transformed into a hyper-sphere whose center is the cluster centroid and whose radius is the distance from the center to the sample farthest from it. Each hyper-sphere is given a weight equal to the ratio of the number of samples it contains to the total number of samples in its class. In the testing stage, two methods are designed, one based on k hyper-spheres and one based on a single hyper-sphere. The method based on k hyper-spheres focuses on higher accuracy, and the method based on one hyper-sphere focuses on lower time consumption. Because the training stage effectively reduces the number of training samples and improves their distribution, the computational cost of the testing stage is greatly reduced and the accuracy is improved. Finally, the proposed algorithm is evaluated on ten selected UCI data sets. The experimental results show that the proposed algorithm is an effective classification method that achieves good results in both classification accuracy and classification time.
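
The hyper-sphere construction described in the abstract (centroid as center, distance to the farthest sample as radius, class-proportion weight) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say which clustering or density-reduction method is used, so the sketch takes a hypothetical `cluster_ids` list as input and assumes clustering was run within each class. The names `build_hyperspheres` and `nearest_sphere_label` are invented for illustration, and the nearest-sphere rule is only one plausible reading of the "one hyper-sphere" testing method.

```python
import math
from collections import Counter


def build_hyperspheres(samples, labels, cluster_ids):
    """Turn each (class, cluster) group into a hyper-sphere record.

    samples     : list of equal-length tuples of floats
    labels      : class label of each sample
    cluster_ids : hypothetical cluster assignment of each sample
                  (assumed to come from a per-class clustering step)
    """
    class_totals = Counter(labels)

    # Group samples by (class, cluster) so every sphere holds one class only.
    members = {}
    for x, y, c in zip(samples, labels, cluster_ids):
        members.setdefault((y, c), []).append(x)

    spheres = []
    for (y, _), pts in members.items():
        dim = len(pts[0])
        # Center: the centroid of the cluster.
        center = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
        # Radius: distance from the center to the farthest member.
        radius = max(math.dist(center, p) for p in pts)
        # Weight: share of the class's samples that fall in this sphere.
        weight = len(pts) / class_totals[y]
        spheres.append(
            {"center": center, "radius": radius, "weight": weight, "label": y}
        )
    return spheres


def nearest_sphere_label(spheres, query):
    """Assumed single-sphere rule: pick the sphere whose surface is closest."""
    best = min(
        spheres, key=lambda s: math.dist(s["center"], query) - s["radius"]
    )
    return best["label"]


samples = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0), (20.0, 20.0)]
labels = ["a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 0, 0, 1]
spheres = build_hyperspheres(samples, labels, cluster_ids)
```

Five training samples collapse to three spheres here, so a test sample is compared against three centers instead of five points. How the weights enter the k hyper-sphere vote is not stated in the abstract, so the sketch only stores them.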
Keywords/Search Tags:clustering, density, reduce the amount of training samples, KNN