An Improved KNN Method For Reducing The Amount Of Training Samples Based On Clustering And Density

Posted on:2019-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Wang

Full Text:PDF

GTID:2428330566989966

Subject:Software engineering

Abstract/Summary:

All kinds of information are gathered into a vast ocean of data, and classifying that data quickly and accurately to extract useful information has become a major problem. Solving this problem is the concern of data mining, which has gradually become a hot research topic. Data classification is an important part of data mining. Its main goal is to predict the classes of test samples as accurately as possible using efficient classification algorithms. The KNN algorithm is a classic classification algorithm that is simple, effective, and highly accurate. However, classical KNN is computationally expensive on data sets containing many samples, so classification takes too much time.

This paper proposes an improved algorithm. In the training stage, the training set is first pruned according to sample density and then clustered. Each cluster is transformed into a hyper-sphere whose center is the cluster centroid and whose radius is the distance from the center to the farthest sample in the cluster. Each hyper-sphere is assigned a weight equal to the number of samples it contains divided by the total number of samples in its category. In the testing stage, two methods are designed, one based on k hyper-spheres and one based on a single hyper-sphere. The k-hyper-sphere method aims at higher accuracy, while the single-hyper-sphere method aims at lower running time. Because the training stage effectively reduces the number of training samples and improves their distribution, the computational cost of the testing stage is greatly reduced and accuracy is improved.

Finally, the proposed algorithm is evaluated on ten selected UCI data sets. The experimental results show that it is an effective classification method that performs well in both classification accuracy and classification time.
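The hyper-sphere construction described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that density-based pruning and clustering have already been done, since the abstract does not name the procedures used, and that the category total is counted over the samples that remain in the clusters. The function names and the dictionary layout are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def build_hyperspheres(clusters_by_class):
    """Turn each cluster into a hyper-sphere record.

    clusters_by_class maps a class label to a list of clusters, where each
    cluster is a list of points (tuples of floats). Assumption: pruning and
    clustering were performed beforehand by some other routine.
    """
    spheres = []
    for label, clusters in clusters_by_class.items():
        # Assumption: the category total is the number of clustered samples.
        class_size = sum(len(cluster) for cluster in clusters)
        for cluster in clusters:
            center = centroid(cluster)
            # Radius: distance from the center to the farthest sample.
            radius = max(math.dist(center, p) for p in cluster)
            spheres.append({
                "label": label,
                "center": center,
                "radius": radius,
                # Weight: share of the category's samples in this sphere.
                "weight": len(cluster) / class_size,
            })
    return spheres


# Toy input: class "a" has two clusters, class "b" has one.
spheres = build_hyperspheres({
    "a": [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]],
    "b": [[(5.0, 5.0), (5.0, 7.0)]],
})
```

After this step, a test sample only needs to be compared against the hyper-spheres rather than against every training sample, which is where the reduction in testing cost comes from. How the k-hyper-sphere and single-hyper-sphere methods use the radius and weight is not specified in the abstract, so it is left out of the sketch.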

Keywords/Search Tags:

clustering, density, reduce the amount of training samples, KNN

Related items

1	Research On Blocking Fuzzy Clustering Algorithm Based On Density Of Samples
2	Research Of Improvement To The Density-based Method For Reducing The Amount Of Training Data And Application To KNN
3	Research On Face Recognition Algorithm Based On Pixel Mapping To Construct Virtual Samples
4	Research On Improvement Of K-means Clustering Algorithm
5	Application Of A Small Amount Of Labeled Samples Support Vector Machine Classification
6	Research On Trajectory Clustering Algorithms Of Moving Objects
7	Research On Three-Way Clustering Method For Incomplete Data
8	Research On Adaptive Target Detection With Small Amount Of Training Data
9	Research On Parallel Clustering Algorithm Based On Map-Reduce
10	Research On Density Peaks Clustering