
Improvement Of KNN Algorithm Based On Weighted Data Partition And Imbalanced Data Set

Posted on: 2018-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Cao
GTID: 2428330542467090
Subject: Software engineering
Abstract/Summary:
The twenty-first century is a century of rapid development in information technology, and people are surrounded by all kinds of information every day. How to classify massive data and extract useful information from it is an important and urgent problem. Classification techniques from data mining offer a good solution. The K nearest neighbor (KNN) algorithm is a classical classification algorithm, but the traditional KNN algorithm has two main defects when faced with large amounts of data: low efficiency, and low classification accuracy on imbalanced data sets.

To address efficiency, this thesis proposes improved classification algorithms based on the idea of data partition: the BINER algorithm and the CLUEKR algorithm. BINER is a data partition algorithm based on bisection. It divides the data set into multiple sub-datasets, calculates the similarity between the sub-datasets, and selects sub-datasets according to that similarity, which shrinks the query data set and improves efficiency. CLUEKR is based on hierarchical clustering. It clusters the data set top-down into multiple sub-clusters and, again according to the similarity between sub-clusters, selects those that meet the conditions, which likewise shrinks the data set and improves efficiency. These algorithms do not operate on the classification data directly; instead, they reduce the query data set through data partition, which reduces running time and improves efficiency.

To address the low accuracy of KNN on imbalanced data sets, this thesis proposes a weighted KNN algorithm. Weights are designed for the minority classes in imbalanced data sets, which reduces the effect of class imbalance and improves accuracy. Three weighting designs are used: a simple weighted design, a weighted design with an enhancement factor, and a weighted design with an added correction factor. With these designs, a weight is assigned to each class, and the assigned weights are ensured not to adversely affect outliers. As a result, the classification accuracy of KNN on imbalanced data sets is improved.

Finally, this thesis combines the improved weighted KNN algorithm with the CLUEKR algorithm to design an efficient and accurate KNN algorithm, called CW-KNN, which takes the nature of the data into account.
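The abstract does not give the formulas for its weighting designs, so the following is only a minimal sketch of the general idea of class-weighted KNN voting, not the thesis's method. It assumes inverse class frequency as the weight (a common choice, swapped in here for illustration) and a brute-force Euclidean neighbor search; the names class_weights and weighted_knn_predict are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_weights(labels):
    """Inverse-frequency weight per class (assumed scheme, not from the thesis)."""
    counts = Counter(labels)
    n = len(labels)
    return {label: n / (len(counts) * count) for label, count in counts.items()}


def weighted_knn_predict(train_x, train_y, query, k=3, weights=None):
    """Classify `query` by a class-weighted vote among its k nearest neighbors."""
    if weights is None:
        weights = class_weights(train_y)
    # Brute-force Euclidean distance to every training point, nearest first.
    neighbors = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    # Each of the k nearest neighbors votes with the weight of its class.
    votes = defaultdict(float)
    for _, label in neighbors[:k]:
        votes[label] += weights[label]
    return max(votes, key=votes.get)


# Toy imbalanced set: six "neg" points and two "pos" points.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 3), (3.5, 3.5)]
train_y = ["neg"] * 6 + ["pos"] * 2
query = (2.4, 2.0)

# Plain majority vote (all weights equal) versus the class-weighted vote.
print(weighted_knn_predict(train_x, train_y, query, k=3,
                           weights={"neg": 1.0, "pos": 1.0}))  # neg
print(weighted_knn_predict(train_x, train_y, query, k=3))      # pos
```

In this toy case the three nearest neighbors are two majority points and one minority point, so an unweighted vote picks the majority class, while the inverse-frequency weights (about 0.67 for "neg" and 2.0 for "pos") tip the vote to the minority class. The enhancement and correction factors mentioned in the abstract would adjust such weights further, but their definitions are not stated here.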
Keywords/Search Tags: classification algorithm, K nearest neighbor algorithm, data partition, imbalanced data set