
Enhancement Of K-nearest Neighbor Algorithm Based On Attribute Value Information Entropy

Posted on: 2011-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Tong
GTID: 2178360308974015
Subject: Computer application technology
Abstract/Summary:
Classification is one of the main research directions in data mining. The KNN algorithm is a simple and intuitive classification algorithm: the class label of an unknown sample is decided by its K nearest neighbors, and those neighbors are selected by distance. The definition of distance therefore directly determines which K neighbors are chosen and affects KNN classification accuracy. Much research has focused on the KNN distance, while little work has considered the relationship between the class label and important attribute values. Information entropy can measure how important an attribute value is for classification: the lower its entropy, the more important the attribute value. This thesis proposes an improved algorithm, Entropy-KNN, based on the information entropy of attribute values. First, the distance between two samples is defined as the average information entropy of the attribute values the two samples share. Then the class label of a test sample is decided from the average distance and the number of neighbors in each class. Experimental results on the mushroom data set show that this approach performs much better than traditional KNN and distance-weighted KNN.

Next, to achieve higher classification accuracy, a method combining hierarchical agglomerative clustering with Entropy-KNN is presented. A representative sample set is first extracted from the training set by hierarchical clustering; this representative set is then used as the initial set for the Entropy-KNN algorithm and is further maintained. Experimental results on the mushroom data set show that this approach performs better than the Entropy-KNN algorithm.

Finally, to make KNN classification faster, a method combining attribute reduction with Entropy-KNN is presented. After attribute reduction is performed, the Entropy-KNN algorithm is used to evaluate the classification results on the test set.
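The core Entropy-KNN idea can be illustrated with a short Python sketch on categorical data such as the mushroom set. It is an illustration under stated assumptions, not the thesis's implementation. Assumed: the entropy of an attribute value is the base-2 Shannon entropy of the class distribution among training samples that have that value; two samples sharing no attribute value are treated as infinitely far apart; and, since the abstract does not give the exact rule combining neighbor counts with average distances, the class with the most neighbors wins, with ties going to the smaller average distance. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def value_entropies(X, y):
    """Map (attribute index, value) -> Shannon entropy (bits) of the class labels
    of the training rows having that value. Lower entropy = more class-informative."""
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            counts[(i, v)][label] += 1
    ent = {}
    for key, c in counts.items():
        n = sum(c.values())
        ent[key] = -sum((k / n) * math.log2(k / n) for k in c.values())
    return ent


def entropy_distance(a, b, ent):
    """Average entropy of the attribute values that samples a and b share.
    Assumption: b is a training row, so every shared (index, value) key is in ent;
    samples with no shared values are treated as infinitely far apart."""
    shared = [ent[(i, v)] for i, (v, w) in enumerate(zip(a, b)) if v == w]
    if not shared:
        return math.inf
    return sum(shared) / len(shared)


def entropy_knn_predict(X_train, y_train, x, k, ent):
    """Classify x from its k nearest training rows under entropy_distance."""
    neighbors = sorted(
        (entropy_distance(x, row, ent), label) for row, label in zip(X_train, y_train)
    )[:k]
    by_class = defaultdict(list)
    for d, label in neighbors:
        by_class[label].append(d)
    # Assumed decision rule: most neighbors wins; ties go to the smaller average distance.
    return min(
        by_class,
        key=lambda c: (-len(by_class[c]), sum(by_class[c]) / len(by_class[c])),
    )


# Hypothetical toy data: (cap color, odor) -> "p" (poisonous) or "e" (edible).
X_train = [("red", "foul"), ("red", "none"), ("brown", "none"), ("brown", "none"), ("brown", "foul")]
y_train = ["p", "p", "e", "e", "p"]
ent = value_entropies(X_train, y_train)
print(entropy_knn_predict(X_train, y_train, ("brown", "foul"), 3, ent))  # -> p
```

Under these assumptions, sharing the value "foul" (entropy 0, since every training row with it is poisonous) puts a training row at distance 0 from the query, while sharing the mixed-class value "brown" (entropy about 0.918 bits) contributes a larger distance.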
Keywords: categorization, KNN algorithm, information entropy, distance