Research On Non-IID KNN Classification Algorithm

Posted on: 2019-11-13  Degree: Master  Type: Thesis
Country: China  Candidate: H J Li  Full Text: PDF
GTID: 2428330548986992  Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting valuable information from data, and classification is one of its mainstream research topics. The task of a classification algorithm is to map data items of unknown category to the corresponding categories by means of a classifier. The KNN (K-Nearest Neighbor) algorithm is one of the most widely used classification algorithms in data mining. This thesis studies and analyzes the KNN classifier and, to address its shortcomings, improves both its decision rule and its similarity measure. The main work is as follows.

The decision rule of the traditional KNN classifier counts the classes of the k nearest neighbors once they have been selected and predicts the class label of the test instance from those counts. This simple counting rule does not make effective use of the information carried by the neighboring samples. To overcome this shortcoming, the thesis introduces the concepts of Nearest-neighbor-support and Category-reliability and uses them to build a new decision rule. First, Nearest-neighbor-support is defined by measuring the similarity between the sample to be tested and each nearest-neighbor sample. Second, the Category-reliability of each category is calculated by considering the distribution of the samples. Experiments show that the resulting ND_KNN algorithm improves the performance of the classifier and is an effective and stable classification algorithm.

When traditional KNN classification algorithms measure the relationships among objects in a data set, they usually assume that the objects are independent and identically distributed (IID), ignoring the interactions and effects between objects. The CS_KNN algorithm is based on the Non-IID idea. It focuses on mining the interaction relations among the features, categories, and attribute values of objects. First, by measuring the importance of each feature to the classification, we study the Non-IID relation between features and categories to form the weight coefficient of the class feature. Second, we use this weight coefficient to form the intra-object Non-IID function between objects. Third, we analyze the effects of different features and generate the corresponding Non-IID functions. Finally, the Non-IID relations among features, within features, and of categories among objects are fused into the similarity measure to construct association similarity rules. Experiments show that the CS_KNN algorithm, based on the Non-IID idea, significantly improves classification performance compared with the traditional KNN algorithm.
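The contrast between the traditional counting rule and a rule that uses neighbor similarity can be illustrated with a short Python sketch. It is not the thesis's ND_KNN or CS_KNN: the abstract gives no formulas, so the support score 1 / (1 + distance), the optional per-feature `weights` argument, and all function names are assumptions chosen only to show the general idea.

```python
from collections import Counter, defaultdict
import math


def weighted_distance(a, b, weights=None):
    """Euclidean distance with optional per-feature weights.

    The weights are a hypothetical stand-in for feature-importance
    coefficients; they are not the coefficients defined in the thesis.
    """
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def k_nearest(train, query, k, weights=None):
    """Return the k (distance, label) pairs closest to the query."""
    scored = [(weighted_distance(x, query, weights), label) for x, label in train]
    return sorted(scored, key=lambda pair: pair[0])[:k]


def majority_vote(neighbors):
    """Traditional KNN rule: every neighbor counts as one vote."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def similarity_vote(neighbors):
    """Assumed variant: each neighbor votes with weight 1 / (1 + distance)."""
    support = defaultdict(float)
    for dist, label in neighbors:
        support[label] += 1.0 / (1.0 + dist)
    return max(support, key=support.get)


# Toy data: one very close "a" sample and two distant "b" samples.
train = [((0.0, 0.0), "a"), ((3.0, 0.0), "b"), ((3.2, 0.0), "b")]
query = (0.2, 0.0)
neighbors = k_nearest(train, query, k=3)

by_count = majority_vote(neighbors)         # "b": two votes against one
by_similarity = similarity_vote(neighbors)  # "a": the close neighbor dominates
```

On this toy data the counting rule picks "b" because two of the three neighbors carry that label, while the similarity-weighted rule picks "a" because the single "a" neighbor is much closer to the query. Passing `weights` to `k_nearest` changes which samples count as near, which is the role a learned feature-importance coefficient would play in a similarity measure.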
Keywords/Search Tags: KNN algorithm, classification, decision rule, Non-IID, similarity