An Outlier Detection Algorithm Based On Natural Nearest Neighbor

Posted on: 2015-07-17  Degree: Master  Type: Thesis
Country: China  Candidate: H Tang  Full Text: PDF
GTID: 2298330422471693  Subject: Computer software and theory
Abstract/Summary:
The k-nearest neighbor (k-NN) is a basic neighborhood concept that is widely used in data mining. The k-nearest neighborhood of a data object is the subset formed by its k nearest points. In recent years, k-NN has attracted the interest of experts and scholars, and many outlier detection algorithms based on it have been proposed. When a k-nearest-neighbor method is used, however, it is difficult to choose an appropriate value for the parameter k, which markedly affects the algorithm's efficiency and performance. The choice of k usually depends on user experience and a large number of experiments, and how to choose a suitable k has long been a research difficulty.

To avoid this problem, we propose an outlier detection algorithm based on the natural nearest neighbor (ODb3N), obtained by modifying the iteration stop condition of the natural nearest neighbor search. The natural nearest neighbor (3N) is a novel nearest-neighbor concept: in contrast to k-NN, each point's neighbors are formed adaptively by the algorithm. ODb3N consists of two phases. In the first phase, we use the natural nearest neighbor algorithm to find the nearest-neighbor domain of each data point. In the second phase, we study different outlier factors computed from each data object's natural nearest neighbors. The experiments show that our method not only has the advantage of being parameter-free but can also discover both individual outliers and clusters of outliers.

The main work and innovations are as follows:
① We analyze the research background of outlier data mining and its development status and trends at home and abroad.
② We introduce the typical algorithms and ideas of outlier data mining, as well as the specific process of data mining.
③ We introduce the natural neighbor technique and modify the iteration stop condition of the original natural nearest neighbor search algorithm. We verify the stability of the algorithm on randomly distributed data sets, as well as its automatic clustering behavior on data sets with different density distributions.
④ We propose an outlier detection algorithm based on the natural nearest neighbor that requires no parameters. We define a frequency outlier factor, a local outlier factor, and a cluster outlier factor, and combine them into a new criterion that describes the characteristics of the data set more fully.
⑤ To evaluate the performance of the outlier detection algorithm, we perform experiments on artificial data sets and real UCI data sets. The experimental results show that the algorithm is more effective than the related algorithms it is compared with.
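
The first phase described above can be illustrated with a minimal sketch of a natural neighbor search. It is written under stated assumptions and is not the thesis's implementation: the function name `natural_neighbor_search` is hypothetical, the distance is assumed to be Euclidean, and the stop rule used here (stop once no point lacks a reverse neighbor, or once the number of such points stops changing between rounds) is an assumed, commonly described variant. The abstract does not specify the modified stop condition, so it is not reproduced.

```python
import math


def natural_neighbor_search(points):
    """Grow the neighborhood size r round by round, with no user-supplied k.

    Hypothetical helper for illustration only. Returns (r, natural), where
    natural[i] is the set of points that are mutual neighbors of point i.
    """
    n = len(points)
    # order[i]: indices of all other points, sorted by distance from point i
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]  # neighbors chosen by point i so far
    rnn = [set() for _ in range(n)]  # points that have chosen i (reverse neighbors)
    prev_orphans = None
    r = 0
    while r < n - 1:
        # In round r, every point adds its next-nearest neighbor.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            rnn[j].add(i)
        r += 1
        # Orphans: points that no other point has chosen yet.
        orphans = sum(1 for s in rnn if not s)
        # Assumed stop rule: no orphans left, or the orphan count is stable.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Natural neighbors are the mutual ones: i chose j and j chose i.
    natural = [knn[i] & rnn[i] for i in range(n)]
    return r, natural


# Four clustered points plus one isolated point (made-up illustrative data).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
rounds, natural = natural_neighbor_search(data)
```

In this toy data set the isolated point is never chosen by any other point, so it ends up with no natural neighbors, while each clustered point keeps its three cluster mates. A point with few or no natural neighbors is the kind of signal that an outlier factor in the second phase could build on.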
Keywords/Search Tags: k-nearest neighbor, natural nearest neighbor, outlier detection, cluster of outliers