An Outlier Detection Algorithm Based On Natural Nearest Neighbor

Posted on:2015-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:H Tang

GTID:2298330422471693

Subject:Computer software and theory

Abstract/Summary:

The k-nearest neighbor (k-NN) is a basic neighborhood concept that is widely used in data mining. The k-nearest neighborhood of a data object is the subset formed by its k nearest points. In recent years, k-NN has attracted the interest of experts and scholars, and many outlier detection algorithms based on it have been proposed. When a k-NN method is used, however, it is difficult to choose an appropriate value for the parameter k, which noticeably affects the algorithm's efficiency and performance. The choice of k usually depends on user experience and a large number of experiments, and how to choose a suitable k has long been a research difficulty for k-NN algorithms.

To avoid this problem, we propose an outlier detection algorithm based on the natural nearest neighbor (ODb3N), obtained by modifying the iteration stop condition of the natural nearest neighbor search. The natural nearest neighbor (3N) is a novel nearest-neighbor concept: in contrast to k-NN, its neighborhoods are formed adaptively by the algorithm. ODb3N consists of two phases. In the first phase, we use the natural nearest neighbor algorithm to find the nearest-neighbor domain of each data point. In the second phase, we study different outlier factors computed over each data object's natural nearest neighbors. The experiments show that our method not only has the advantage of being parameter-free, but can also discover both individual outliers and clusters of outliers.

The main work and innovations are as follows:
① We analyze the research background of outlier data mining and its development status and trends at home and abroad.
② We introduce the typical algorithms and ideas of outlier data mining, as well as the specific process of data mining.
③ We introduce the natural neighbor technique and modify the iteration stop condition of the original natural nearest neighbor search algorithm. We verify the stability of the algorithm on randomly distributed data sets, as well as its automatic clustering behavior on data sets with different density distributions.
④ We propose an outlier detection algorithm based on the natural nearest neighbor that requires no parameters. We define the frequency outlier factor, the local outlier factor, and the cluster outlier factor, then combine them into a new criterion that describes the characteristics of the data set more fully.
⑤ To evaluate the performance of the outlier detection algorithm, we run experiments on artificial data sets and real UCI data sets. The experimental results show that this algorithm is more effective than the related algorithms it is compared with.
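
The first phase described in the abstract (finding each point's natural-neighbor domain without choosing k) can be illustrated with a minimal sketch. It assumes the commonly described form of the natural neighbor search: the neighborhood size r grows one step at a time, and the search stops once the number of points that nobody has selected as a neighbor reaches zero or stops changing. The abstract does not state how the thesis modifies this stop condition, nor how its three outlier factors are defined, so the code below is not ODb3N itself. The function name `natural_neighbor_search`, the parameter `max_rounds`, and the demo data are hypothetical, chosen only for illustration.

```python
import numpy as np


def natural_neighbor_search(points, max_rounds=None):
    """Sketch of a natural-neighbor search (assumed textbook form, not the thesis's variant).

    Returns (r, neighbors, reverse_count):
      r             -- neighborhood size at which the search stopped
      neighbors     -- neighbors[i] lists the points that are mutual r-nearest neighbors of i
      reverse_count -- how many times each point was chosen as someone's neighbor
    """
    X = np.asarray(points, dtype=float)
    n = len(X)
    if max_rounds is None:
        max_rounds = n - 1  # a point has at most n - 1 other points as neighbors

    # Brute-force pairwise Euclidean distances; a point is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    # order[i, j] is the index of the (j + 1)-th nearest neighbor of point i.
    order = np.argsort(dist, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)
    prev_orphans = None
    r = 0
    while r < max_rounds:
        r += 1
        # Every point votes for its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        orphans = int((reverse_count == 0).sum())
        # Assumed stop rule: no orphans left, or the orphan count has stabilized.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # Natural neighbors are taken here as mutual r-nearest neighbors.
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), r), order[:, :r].ravel()] = True
    mutual = knn & knn.T
    neighbors = [np.flatnonzero(mutual[i]).tolist() for i in range(n)]
    return r, neighbors, reverse_count


# Hypothetical demo data: a tight square of four points plus one distant point.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
r, neighbors, reverse_count = natural_neighbor_search(demo_points)
print("stopped at r =", r)
print("natural neighbors:", neighbors)
print("reverse-neighbor counts:", reverse_count.tolist())
```

In this toy data set the distant point is never chosen as a neighbor by any other point, so it ends up with an empty natural-neighbor list. Treating such points as outlier candidates is only an intuition for why a parameter-free neighborhood is useful here; the thesis scores outliers with its own frequency, local, and cluster outlier factors, which this sketch does not attempt to reproduce.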

Keywords/Search Tags:

k-nearest neighbor, natural nearest neighbor, outlier detection, cluster of outliers

Related items

1	Natural Neighbor:The Concepts And Applications In Data Mining
2	Study On Generalized Nearest Neighbor Pattern Classification
3	Outlier Detection Algorithm And Its Parallelization Based On Weighted K-Nearest Neighbor
4	Study Of Outlier Detecting Algorithm Based On Natural Nearest Neighbor And Weighted Attribute Entropy
5	Study On Classification Algorithm Based On Natural Nearest Neighbor
6	Outlier Detection Algorithm Without Parameter Based On Natural Neighbor
7	Researches On Abnormal Data Detection Algorithms With Adaptive K-Nearest Neighbor
8	Research On The Visual Group K-Nearest Neighbor And Group Inverse K-Nearest Neighbor Query Of Multi-Source Objects In Three-Dimensional Space
9	Research On Continuous Nearest Neighbor Query
10	Research And Application Of Outlier Detection Method Based On Nearest Neighbor