
Research On Outlier Detection Method Based On Nearest Neighborhood

Posted on: 2020-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: X L Yang
GTID: 2428330572496982
Subject: Computational Mathematics
Abstract/Summary:
Outlier detection is an important research topic in data mining. It identifies data objects that differ significantly from the other objects in a data set, and it has both theoretical research value and broad application prospects. However, most outlier detection methods are designed to detect a single type of outlier and assume that the data set contains only one type of attribute. Data sets with abnormal distributions, multiple types of outliers, and mixed attribute types have received too little attention, even though many such data sets exist in practice, so studying outlier detection on them is of practical significance. In this thesis, the main outlier detection algorithms are summarized and the basic concepts of neighborhoods and outlier detection are discussed. To address the limitations of current outlier detection algorithms on data sets with complex distributions, multiple outlier types, and mixed attribute structures, detection methods based on nearest neighborhoods are analyzed and studied in depth. The main results are as follows:

(1) A new reverse k-nearest-neighborhood algorithm based on relative distance is proposed to address the difficulty of detecting outliers in data sets with complex distributions and various outlier types. First, the classical Euclidean distance, the local density of an object, and the object's neighborhood are combined to define the object's relative distance, which can be used to detect both global and local outliers. Second, based on the minimum spanning tree structure, a largest-edge cut method is used to obtain outliers and outlier clusters. Finally, experiments on synthetic and UCI data sets show that the new algorithm achieves higher detection accuracy. The algorithm thus provides an effective new approach to outlier detection for data sets with abnormal distributions and diverse outliers.

(2) To address the difficulty that traditional rough-set-based outlier detection methods have with continuous attribute data, an outlier detection algorithm based on the neighborhood rough membership function, NRMFOD, is proposed. It applies to continuous, discrete, and hybrid data sets. Based on a mixed distance and an adaptive radius, a neighborhood membership function is defined to describe an object's degree of outlierness. Neighborhood outlier factors are then constructed to perform the outlier detection. Comparative experiments on UCI data sets show that NRMFOD is effective.
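
The largest-edge cut step in result (1) can be illustrated with a minimal sketch. The abstract does not give the formula for the thesis's relative distance, so this sketch substitutes plain Euclidean distance and is not the author's algorithm. The function names and the parameters `n_cuts` and `max_outlier_size` are hypothetical, introduced only for illustration. The idea shown is generic: build a minimum spanning tree over the points, remove its heaviest edges, and treat the small components that remain as outliers or outlier clusters.

```python
import math


def euclidean(a, b):
    # Stand-in metric: the thesis uses a relative distance whose
    # definition is not given in the abstract.
    return math.dist(a, b)


def minimum_spanning_tree(points, dist=euclidean):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def largest_edge_cut(points, n_cuts=1, max_outlier_size=2, dist=euclidean):
    """Cut the n_cuts heaviest MST edges and return the indices of points
    in components with at most max_outlier_size members (both hypothetical knobs)."""
    edges = sorted(minimum_spanning_tree(points, dist))
    kept = edges[:max(0, len(edges) - n_cuts)]

    # Union-find over the kept edges to recover connected components.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(i for g in groups.values() if len(g) <= max_outlier_size for i in g)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(largest_edge_cut(pts, n_cuts=1))  # [4]
```

In this toy example the four corner points form one dense component and the single distant point is isolated once the heaviest edge is removed. Passing a different `dist` function is where a density-aware distance such as the thesis's relative distance would plug in.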
Keywords/Search Tags:data mining, outlier detection, nearest neighborhood, relative distance, neighborhood rough