
Outlier Detection Algorithm Without Parameter Based On Natural Neighbor

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: G G Li
Full Text: PDF
GTID: 2518306107993599
Subject: Engineering
Abstract/Summary:
In some application scenarios, abnormal data (outliers) that deviate from the majority of the data carry valuable information. Outlier detection is widely used in financial fraud detection, computer-aided medical screening, network intrusion detection, and other fields. Traditional outlier detection algorithms have two main problems. First, they require too many input parameters. Second, they suffer from the top-n problem: the number of outliers must be specified in advance for a given data set, although in practice it cannot be known beforehand. To address both problems, this thesis studies parameter-free outlier detection and proposes two abnormality factors, the gravitation-mass ratio and the standard deviation boundary degree, together with two parameter-free outlier detection algorithms built on them.

The first algorithm extracts core points based on the gravitation-mass ratio. It first uses the natural neighbor algorithm to determine the value of k for the data set automatically, and takes as core points the data points whose gravitation-mass ratio is below the mean gravitation-mass ratio of all data points. The core points are then clustered using natural neighbors and depth-first search, and the points in any cluster containing fewer than k points are reset to non-core points. Finally, inverse neighbors are used to assign the non-core points: the criterion for deciding whether a data point is an outlier is whether any of its inverse neighbors is a core point. After repeated iteration, the non-core points that remain are the outliers, and their gravitation-mass ratios serve as their abnormality factors.

The second algorithm uses the standard deviation boundary degree to extract core points. It proceeds in the same way as the first algorithm and differs only in how the core points are extracted: all data points whose standard deviation boundary degree is below the mean standard deviation boundary degree of all data points are taken as core points, and the abnormality factors of the detected outliers are given by their standard deviation boundary degrees.

In this thesis, the natural neighbor algorithm removes the need for the many input parameters required by existing outlier detection algorithms, while core point extraction based on the gravitation-mass ratio and the standard deviation boundary degree, combined with inverse nearest neighbors, addresses the top-n problem. In experiments on artificial and real data sets against seven other outlier detection algorithms, the two proposed parameter-free algorithms achieve better detection accuracy as measured by F-measure and AUC. This comparison indicates that the two proposed parameter-free outlier detection algorithms based on natural neighbors are effective.
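The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the thesis's implementation. The abstract does not define the gravitation-mass ratio or the standard deviation boundary degree, so the sketch substitutes a hypothetical stand-in score (mean distance to the k nearest neighbors). The function names, the stopping rule of the natural neighbor search (grow the neighborhood size until the number of points with no inverse neighbor stops changing), and the use of mutual-neighbor links for clustering are all assumptions, as is the requirement that no two points coincide.

```python
import numpy as np


def natural_neighbor_search(X):
    """Return (k, knn, reverse) for points X of shape (n, d).

    Assumed stopping rule: the neighborhood size r grows until the number
    of points with no inverse neighbor stops changing (or reaches zero).
    Assumes all points are distinct, so each point is its own nearest hit.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)[:, 1:]  # neighbors by distance, self dropped
    reverse = [set() for _ in range(n)]      # reverse[j] = points that chose j
    prev_orphans = -1
    for r in range(1, n):
        for i in range(n):
            reverse[order[i, r - 1]].add(i)  # i chooses its r-th nearest neighbor
        orphans = sum(1 for s in reverse if not s)
        if orphans == 0 or orphans == prev_orphans:
            return r, order[:, :r], reverse
        prev_orphans = orphans
    return n - 1, order, reverse


def mean_knn_distance(X, knn):
    """Hypothetical stand-in abnormality factor (NOT the thesis's factors)."""
    return np.linalg.norm(X[:, None, :] - X[knn], axis=2).mean(axis=1)


def detect_outliers(X, score_fn=mean_knn_distance):
    """Return indices of the points left as non-core after all steps."""
    k, knn, reverse = natural_neighbor_search(X)
    n = len(X)
    score = score_fn(X, knn)
    core = score < score.mean()              # below-average score => core point

    # Cluster core points by depth-first search over mutual-neighbor links
    # (an assumed reading of "natural neighbor" clustering).
    knn_sets = [set(row.tolist()) for row in knn]
    visited = np.zeros(n, dtype=bool)
    for start in range(n):
        if not core[start] or visited[start]:
            continue
        stack, members = [start], []
        visited[start] = True
        while stack:
            p = stack.pop()
            members.append(p)
            for q in knn[p].tolist():
                if core[q] and not visited[q] and p in knn_sets[q]:
                    visited[q] = True
                    stack.append(q)
        if len(members) < k:                 # clusters smaller than k are demoted
            core[members] = False

    # Assumed assignment rule: a non-core point is absorbed once any of its
    # inverse neighbors is a core point; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for i in np.flatnonzero(~core):
            if any(core[j] for j in reverse[i]):
                core[i] = True
                changed = True
    return np.flatnonzero(~core)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0, 30.0]
    X = np.array([[x, 0.0] for x in xs])
    print(detect_outliers(X))  # [6]: only the isolated point at x = 30 remains
```

Note that neither k nor the number of outliers is supplied by the caller: k comes from the neighbor search, and the outlier count falls out of the assignment step, which is the sense in which the abstract calls the approach parameter-free and free of the top-n problem.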
Keywords/Search Tags: outlier detection, parameter-free, gravity model, inverse nearest neighbor, boundary degree