
Research On Local Outlier Detection Algorithm Based On Subspace

Posted on: 2021-03-12, Degree: Master, Type: Thesis
Country: China, Candidate: Y Tang, Full Text: PDF
GTID: 2428330602993905, Subject: Computer Science and Technology
Abstract/Summary:
Local outlier detection is an active research topic in data mining. With the rapid development of information technology, the dimensionality of data sets keeps increasing, and high-dimensional data is now common. High-dimensional data is sparsely distributed in the data space and data objects tend to appear evenly distributed, so outliers are hidden in the high-dimensional space and traditional outlier detection algorithms have difficulty detecting them. A subspace can be regarded as a low-dimensional projection of the full-dimensional space. Finding the subspaces that carry outlier information and performing outlier detection in them is the main research direction for local outlier detection on high-dimensional data. This thesis analyzes existing outlier detection algorithms and studies the two steps of high-dimensional outlier detection separately: subspace selection and outlier detection with a traditional algorithm. The main contents are as follows:

(1) A local outlier detection algorithm based on local estimated density is proposed. First, kernel density estimation is used to calculate the local estimated density of each data object, with a bandwidth that adapts to the sparsity of the neighborhood. Then, the local outlier factor is calculated from the average local estimated density of the object's neighborhood and the object's own local estimated density. Finally, the local outlier factor is compared with a given threshold: if the outlier factor of a data object is greater than the threshold, the object is considered an outlier. Experimental results show that the algorithm is effective for local outlier detection.

(2) A dimension-based subspace selection algorithm is proposed for processing the full space of high-dimensional data. First, a deviation function based on cumulative entropy is used as the subspace quality function, which measures whether a subspace is suitable for outlier detection. Second, for each attribute the algorithm uses this quality function to construct the optimal subspace, that is, the subspace with the largest quality relative to that attribute.

(3) For high-dimensional outlier detection, a subspace-based local outlier detection algorithm is proposed by combining the dimension-based subspace selection algorithm with the local outlier detection algorithm based on local estimated density. First, the dimension-based subspace selection algorithm is used to find a set of subspaces. Then, for each data object in the data set, the outlier factor is calculated in each subspace of the set, and the average of these factors is taken as the outlier score of the object. Finally, the outlier score is compared with a given threshold: if the outlier score of a data object is greater than the threshold, the object is considered an outlier. Experimental results show that this method improves on the performance of traditional outlier detection algorithms on high-dimensional data and detects local outliers in such data as fully as possible.
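
The density-based outlier factor in (1) and the averaged subspace score in (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the kernel, the bandwidth rule, or the exact form of the factor, so the sketch assumes a Gaussian kernel, a bandwidth equal to each object's mean distance to its k nearest neighbors, and a factor defined as the neighborhood's average density divided by the object's own density. The function names `local_density_outlier_factor` and `subspace_outlier_scores` are hypothetical, and the subspace set is passed in by hand rather than selected with the cumulative-entropy quality function from (2).

```python
import numpy as np


def local_density_outlier_factor(X, k=5):
    """Outlier factor per object: mean neighbor density / own density.

    Assumed form; values well above 1 suggest a local outlier.
    Requires k < number of objects.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # k nearest neighbors of each object, excluding the object itself.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    nbrs = np.argsort(masked, axis=1)[:, :k]            # (n, k)
    knn_dist = np.take_along_axis(dist, nbrs, axis=1)   # (n, k)

    # Adaptive bandwidth (assumption): a sparser neighborhood gets a
    # wider kernel. The small constant guards against division by zero.
    h = knn_dist.mean(axis=1) + 1e-12

    # Gaussian kernel density estimate over the neighborhood. The
    # (2*pi)^(d/2) constant is dropped because it cancels in the ratio.
    kernel = np.exp(-(knn_dist ** 2) / (2.0 * h[:, None] ** 2))
    density = kernel.mean(axis=1) / h ** d

    # Average density of the neighborhood relative to the object's own.
    return density[nbrs].mean(axis=1) / density


def subspace_outlier_scores(X, subspaces, k=5):
    """Average the outlier factor over a given set of subspaces.

    `subspaces` is an iterable of column-index tuples (hypothetical
    input; the thesis selects these with its own algorithm).
    """
    X = np.asarray(X, dtype=float)
    factors = [local_density_outlier_factor(X[:, list(s)], k) for s in subspaces]
    return np.mean(factors, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 1.0, size=(60, 4))
    # One object that deviates only in the subspace of columns 0 and 1.
    data = np.vstack([data, [5.0, 5.0, 0.5, 0.5]])
    scores = subspace_outlier_scores(data, [(0, 1), (2, 3)], k=5)
    threshold = 10.0  # illustrative value, not taken from the thesis
    print(np.flatnonzero(scores > threshold))
```

Objects whose score exceeds the threshold are reported as outliers, mirroring the final comparison step in (1) and (3). Appropriate values for the threshold and for `k` depend on the data set.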
Keywords/Search Tags: Outlier Detection, Subspace, Outlier Score, Kernel Density Estimation