
Research On Local Outlier Detection Technology

Posted on: 2021-06-01 | Degree: Master | Type: Thesis
Country: China | Candidate: F T Qin
GTID: 2518306047488114 | Subject: Applied Mathematics
Abstract/Summary:
Outlier detection has become a research hot spot in data mining. Among traditional local outlier detection algorithms, density-based methods perform well at mining local outliers. However, when they measure how far a data object deviates from its neighbors, they consider only distance and usually ignore angle. As data sets grow in size and dimensionality, methods that detect outliers from the difference between an object and its neighbors become less applicable. High-dimensional data sets contain irrelevant attributes, so the deviation of outliers from inliers may not be visible in the full-dimensional space, and those outliers go undetected. The irrelevant dimensions also lower the efficiency of outlier detection and the accuracy of outlier mining. To address these shortcomings, this thesis studies outlier detection algorithms for data sets of different sizes. The main work is as follows.

For data sets of ordinary scale, a novel outlier detection method based on local similarity measurement is proposed. It detects outliers through the similarity between an object and its neighbors, and it obtains this similarity by taking distance and angle into account simultaneously. First, to better describe the similarity between objects, the Gaussian kernel function is improved; the improved measure is called the local flexible Gaussian kernel similarity. Second, a similarity adjusted by the mean of the attribute vectors is used to represent angle-based similarity. Finally, the local flexible Gaussian kernel similarity and the angle-based similarity are coupled into a local similarity-based outlier score that indicates how abnormal an object is: the larger an object's outlier score, the more likely it is to be an outlier. Comprehensive experiments on synthetic and real-life data sets demonstrate the effectiveness and accuracy of the proposed method.

To address the failure to detect outliers caused by irrelevant attributes and redundant data in high-dimensional data sets, a new sparse subspace-based algorithm for local outlier detection is proposed. First, an outlier factor is defined for each object from its local density in each dimension. Second, attributes unrelated to local outliers and redundant objects are removed from the data set using a threshold on the outlier factor. Finally, an improved particle swarm optimization algorithm searches the reduced data set for sparse subspaces, which contain the local outliers. Comprehensive experiments on synthetic and real-life data sets demonstrate the effectiveness and feasibility of the proposed algorithm.
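The abstract does not give formulas for the local flexible Gaussian kernel similarity, the adjusted (angle-based) similarity, or how the two are coupled, so what follows is only a minimal Python/NumPy sketch under stated assumptions, not the thesis' method. It assumes that each object's kernel bandwidth is its distance to the k-th nearest neighbor (a self-tuning kernel), that the adjusted cosine centers every object on the attribute (column) means, that the two similarities are multiplied, and that the final score is a LOF-style ratio of the neighbors' average similarity to the object's own. The names `knn_distances` and `local_similarity_scores` and the parameters `k` and `eps` are hypothetical.

```python
import numpy as np


def knn_distances(X, k):
    """Indices of, and distances to, the k nearest neighbours of every row (self excluded)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)                                 # an object is not its own neighbour
    idx = np.argsort(D, axis=1)[:, :k]                          # neighbours, nearest first
    return idx, np.take_along_axis(D, idx, axis=1)


def local_similarity_scores(X, k=5, eps=1e-12):
    """LOF-style outlier score from a distance term and an angle term (assumed forms)."""
    X = np.asarray(X, dtype=float)
    idx, dist = knn_distances(X, k)

    # Distance term: a Gaussian kernel whose bandwidth adapts to each object.
    # Assumption: sigma_i is the k-distance of object i and the kernel between
    # i and its neighbour j is exp(-d_ij^2 / (sigma_i * sigma_j)).
    sigma = dist[:, -1] + eps
    gauss = np.exp(-dist**2 / (sigma[:, None] * sigma[idx]))

    # Angle term: adjusted cosine, i.e. the cosine after subtracting each attribute's mean.
    C = X - X.mean(axis=0)
    norms = np.linalg.norm(C, axis=1) + eps
    cos = np.einsum("ij,ikj->ik", C, C[idx]) / (norms[:, None] * norms[idx])
    angle = (1.0 + cos) / 2.0  # rescale from [-1, 1] to [0, 1]

    # Coupling (assumed to be a product), averaged over the k neighbours.
    S = (gauss * angle).mean(axis=1) + eps

    # A score above 1 means the object is less similar to its neighbours
    # than those neighbours are to their own neighbours.
    return S[idx].mean(axis=1) / S
```

For example, with 50 points drawn from a 2-D Gaussian with standard deviation 0.5 plus one point at (6, 6), `local_similarity_scores(X, k=5)` should give the isolated point the largest score, mainly through the kernel term. Centering on the global attribute means makes the angle term noisy for objects near the center of the data; centering on each object's neighborhood mean would be an equally plausible assumption.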
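The abstract also leaves open how the per-dimension outlier factor and the threshold-based reduction are defined. The Python/NumPy sketch below assumes a LOF-style ratio computed on one attribute at a time, with local density taken as the inverse of the mean distance to the k nearest neighbors on that attribute, and a two-threshold reduction rule. The functions `per_dimension_outlier_factor` and `reduce_by_threshold`, and the values of `k`, `attr_threshold`, and `obj_threshold`, are hypothetical and are not the thesis' definitions.

```python
import numpy as np


def per_dimension_outlier_factor(X, k=5, eps=1e-12):
    """F[i, j]: LOF-style factor of object i computed on attribute j alone (assumed definition)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    F = np.empty((n, d))
    for j in range(d):
        col = X[:, j]
        D = np.abs(col[:, None] - col[None, :])         # 1-D distances on attribute j
        np.fill_diagonal(D, np.inf)
        idx = np.argsort(D, axis=1)[:, :k]              # k nearest neighbours on attribute j
        dist = np.take_along_axis(D, idx, axis=1)
        density = 1.0 / (dist.mean(axis=1) + eps)       # inverse mean k-NN distance
        F[:, j] = density[idx].mean(axis=1) / density   # neighbours' density / own density
    return F


def reduce_by_threshold(F, attr_threshold=2.0, obj_threshold=1.2):
    """Hypothetical rule: keep attributes on which some object's factor exceeds
    attr_threshold, then keep objects whose factor exceeds obj_threshold on at
    least one kept attribute. All other objects are treated as redundant."""
    keep_attrs = np.flatnonzero(F.max(axis=0) > attr_threshold)
    keep_objs = np.flatnonzero((F[:, keep_attrs] > obj_threshold).any(axis=1))
    return keep_objs, keep_attrs
```

On an evenly spaced attribute, every factor stays below about 1.6, so that attribute is dropped. An attribute on which one object sits far from the rest is kept, together with that object. Discarding redundant objects this aggressively leaves little context for a later density or sparsity search, so keeping the flagged objects together with their neighbors is a reasonable alternative.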
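The abstract states only that an improved particle swarm optimization (PSO) algorithm searches the reduced data for sparse subspaces. It defines neither sparsity nor the improvement, so this sketch swaps in two well-known ingredients and names them plainly: the equi-depth grid and sparsity coefficient of Aggarwal and Yu's projected outlier detection, and a plain binary PSO with a sigmoid transfer function in the style of Kennedy and Eberhart. This is not the thesis' improved variant. Each particle is a 0/1 mask over the attributes. A mask's fitness is the most negative sparsity coefficient S = (n(D) - N f^k) / sqrt(N f^k (1 - f^k)) over the non-empty grid cells of that projection, where f = 1/phi and k is the number of selected attributes. The objects in the sparsest cell are the outlier candidates. The functions `grid_cells`, `sparsest_cell`, and `binary_pso` and the parameter values (`phi`, `w`, `c1`, `c2`, `vmax`) are illustrative assumptions.

```python
import numpy as np


def grid_cells(X, phi):
    """Equi-depth grid: each attribute is cut into phi rank-based ranges of ~N/phi objects."""
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=0, kind="stable"), axis=0, kind="stable")
    return ranks * phi // len(X)  # range index in 0..phi-1 for every value


def sparsest_cell(cells, mask, phi):
    """Most negative sparsity coefficient over the non-empty cells of the projection
    `mask`, plus the members of that cell. An empty mask scores +inf."""
    dims = np.flatnonzero(mask)
    if dims.size == 0:
        return np.inf, np.array([], dtype=int)
    n = len(cells)
    p = (1.0 / phi) ** dims.size                               # expected fraction per cell
    cell_id = cells[:, dims] @ (phi ** np.arange(dims.size))   # one integer per grid cell
    _, inverse, counts = np.unique(cell_id, return_inverse=True, return_counts=True)
    coeff = (counts - n * p) / np.sqrt(n * p * (1.0 - p))
    best = int(np.argmin(coeff))
    return coeff[best], np.flatnonzero(inverse.ravel() == best)


def binary_pso(cells, phi, n_particles=30, iters=50, w=0.7, c1=1.5, c2=1.5,
               vmax=4.0, seed=0):
    """Plain binary PSO over attribute masks, minimising the sparsity coefficient."""
    rng = np.random.default_rng(seed)
    d = cells.shape[1]

    def fitness(masks):
        return np.array([sparsest_cell(cells, m, phi)[0] for m in masks])

    x = rng.integers(0, 2, size=(n_particles, d))   # 1 = attribute is in the subspace
    v = rng.uniform(-1.0, 1.0, size=(n_particles, d))
    pbest, pfit = x.copy(), fitness(x)
    g = int(np.argmin(pfit))
    gbest, gfit = pbest[g].copy(), pfit[g]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x), -vmax, vmax)
        # Sigmoid transfer: each bit is 1 with probability 1 / (1 + exp(-v)).
        x = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-v))).astype(int)
        fit = fitness(x)
        better = fit < pfit
        pbest[better], pfit[better] = x[better], fit[better]
        g = int(np.argmin(pfit))
        if pfit[g] < gfit:
            gbest, gfit = pbest[g].copy(), pfit[g]
    return gbest, gfit
```

A typical call is `mask, coeff = binary_pso(grid_cells(X, 5), 5)`, followed by `sparsest_cell(cells, mask, 5)` to list the candidate outliers. Encoding particles as attribute masks keeps the search space at the 2^d attribute subsets. Aggarwal and Yu's original formulation also searches over the grid range of each attribute; this sketch fixes that choice by taking the sparsest non-empty cell.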
Keywords/Search Tags: Outlier detection, Flexible Gaussian kernel similarity, Adjusted cosine similarity, Data reduction, Particle swarm optimization, Sparse subspace