Outlier Detection Algorithm Based On Neighbor Difference Fluctuation And Graph Label Propagation

Posted on: 2023-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Deng
GTID: 2558306848967239
Subject: Engineering
Abstract/Summary:
Outlier detection is an important research branch of data mining. It plays an important role in fields such as intrusion detection, fraud detection, and medical and health monitoring. In recent years, researchers in China and abroad have proposed many outlier detection methods. However, these methods have various shortcomings, and their detection precision is poor on many data sets. This paper therefore focuses on the problems of current outlier detection methods and proposes new ones.

First, outlier detection methods based on neighbor relationships detect points in boundary regions with low accuracy and spend considerable time computing scores for interior points. This paper proposes a new outlier detection method based on the fluctuation of a nearest neighbor difference factor. Because an outlier has far fewer mutual nearest neighbors than k-nearest neighbors, a nearest-neighbor-based pruning method is presented. The concept of nearest neighbor difference is introduced to describe how a data object is distributed relative to its neighbors. As the parameter k changes, the nearest neighbor difference behaves differently for outliers and for normal points, so its fluctuation is used to measure the outlier degree of each data point, from which the outliers are detected. This paper also analyzes the correctness and time complexity of the proposed algorithm.

Second, outlier detection algorithms based on graph label propagation can detect only clustered outliers and have low accuracy. To address this, a new outlier detection method based on label propagation over a local information graph is proposed. The algorithm computes the parameter k adaptively from the mutual k-nearest neighbor relationship. It then uses k and the neighbor relationships to build a local neighbor graph, from which a local similarity matrix and a transition probability matrix are generated. Next, each data point is labeled in turn, and its label is propagated for a limited number of iterations. Because label propagation converges at different rates for outliers and interior points, the difference in the convergence value of each point is recorded. This convergence difference is combined with the number of mutual neighbors into an outlier factor that describes each point's degree of outlierness, from which the outliers are detected. This paper also analyzes the correctness and time complexity of the proposed algorithm.

Finally, accuracy, recall, and AUC are used as evaluation metrics. The proposed algorithms are compared with other algorithms on artificial and real data sets to verify their effectiveness.
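The abstract does not give formulas for the nearest neighbor difference or its fluctuation. The Python sketch below therefore illustrates only the general shape of the first method, under explicit assumptions. The function names (`ndf_scores`, `neighbor_difference`, `mutual_count`, and so on) are hypothetical. The sketch makes three further assumptions:

- The neighbor difference is the ratio of a point's mean k-NN distance to the average mean k-NN distance of its neighbors.
- Points with at least `k_max // 2` mutual neighbors are pruned as interior points.
- Fluctuation is the population standard deviation of that ratio over a range of k.

The thesis's actual definitions may differ.

```python
import math
import statistics


def knn_table(points, k_max):
    """Brute-force distance matrix and the k_max nearest neighbours of each point (ties broken by index)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    table = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        table.append(others[:k_max])
    return dist, table


def mutual_count(table, i, k):
    """Number of i's k nearest neighbours that also list i among their own k nearest neighbours."""
    return sum(1 for j in table[i][:k] if i in table[j][:k])


def mean_knn_dist(dist, table, i, k):
    return sum(dist[i][j] for j in table[i][:k]) / k


def neighbor_difference(dist, table, i, k, eps=1e-12):
    """HYPOTHETICAL stand-in for the 'nearest neighbour difference': the point's mean k-NN
    distance divided by the average mean k-NN distance of its k neighbours."""
    own = mean_knn_dist(dist, table, i, k)
    around = sum(mean_knn_dist(dist, table, j, k) for j in table[i][:k]) / k
    return own / (around + eps)


def ndf_scores(points, k_min=2, k_max=4):
    """Score every point; pruned (assumed interior) points get 0.0."""
    dist, table = knn_table(points, k_max)
    scores = []
    for i in range(len(points)):
        # Assumed pruning rule: many mutual neighbours means an interior point, so skip it.
        if mutual_count(table, i, k_max) >= k_max // 2:
            scores.append(0.0)
            continue
        diffs = [neighbor_difference(dist, table, i, k) for k in range(k_min, k_max + 1)]
        # Assumed fluctuation measure: population standard deviation across the k range.
        scores.append(statistics.pstdev(diffs))
    return scores


if __name__ == "__main__":
    grid = [(x, y) for x in range(3) for y in range(3)]
    points = grid + [(10, 10)]
    scores = ndf_scores(points)
    print(max(range(len(points)), key=scores.__getitem__))  # 9
```

In the example, every point of the 3x3 grid has at least two mutual 4-nearest neighbors and is pruned. Only the distant point is scored. Brute-force distances keep the sketch short but cost O(n²) time and memory. They are not meant to reflect the complexity analysis in the thesis.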
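The second method can be illustrated the same way. This Python sketch is not the thesis's algorithm; every specific choice below is a hypothetical placeholder:

- `adaptive_k` increases k until the number of points with no mutual k-nearest neighbor stops changing.
- Edge weights are 1/(1 + d) on the symmetrized k-NN graph, row-normalized into a transition matrix.
- Each point in turn is clamped to label 1 while all other nodes average their neighbors' labels for a fixed number of rounds.
- The "convergence difference" is taken as 1 minus the mean label of the other nodes.
- That difference is divided by 1 plus the point's mutual-neighbor count.

```python
import math


def knn_table(points):
    """Brute-force distances and, for every point, all other points sorted by distance (ties by index)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    table = []
    for i in range(n):
        table.append(sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j)))
    return dist, table


def mutual_count(table, i, k):
    """Number of i's k nearest neighbours that also list i among their own k nearest neighbours."""
    return sum(1 for j in table[i][:k] if i in table[j][:k])


def adaptive_k(table):
    """HYPOTHETICAL rule: grow k until the number of points with no mutual k-NN stops changing."""
    n = len(table)
    prev = None
    for k in range(1, n):
        lonely = sum(1 for i in range(n) if mutual_count(table, i, k) == 0)
        if lonely == prev:
            return k
        prev = lonely
    return n - 1


def transition_matrix(dist, table, k):
    """Row-normalised weights of the symmetrised k-NN graph (weight 1/(1+d) is an assumption)."""
    n = len(table)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in table[i][:k]:
            w[i][j] = w[j][i] = 1.0 / (1.0 + dist[i][j])
    rows = []
    for row in w:
        total = sum(row)  # positive: every node keeps at least its own k edges
        rows.append([x / total for x in row])
    return rows


def propagation_deficit(P, seed, steps):
    """Clamp label 1 on `seed`, let every other node average its neighbours' labels for
    `steps` synchronous rounds, and return how far the others remain, on average, from 1."""
    n = len(P)
    f = [0.0] * n
    f[seed] = 1.0
    for _ in range(steps):
        f = [1.0 if u == seed else sum(P[u][v] * f[v] for v in range(n)) for u in range(n)]
    others = [f[u] for u in range(n) if u != seed]
    return 1.0 - sum(others) / len(others)


def lp_scores(points, steps=5):
    dist, table = knn_table(points)
    k = adaptive_k(table)
    P = transition_matrix(dist, table, k)
    # Assumed combination: slow convergence and few mutual neighbours both raise the score.
    return [propagation_deficit(P, i, steps) / (1 + mutual_count(table, i, k))
            for i in range(len(points))]


if __name__ == "__main__":
    grid = [(x, y) for x in range(3) for y in range(3)]
    points = grid + [(10, 10)]
    print(adaptive_k(knn_table(points)[1]))  # 4
    scores = lp_scores(points)
    print(max(range(len(points)), key=scores.__getitem__))  # 9
```

On the toy data the adaptive rule settles on k = 4. The isolated point is joined to the grid only by weak edges, so its label spreads slowly and it receives the highest score. Dense Python lists keep the sketch dependency-free; larger data would call for sparse matrices.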
Keywords/Search Tags:data mining, outliers, mutual neighbors, neighbor difference fluctuation factor, label propagation