Outlier Detection Algorithm Based On Neighbor Difference Fluctuation And Graph Label Propagation

Posted on:2023-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Deng

Full Text:PDF

GTID:2558306848967239

Subject:Engineering

Abstract/Summary:

Outlier detection is an important research branch of data mining, with applications in intrusion detection, fraud detection, and detection tasks in medicine and health care. In recent years, researchers in China and abroad have proposed many outlier detection methods. However, these methods have various shortcomings, and their detection precision is poor on many data sets. This thesis therefore focuses on the problems of current outlier detection methods and proposes two new ones.

First, outlier detection methods based on neighbor relationships detect points in boundary regions with low accuracy and spend considerable computation on interior points. To address this, the thesis proposes an outlier detection method based on the fluctuation of a nearest neighbor difference factor. Because an outlier has far fewer mutual nearest neighbors than k-nearest neighbors, a nearest-neighbor-based pruning method is presented. The concept of nearest neighbor difference is introduced to describe how a data object is distributed relative to its neighbors. As the parameter k changes, the nearest neighbor difference behaves differently for outliers and for normal points, so its fluctuation is used to measure the outlier degree of each data point, from which the outliers are detected. The thesis also analyzes the correctness and time complexity of this algorithm.

Second, outlier detection algorithms based on graph label propagation can detect only cluster outliers and have low accuracy. To address this, a new outlier detection method based on label propagation over a local information graph is proposed. The algorithm computes the parameter k adaptively from the mutual k-nearest neighbor relationship. It then uses k and the neighbor relationships to build a local neighbor graph, from which a local similarity matrix and a transition probability matrix are generated. Next, each data point is labeled in turn, and its label is propagated for a limited number of steps. Because label propagation converges at different rates for outliers and for interior points, the difference in the convergence value of each point is recorded. This difference is combined with the number of mutual neighbors into an outlier factor that describes the outlier degree of each data point, and the outliers are then detected. The thesis also analyzes the correctness and time complexity of this algorithm.

Finally, with the accuracy rate, recall rate and AUC as evaluation indicators, the proposed algorithms are compared with other algorithms on artificial and real data sets to verify their effectiveness.
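The abstract gives no formulas for the first method, so the Python sketch below only illustrates the idea under explicit assumptions; it is not the thesis's algorithm. Every name is hypothetical: `neighbor_difference` is taken to be the ratio of a point's mean k-nearest-neighbor distance to the average of that same quantity over its neighbors, the outlier degree is the population standard deviation of that ratio for k from `k_min` to `k_max`, and a point is pruned as an inlier when at least `prune_ratio * k_max` of its `k_max` nearest neighbors are mutual neighbors.

```python
import math
from statistics import mean, pstdev

def neighbor_order(points, k_max):
    """For every point, indices of its k_max nearest neighbours (self excluded)."""
    order = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        order.append(others[:k_max])
    return order

def mutual_count(order, i, k):
    """|MN_k(i)|: neighbours q of i that also have i among their k nearest."""
    return sum(1 for q in order[i][:k] if i in order[q][:k])

def mean_knn_dist(points, order, i, k):
    return mean(math.dist(points[i], points[j]) for j in order[i][:k])

def neighbor_difference(points, order, i, k):
    # HYPOTHETICAL definition: how much sparser i's k-neighbourhood is than
    # the k-neighbourhoods of its own neighbours (ratio of mean distances).
    own = mean_knn_dist(points, order, i, k)
    around = mean(mean_knn_dist(points, order, j, k) for j in order[i][:k])
    return own / (around + 1e-12)

def fluctuation_scores(points, k_min=3, k_max=6, prune_ratio=0.5):
    order = neighbor_order(points, k_max)
    scores = []
    for i in range(len(points)):
        # HYPOTHETICAL pruning rule: many mutual neighbours -> interior point,
        # so skip the per-k computation and give it score 0.
        if mutual_count(order, i, k_max) >= prune_ratio * k_max:
            scores.append(0.0)
            continue
        nd = [neighbor_difference(points, order, i, k)
              for k in range(k_min, k_max + 1)]
        # Outlier degree = fluctuation (population std) of ND as k varies.
        scores.append(pstdev(nd))
    return scores

# Usage: a 10 x 10 grid plus one far-away point (index 100).
grid = [(x / 10, y / 10) for x in range(10) for y in range(10)]
data = grid + [(10.0, 10.0)]
scores = fluctuation_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # prints 100
```

The pruning step is where this sketch saves work on interior points: every interior grid point has at least four mutual neighbors among its six nearest and is skipped, while the isolated point has none and receives by far the largest fluctuation.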
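The second method can be sketched in the same way, again only under stated assumptions; none of the following definitions come from the thesis. The hypothetical `adaptive_k` grows k until the number of points without any mutual neighbor stops decreasing. The local similarity between k-NN-linked points is a Gaussian with local scaling (sigma_i is the distance from point i to its k-th neighbor), and row-normalizing it gives the transition probability matrix. Each point in turn receives a one-hot label that is propagated for a fixed number of steps as a random walk with restart, and the label mass left on that point is used as its convergence value. The hypothetical outlier factor multiplies the gap between the neighbors' mean convergence value and the point's own by (k + 1) / (|MN_k(i)| + 1).

```python
import math

def neighbor_order(points, k_max):
    """For every point, indices of its k_max nearest neighbours (self excluded)."""
    order = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        order.append(others[:k_max])
    return order

def mutual_counts(order, k):
    """|MN_k(i)| for every i: neighbours that also list i among their k nearest."""
    knn_sets = [set(row[:k]) for row in order]
    return [sum(1 for q in row[:k] if i in knn_sets[q]) for i, row in enumerate(order)]

def adaptive_k(order, k_min=4, k_max=15):
    # HYPOTHETICAL rule: grow k until the number of points that have no mutual
    # neighbour stops shrinking (it never grows, since k-NN lists only extend).
    for k in range(k_min, k_max):
        if mutual_counts(order, k + 1).count(0) == mutual_counts(order, k).count(0):
            return k
    return k_max

def transition_matrix(points, order, k):
    """Row-stochastic transition probabilities over a symmetrised k-NN graph,
    with local-scaling Gaussian similarities (sigma_i = distance to k-th NN)."""
    sigma = [math.dist(points[i], points[row[k - 1]]) for i, row in enumerate(order)]
    sim = [{} for _ in points]
    for i, row in enumerate(order):
        for j in row[:k]:
            d2 = math.dist(points[i], points[j]) ** 2
            sim[i][j] = sim[j][i] = math.exp(-d2 / (sigma[i] * sigma[j] + 1e-12))
    P = []
    for row in sim:
        total = sum(row.values()) or 1.0  # guard against full underflow
        P.append({j: w / total for j, w in row.items()})
    return P

def self_retention(P, seed, alpha=0.8, steps=10):
    """Give `seed` a one-hot label, propagate it for a fixed number of steps
    (f <- alpha * f P + (1 - alpha) * e_seed) and return the mass left on seed."""
    f = {seed: 1.0}
    for _ in range(steps):
        nxt = {seed: 1.0 - alpha}
        for i, mass in f.items():
            for j, p in P[i].items():
                nxt[j] = nxt.get(j, 0.0) + alpha * mass * p
        f = nxt
    return f[seed]

def propagation_outlier_factor(points, k_min=4, k_max=15, alpha=0.8, steps=10):
    order = neighbor_order(points, k_max)
    k = adaptive_k(order, k_min, k_max)
    P = transition_matrix(points, order, k)
    retained = [self_retention(P, i, alpha, steps) for i in range(len(points))]
    mutual = mutual_counts(order, k)
    scores = []
    for i, row in enumerate(order):
        around = sum(retained[j] for j in row[:k]) / k
        # HYPOTHETICAL factor: convergence-value gap to the neighbours,
        # amplified when the point has few mutual neighbours.
        scores.append((around - retained[i]) * (k + 1) / (mutual[i] + 1))
    return k, scores

# Usage: a 10 x 10 grid plus one far-away point (index 100).
grid = [(x / 10, y / 10) for x in range(10) for y in range(10)]
data = grid + [(10.0, 10.0)]
k, scores = propagation_outlier_factor(data)
print(k, max(range(len(data)), key=scores.__getitem__))  # prints 4 100
```

In this sketch only the restart term (1 - alpha) stays on a far-away point: the label that flows into the cluster almost never returns, because cluster points assign it near-zero transition probability. Its convergence value therefore sits below its neighbors', and its lack of mutual neighbors amplifies the gap.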
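The abstract names the accuracy rate, recall rate and AUC as evaluation indicators but does not say how they are computed. The Python sketch below assumes that the n highest-scoring points are flagged as outliers and reads "accuracy rate" as the precision of that list; both readings are assumptions. AUC is computed in its rank form, as the fraction of (outlier, inlier) pairs in which the outlier scores higher, with ties counted as one half. The function names are hypothetical.

```python
def top_n_precision_recall(scores, labels, n):
    """Flag the n highest-scoring points; return (precision, recall) of that list.
    labels[i] is 1 for a true outlier and 0 otherwise."""
    flagged = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    hits = sum(labels[i] for i in flagged)
    return hits / n, hits / sum(labels)

def auc(scores, labels):
    """ROC AUC as the probability that a random outlier outscores a random
    inlier, counting ties as one half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]  # 1 = true outlier
print(top_n_precision_recall(scores, labels, n=2))  # prints (0.5, 0.5)
print(auc(scores, labels))  # prints 0.8333333333333334
```

Unlike the top-n precision and recall, AUC needs no threshold, which makes it convenient for comparing detectors whose scores live on different scales.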

Keywords/Search Tags:

data mining, outliers, mutual neighbors, neighbor difference fluctuation factor, label propagation

Related items

1	Outlier Detection Algorithm Based On Deviation Fluctuation Difference And Mutual Neighbor Weighting Factor
2	The Outlier Detection Algorithm Based On Cluster Outlier Factor And Unique Closest Neighbor Set
3	On Multi-Label Classification Algorithms Based On Label-Specific Features And Mutual Neighbor
4	Researches On Abnormal Data Detection Algorithms With Adaptive K-Nearest Neighbor
5	Research On Instance-based Nearest Neighbor Propagation Of Partial Label Learning Algorithm And Application To Mental Disorders Data
6	Study And Improvement Of Local Outliers Mining Based On Density
7	Strategic targeting of outliers for expert review
8	Study Of Mining Outliers Based On Interestingness
9	Research On Trajectory Interest Point Mining Based On Label Propagation And Privacy Protection
10	Research On Novel Partial Label Learning Algorithms