
Research On Density-based Outlier Detection In Multi-dimensional Datasets

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Cao
GTID: 2428330602489077
Subject: Software engineering
Abstract/Summary:
Outlier detection is an active research topic in data mining. It plays an important role in many application scenarios, such as medical diagnosis, road monitoring, credit card fraud detection, network intrusion detection, and environmental monitoring. Existing outlier detection methods are mainly designed for low-dimensional data. As the number of dimensions grows, traditional methods can no longer detect outliers effectively, and their efficiency drops to the point where they cannot meet users' growing demands.

To detect outliers in multi-dimensional data, this thesis compares and summarizes traditional outlier detection algorithms, adopts a density-based definition of outliers, and proposes the DODMD algorithm for outlier detection in multi-dimensional data. To address data sparsity in multi-dimensional space, space-filling curves are used to map data from the multi-dimensional space to a low-dimensional space, and a ZH-tree index structure is built on the mapped data to manage the multi-dimensional data effectively. The ZH-tree has two advantages: 1) its clustering property helps in searching for the neighbors of a data object; 2) its hierarchical structure supports spatial pruning, which filters out data that cannot be neighbors. On top of the original ZH-tree, the concept of a micro-cluster is introduced: each leaf node is treated as a micro-cluster, and computation is performed per micro-cluster so that points can be filtered in batches.

ZH-tree-based outlier detection in multi-dimensional data consists of two stages: 1) once the ZH-tree has been built, a greedy method selects points with high outlierness, the true outlierness of each of these points is computed, and the smallest of these values is recorded as LOFmin; 2) micro-clusters that cannot contain outliers are filtered out using LOFmin. For a micro-cluster that cannot be filtered out, the true outlierness of its points is computed and the result set is updated, which tightens the bound.

On this basis, a prototype system for density-based outlier detection in multi-dimensional datasets is designed and implemented, and the accuracy and efficiency of the DODMD algorithm are verified on both real and synthetic datasets.
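
The abstract does not spell out how the space-filling curve mapping is computed. As a rough illustration only, the sketch below shows one common way to map a multi-dimensional point to a one-dimensional Z-order (Morton) key by interleaving coordinate bits. It assumes each coordinate is first quantized to a non-negative integer of fixed bit width; the function names `quantize` and `z_order_key` are hypothetical and are not taken from the thesis, and the ZH-tree itself is not reproduced here.

```python
from typing import Sequence


def quantize(x: float, lo: float, hi: float, bits: int) -> int:
    """Map x in [lo, hi] to an integer grid cell in [0, 2**bits - 1].

    Assumption for this sketch: each dimension has known bounds lo < hi.
    """
    if hi <= lo or not lo <= x <= hi:
        raise ValueError("need lo < hi and lo <= x <= hi")
    cell = int((x - lo) / (hi - lo) * (1 << bits))
    return min((1 << bits) - 1, cell)  # x == hi falls into the last cell


def z_order_key(coords: Sequence[int], bits: int) -> int:
    """Interleave the low `bits` bits of every coordinate into one key.

    Bit `b` of dimension `d` is placed at position b * dims + d, so the
    key cycles through the dimensions from least to most significant bit.
    """
    dims = len(coords)
    key = 0
    for d, c in enumerate(coords):
        if not 0 <= c < (1 << bits):
            raise ValueError("coordinate does not fit in the given bit width")
        for b in range(bits):
            key |= ((c >> b) & 1) << (b * dims + d)
    return key


# Sorting grid points by key yields a one-dimensional order in which
# points that share high-order bits in every dimension stay together.
points = [(3, 3), (0, 1), (1, 0), (2, 2), (0, 0)]
ordered = sorted(points, key=lambda p: z_order_key(p, bits=2))
```

Sorting by such a key is what lets a one-dimensional index group spatially close points, although points that are adjacent in space can still end up far apart along the curve, so a neighbor search over the keys generally needs a verification step.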
Keywords/Search Tags: outlier, multi-dimensional data, density-based, z-order curve, microclusters