Research On Density-based Outlier Detection In Multi-dimensional Datasets

Posted on:2021-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Cao

Full Text:PDF

GTID:2428330602489077

Subject:Software engineering

Abstract/Summary:

Outlier detection is an active research topic in data mining. It plays an important role in many application scenarios, such as medical diagnosis, road monitoring, credit card fraud detection, network intrusion detection, and environmental monitoring. Existing outlier detection methods mainly target low-dimensional data. As the number of dimensions grows, traditional methods can no longer detect outliers effectively, and their efficiency drops to the point where they cannot meet users' increasing demands. To detect outliers in multi-dimensional data, this thesis compares and summarizes traditional outlier detection algorithms, adopts a density-based definition of outliers, and proposes the DODMD algorithm for outlier detection in multi-dimensional data. To address data sparsity in multi-dimensional space, space-filling curves are used to map data from the multi-dimensional space to a low-dimensional space, and a ZH-tree index structure is built on the mapped data to manage multi-dimensional data effectively. The ZH-tree has two advantages: 1) its clustering property helps search for the neighbors of a data object; 2) its hierarchical structure supports spatial pruning, which filters out data that cannot be neighbors. On top of the original ZH-tree, the concept of a micro-cluster is introduced: each leaf node is treated as a micro-cluster, and computation is performed per micro-cluster so that data can be filtered in batches. Outlier detection on the ZH-tree proceeds in two stages: 1) after the ZH-tree is built, a greedy method selects points with larger outlierness, the true outlierness of each of these points is computed, and the smallest of these values is recorded as LOFmin; 2) micro-clusters that cannot contain outliers are filtered out using LOFmin, and for any micro-cluster that cannot be filtered, the true outlierness of its points is computed and the result set is updated to tighten the bound. On this basis, a prototype system for density-based outlier detection in multi-dimensional datasets is designed and implemented, and the accuracy and efficiency of the DODMD algorithm are verified on both real and synthetic datasets.
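
The space-filling-curve mapping and the grouping of nearby points into micro-clusters can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a Z-order (Morton) curve, as suggested by the keywords, and replaces the ZH-tree with a plain sorted list cut into fixed-size chunks. The function names (`quantize`, `morton_key`, `micro_clusters`), the grid resolution `bits_per_dim`, and the leaf `capacity` are hypothetical choices made only for illustration.

```python
def quantize(point, lows, highs, bits_per_dim):
    """Map real-valued coordinates to integer grid cells in [0, 2**bits_per_dim - 1]."""
    max_cell = (1 << bits_per_dim) - 1
    cells = []
    for value, low, high in zip(point, lows, highs):
        span = high - low
        fraction = 0.0 if span == 0 else (value - low) / span
        # Clamp so that values on or outside the bounds still land in a valid cell.
        cells.append(min(max_cell, max(0, int(fraction * max_cell))))
    return tuple(cells)


def morton_key(cells, bits_per_dim):
    """Interleave the bits of the integer coordinates into one Z-order key."""
    ndim = len(cells)
    key = 0
    for bit in range(bits_per_dim):
        for dim, cell in enumerate(cells):
            # Bit `bit` of dimension `dim` goes to position bit * ndim + dim.
            key |= ((cell >> bit) & 1) << (bit * ndim + dim)
    return key


def micro_clusters(points, lows, highs, bits_per_dim=8, capacity=4):
    """Sort points along the Z-order curve and cut the order into fixed-size groups.

    Each group stands in for one leaf node (micro-cluster); the real ZH-tree
    structure described in the abstract is not reproduced here.
    """
    ordered = sorted(
        points,
        key=lambda p: morton_key(quantize(p, lows, highs, bits_per_dim), bits_per_dim),
    )
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.9, 0.9), (0.1, 0.1), (1.0, 1.0), (0.0, 1.0)]
    for cluster in micro_clusters(sample, (0.0, 0.0), (1.0, 1.0), bits_per_dim=4, capacity=2):
        print(cluster)
```

Because points that are close on the curve are usually close in the original space, neighbor searches can start within a point's own group and its adjacent groups. That locality is only approximate, which is one reason an index with hierarchical pruning is layered on top of the one-dimensional order.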

Keywords/Search Tags:

outlier, multi-dimensional data, density-based, z-order curve, microclusters

Related items

1	Study On Outlier Detection Algorithm And Optimization Of Multi-dimensional And Multi-source Data
2	Research And Improvement Of Local Outlier Detecting Algorithm Based On Density
3	Research On Technology For Detecting Density-based Outlier
4	Research On Algorithm Of High Dimensional Outlier Detection
5	Research On Outlier Detection In Data Stream Based On Density
6	Data Density Based Clustering And Outlier Detection
7	Research On Unsupervised Outlier Detection Approach For Multi-dimensional Sequence Over Data Stream
8	Improvement Of Density-Based Local Outlier Detection Algorithm
9	Research And Application Outlier Detection Method Based On Density&Distance
10	Research On Algorithms For Outlier Detection