Research And Application Of Outlier Detection Method Based On Density And Distance

Posted on: 2020-06-14 | Degree: Master | Type: Thesis
Country: China | Candidate: H J Liu
GTID: 2428330596979687 | Subject: Computer technology
Abstract/Summary:
Outlier detection is a key research direction in data mining. Its main task is to find objects generated by a mechanism different from the one that generated most other objects. Outlier detection algorithms have been studied extensively, but methods that rely on a single outlier factor struggle to improve detection accuracy. Using an integrated outlier factor to detect outliers has therefore become an important research direction in outlier detection and analysis. This thesis introduces the concept of outlier detection in detail, including the definition of outliers, the causes of outliers, and the classification of outliers. It also introduces the classification of outlier detection algorithms. After analyzing the advantages and disadvantages of existing outlier detection methods, we propose two new outlier detection algorithms.

(1) A new outlier detection algorithm based on a density and distance double-parameter outlier factor, named DDPOS. A purely density-based or purely distance-based outlier detection algorithm can hardly improve detection accuracy, and it does not eliminate interference from boundary points well. We therefore propose an algorithm based on an integrated outlier factor composed of two parameters, density and distance. The algorithm first measures how tightly packed the objects are by calculating each object's local density, and then uses the global distance between objects to eliminate interference from boundary points. DDPOS performs outlier detection within a neighbor-based algorithm framework. Theoretical analysis and experimental results show that DDPOS can effectively detect outliers in spatially distributed data.

(2) A new outlier detection algorithm based on candidate set partitioning, named CPOFS. Our analysis of the DBSCAN algorithm shows that its preprocessing step can quickly divide a dataset into two sets: a core point set and a candidate set. Experimental results show that, for most datasets, the partitioned candidate set contains all of the outliers. Outliers can then be detected within the candidate set alone, which improves the algorithm's efficiency. The algorithm uses the I-DBSCAN algorithm to obtain the candidate set and then applies a new outlier factor to perform a secondary detection on that set. CPOFS thus integrates DBSCAN with a neighbor-based algorithm framework. Theoretical analysis and experimental results show that CPOFS achieves good detection performance.
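The abstract does not give the DDPOS or CPOFS formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the thesis's method. It assumes (a) "local density" is the inverse mean distance to the k nearest neighbors, (b) "global distance" is the distance to the nearest strictly denser point, a quantity borrowed from density peaks clustering, and (c) the two-parameter factor is their ratio. The candidate split uses the standard DBSCAN core-point test rather than I-DBSCAN, whose modifications are not described here. All function names and parameters (`k`, `eps`, `min_pts`, `n_outliers`) are hypothetical.

```python
import math

# Illustrative sketch only: "local density", "global distance", and the factor
# below are ASSUMED definitions, not the DDPOS/CPOFS formulas from the thesis.

def pairwise(points):
    """Euclidean distance matrix; O(n^2), fine for small examples."""
    return [[math.dist(p, q) for q in points] for p in points]

def local_density(d, k):
    """Assumed local density: inverse mean distance to the k nearest neighbors."""
    rho = []
    for i, row in enumerate(d):
        nearest = sorted(x for j, x in enumerate(row) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return rho

def density_distance_factor(d, rho, i):
    """Assumed two-parameter factor: delta / rho.

    delta is the distance to the nearest strictly denser point; a point with no
    denser point falls back to its nearest-neighbor distance. Boundary points
    are sparse but close to a denser point, so their delta stays small; true
    outliers are sparse AND far away, so both terms raise the factor.
    """
    denser = [x for j, x in enumerate(d[i]) if rho[j] > rho[i]]
    if denser:
        delta = min(denser)
    else:
        delta = min(x for j, x in enumerate(d[i]) if j != i)
    return delta / rho[i]

def density_distance_scores(points, k=3):
    """DDPOS-like scoring (assumed): one factor per point."""
    d = pairwise(points)
    rho = local_density(d, k)
    return [density_distance_factor(d, rho, i) for i in range(len(points))]

def core_and_candidates(d, eps, min_pts):
    """Standard DBSCAN core-point test: a point with >= min_pts points (itself
    included) within eps is a core point; the rest form the candidate set."""
    core, candidates = [], []
    for i, row in enumerate(d):
        if sum(1 for x in row if x <= eps) >= min_pts:
            core.append(i)
        else:
            candidates.append(i)
    return core, candidates

def detect(points, n_outliers, eps, min_pts, k=3):
    """CPOFS-like flow (assumed): partition first, then score only candidates."""
    d = pairwise(points)
    _, candidates = core_and_candidates(d, eps, min_pts)
    rho = local_density(d, k)  # density still uses the whole dataset
    scores = {i: density_distance_factor(d, rho, i) for i in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:n_outliers]

# A tight square cluster, one boundary point at (2.2, 0.5), one far outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (2.2, 0.5)]
print(detect(pts, n_outliers=1, eps=1.5, min_pts=4, k=2))  # -> [5]
```

In this toy data both the boundary point (index 6) and the far point (index 5) land in the candidate set, but the boundary point sits only 1.3 away from a denser cluster point, so its factor stays near 1.7 while the far point's factor is roughly 150. The sketch still builds a full O(n^2) distance matrix, so the saving from partitioning applies only to the secondary scoring step; a practical version would use a spatial index for the neighbor queries.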
Keywords/Search Tags: Data mining, Outlier detection, Double-parameter outlier factor, Candidate set