Study Of Parameterless Outlier Detection And Complex-manifold Clustering Algorithm

Posted on: 2018-11-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Huang
GTID: 1318330536469511
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of searching large data sets for understandable patterns, rules, and laws. Outlier detection and cluster analysis are two of its most important research areas. Although the concepts and techniques of both are mature, they face new challenges as the volume, dimensionality, and variety of data generated by everyday life, networks, and other sources keep growing. As a result, some unresolved and complex problems in outlier detection and cluster analysis are beginning to surface. This research aims to further develop the concepts and techniques of outlier detection and cluster analysis, and to solve some of their existing problems, by studying the underlying theory and algorithms.

This dissertation introduces the concept of the natural neighbor into outlier detection and proposes a natural-neighbor-based outlier detection algorithm that does not require the parameter k. Outlier detection is an important data mining technique for eliminating potential threats and discovering new mechanisms. In practice, it has been applied in many fields, such as fraud detection. This dissertation therefore analyzes the current state of outlier detection and addresses the problem of selecting the parameter k. Distance-based and density-based outlier detection algorithms are commonly used. However, like most other outlier detection algorithms, they require the value of k to be set manually. Distance-based algorithms need k to compute the k-distance of each point in the data set, and density-based algorithms need k to compute the density of each point. An inappropriate value of k may lead to poor detection results. Many experiments demonstrate the effectiveness of the proposed algorithm. The natural value obtained by the natural neighbor search algorithm is applicable not only to the proposed algorithm but also to other outlier detection algorithms such as LOF and INS.

To detect outlier clusters in a data set more effectively, this dissertation proposes an outlier cluster detection algorithm that does not require the parameter Top-n. Outlier detection includes outlier point detection and outlier cluster detection. The result of outlier cluster detection contains the cluster structure and other related information, which makes it convenient to study the detection result further. Compared with outlier point detection, outlier cluster detection is therefore more practical and has better development prospects. Most existing outlier cluster detection algorithms are based on clustering algorithms. However, no clustering algorithm so far is dedicated to outlier cluster detection. Moreover, the existing outlier cluster detection algorithms face problems such as requiring too many parameters and having parameters that are hard to set. First, based on the mutual neighborhood graph, we propose a rough clustering algorithm dedicated to outlier detection and use it to roughly cluster the data set. Then, based on the rough clustering result, we compute the relative outlier cluster factor of each cluster and construct the outlier cluster decision graph. Finally, we identify the outlier clusters from the outlier cluster decision graph. Experiments on many artificial and real data sets demonstrate the effectiveness of the proposed outlier cluster detection algorithm. We also confirm that the outlier rate obtained by the proposed algorithm is very close to the real outlier rate of the data set.

This dissertation also proposes a novel clustering algorithm, based on quasi-cluster centers, that is applicable to complex manifold data. Cluster analysis is one of the primary methods of data mining; it can reveal how data are distributed and support further analysis of the data. This dissertation analyzes the current state of cluster analysis and points out some problems that existing clustering algorithms face, such as sensitivity to parameters and difficulty in clustering complex manifold data. In addition, the density measures used by existing density-based and center-based clustering algorithms share a problem: when density varies greatly between clusters, a sparse cluster may be regarded as a set of outliers. First, we introduce the density measure of outlier detection into cluster analysis and introduce the new concept of a quasi-cluster center: a point whose density is the maximum among its k nearest neighbors or its reverse k nearest neighbors. Second, we obtain the initial clusters by spreading outward from the quasi-cluster centers into sparse areas. Finally, we define the similarity between clusters and obtain the final clustering result by merging clusters whose similarity is high. Experiments on many artificial and real data sets show that the proposed algorithm is effective. Moreover, the proposed algorithm is robust with respect to the parameter it requires. In theory, the proposed clustering algorithm is applicable to any manifold data set.
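
The quasi-cluster center definition given in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. The abstract does not specify the density measure, so the sketch assumes density is the inverse of the mean distance to the k nearest neighbors. It also assumes that a point with no reverse neighbors cannot qualify through the reverse-neighbor condition. The function names `knn_indices` and `quasi_cluster_centers` are hypothetical.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of, and distances to, the k nearest neighbors of every point."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)[:, :k]
    return order, np.take_along_axis(dist, order, axis=1)


def quasi_cluster_centers(points, k):
    """Indices of points whose density is maximal among their kNN or reverse kNN."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    knn, knn_dist = knn_indices(points, k)

    # ASSUMPTION: density = 1 / (mean distance to the k nearest neighbors).
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    # Reverse k nearest neighbors: i is in rknn[j] when j is one of i's kNN.
    rknn = [[] for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            rknn[j].append(i)

    centers = []
    for i in range(n):
        peak_in_knn = all(density[i] >= density[j] for j in knn[i])
        # ASSUMPTION: an empty reverse neighborhood does not make a point a center.
        peak_in_rknn = bool(rknn[i]) and all(density[i] >= density[j] for j in rknn[i])
        if peak_in_knn or peak_in_rknn:
            centers.append(i)
    return centers


if __name__ == "__main__":
    # A dense group (indices 0-4) and a sparser group (indices 5-9) on a line.
    demo = [[0.0, 0.0], [1.1, 0.0], [1.5, 0.0], [2.0, 0.0], [3.2, 0.0],
            [10.0, 0.0], [12.2, 0.0], [13.0, 0.0], [14.0, 0.0], [16.3, 0.0]]
    print(quasi_cluster_centers(demo, k=2))  # [2, 7]
```

In this toy data the sparse group still yields a center (index 7), even though its density is lower than that of non-center points in the dense group. This is the behavior the abstract describes: a sparse cluster is not mistaken for outliers because the density comparison is local.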
Keywords/Search Tags: natural neighbor, outlier, outlier cluster, clustering analysis, complex manifold