
Research On Outlier Detection Based On Possibilistic Fuzzy Clustering

Posted on: 2018-02-27 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wan | Full Text: PDF
GTID: 2348330569980242 | Subject: Computer Science and Technology
Abstract/Summary:
Outliers are a small portion of data that differ significantly from the mainstream data. Outliers often carry information about emerging trends and are therefore not simply equivalent to noise. Many outlier detection algorithms exist, and those based on clustering are among the most intuitive. In clustering-based algorithms, data objects that deviate significantly from every cluster are defined as outliers, and clusters that contain far fewer data objects than the other clusters are defined as outlier clusters. Because clustering algorithms are mature, they give clustering-based outlier detection a solid theoretical basis.

This thesis first reviews the advantages and disadvantages of various outlier detection algorithms. Clustering-based outlier detection is comparatively intuitive and efficient, and research on it centers on the underlying clustering algorithm. We then introduce the fuzzy clustering algorithm and its improved variants in terms of clustering idea, clustering objective function, and algorithm flow. We focus on the characteristics of fuzzy co-clustering, possibilistic clustering, and fuzzy three-dimensional clustering, and propose improvements for their existing problems. The main work is as follows.

Firstly, this thesis proposes a hybrid clustering algorithm to address the noise sensitivity of fuzzy co-clustering and the consistency issue of possibilistic clustering. Starting from fuzzy co-clustering, the typicality (possibilistic membership degree) of each sample is added to the objective function, so that the algorithm becomes possibilistic fuzzy co-clustering. At the same time, mutual information loss is used to evaluate the similarity between samples. The algorithm can effectively identify outliers, reduces the effect of outliers on clustering accuracy, and has low sensitivity to its parameters.

Secondly, fuzzy co-clustering is a two-dimensional clustering algorithm. To detect outliers effectively in a three-dimensional matrix, this thesis proposes a fuzzy three-dimensional clustering algorithm based on the information bottleneck. The algorithm extends the idea of fuzzy co-clustering from two-dimensional matrices to three dimensions and uses the information bottleneck distance as its distance measure. It can group data along all three dimensions simultaneously, and it emphasizes the importance of the similarity measure in the clustering process. Compared with existing co-clustering algorithms, it achieves higher clustering accuracy and provides a theoretical basis for outlier detection in three-dimensional data.
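
The abstract does not give the thesis's objective function, so the following is only a minimal sketch of the two ingredients it names, not the author's algorithm. It assumes the standard possibilistic c-means typicality formula and the standard information bottleneck merge cost (a weighted Jensen-Shannon divergence) as the "mutual information loss". The function names, the fixed cluster centers, the scale parameter `eta`, and the `threshold` value are all hypothetical choices for illustration.

```python
import math


def typicality(sq_dist, eta, m=2.0):
    """PCM-style typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1))).

    Unlike fuzzy memberships, typicalities are not forced to sum to 1
    across clusters, so a point far from every center scores low everywhere.
    """
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


def flag_outliers(points, centers, eta, threshold=0.1, m=2.0):
    """Return indices of points whose best typicality is below threshold.

    Assumption: centers and eta are given; a full algorithm would
    estimate them iteratively.
    """
    outliers = []
    for i, x in enumerate(points):
        best = max(
            typicality(sum((a - b) ** 2 for a, b in zip(x, c)), eta, m)
            for c in centers
        )
        if best < threshold:
            outliers.append(i)
    return outliers


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats, skipping zero terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ib_merge_cost(w1, p1, w2, p2):
    """Mutual information lost by merging two items.

    w1, w2 are the items' (positive) prior weights; p1, p2 are their
    conditional distributions over the other variable.
    """
    total = w1 + w2
    mix = [(w1 * a + w2 * b) / total for a, b in zip(p1, p2)]
    return w1 * kl(p1, mix) + w2 * kl(p2, mix)


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
outlier_indices = flag_outliers(points, centers, eta=1.0)
```

In this toy data the point at (20, 20) is far from both centers, so its typicality is small for every cluster and it is the only index flagged. The merge cost is zero for two identical distributions and grows as they diverge, which is why it can act as a dissimilarity between samples.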
Keywords/Search Tags: outlier detection, fuzzy co-clustering, possibilistic clustering, information bottleneck, fuzzy three-dimensional clustering