
Research on Algorithms for Subspace Clustering and Outlier Mining Based on Information Entropy

Posted on: 2015-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Fang
Full Text: PDF
GTID: 2298330452454695
Subject: Software engineering
Abstract/Summary:
Outlier analysis is one of the important tasks of data mining. It aims to uncover latent, abnormal data patterns or behaviors, and it is of considerable significance in big data applications. Subspace clustering and subspace outlier detection are active research topics in the analysis of big data and high-dimensional data. This thesis reviews the state of research on both. Existing subspace clustering methods suffer from poor efficiency and scalability, and existing subspace outlier detection methods perform poorly; these problems are the subject of this thesis.

First, the advantages and disadvantages of several typical subspace clustering and subspace outlier detection algorithms are analyzed in depth. The CMI method selects the best clustering subspace in an unstable and complicated way. A subspace clustering algorithm based on Cumulative Holoentropy is therefore proposed to address this weakness of CMI. The algorithm uses the Cumulative Holoentropy as the metric for selecting the best clustering subspace.

Second, several subspace outlier detection algorithms are analyzed comprehensively. The CMI method relies on the LOF algorithm in its outlier detection phase and performs poorly there. To improve on this, a subspace outlier detection algorithm based on the information-entropy increment is proposed. The new method measures how outlying an object is in a subspace by the information-entropy increment of the data set when that object is removed.

Finally, the validity and scalability of the Cumulative Holoentropy subspace clustering algorithm and the information-entropy increment subspace outlier detection algorithm are tested on both real and synthetic data sets through comparison with the CMI method.
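
The entropy-increment idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulation, so everything here is an assumption made for illustration: attributes are taken to be discrete, the entropy of a subspace is taken to be the sum of per-attribute Shannon entropies, and the score is defined as the entropy of the full data set minus the entropy after removing the object. The function names `shannon_entropy`, `subspace_entropy`, and `entropy_increment_scores` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def subspace_entropy(rows, dims):
    """Assumed subspace measure: sum of per-attribute entropies over `dims`."""
    return sum(shannon_entropy([row[d] for row in rows]) for d in dims)


def entropy_increment_scores(rows, dims):
    """Score each object by how much the entropy drops when it is removed.

    Assumed sign convention: a larger positive score means the object
    contributes more disorder to the subspace, so it is more outlying.
    """
    base = subspace_entropy(rows, dims)
    scores = []
    for i in range(len(rows)):
        rest = rows[:i] + rows[i + 1:]  # data set without object i
        scores.append(base - subspace_entropy(rest, dims))
    return scores


# Toy data: four identical objects and one that differs in both attributes.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
scores = entropy_increment_scores(rows, dims=(0, 1))
most_outlying = max(range(len(rows)), key=lambda i: scores[i])
print(most_outlying, scores)
```

On this toy data the last object receives the highest score, because removing it leaves a data set with zero entropy in both attributes. This leave-one-out loop recomputes the entropy for every object and is meant only to show the principle, not an efficient or faithful implementation of the proposed algorithm.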
Keywords/Search Tags: data mining, big data analysis, subspace clustering, subspace outlier detection, Cumulative Holoentropy, information-entropy increment