Research On Subspace Clustering Based On Attribute Reduction

Posted on:2019-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:H Li

GTID:2348330566966107

Subject:Computer technology

Abstract/Summary:

With the rapid development of information technology, massive amounts of data are being produced in a wide range of applications, and extracting useful knowledge from such data has become an urgent need. Data mining has therefore become an important research field, and cluster analysis is one of its major research directions. Because cluster analysis is an unsupervised learning method, it is widely used in practice, for example in biological analysis and web log analysis. However, as the dimensionality of data keeps increasing, the "curse of dimensionality" becomes more and more pronounced, and clustering results on high-dimensional data are often unsatisfactory. This is caused by two major characteristics of high-dimensional data: (1) the data in high-dimensional data sets are usually sparse; (2) the distances between objects in high-dimensional data sets tend to be similar to one another, so traditional distance-based clustering algorithms lose their discriminative power. Both characteristics make clustering more difficult, and how to cluster high-dimensional data effectively has become one of the main research topics in recent years.

Subspace clustering has been proposed to address these problems. As an effective approach to clustering high-dimensional data, subspace clustering first projects the data from the high-dimensional space into a low-dimensional subspace by using a feature selection strategy, and then clusters the data in that subspace. However, existing subspace clustering methods still have many problems. For instance, the feature selection method used during dimensionality reduction may fail to preserve the classification ability of the original data, which can bias the clustering results in the subspace. In addition, most existing subspace clustering methods can handle only numerical high-dimensional data and cannot effectively handle categorical high-dimensional data.

To address the problems of existing subspace clustering methods, this thesis applies rough set theory to subspace clustering. First, we propose a rough set attribute reduction algorithm based on granularity decision entropy, called ARGDE, and use it to reduce the dimensionality of high-dimensional data. Second, we propose a K-modes clustering algorithm based on the weighted overlap distance, called WODKM, and use it to cluster data in the low-dimensional subspace, which makes it possible to handle categorical high-dimensional data effectively. Third, we combine ARGDE and WODKM into a subspace ensemble clustering algorithm for categorical high-dimensional data, called SPECCH. Finally, experiments on multiple UCI data sets show that the proposed subspace clustering algorithm can solve the clustering problem of categorical high-dimensional data.

The research work of this thesis mainly covers the following three aspects:

(1) An attribute reduction algorithm based on granularity decision entropy is proposed. To solve the problems of existing attribute reduction algorithms based on information entropy, we propose a new information entropy model, granularity decision entropy, and design a new attribute reduction algorithm, ARGDE, based on it. In experiments on multiple UCI data sets, the proposed algorithm obtains smaller reducts and higher classification accuracy than the traditional algorithms.

(2) To deal with categorical data, a new distance metric, called the weighted overlap distance, is proposed, together with a K-modes clustering algorithm based on it, called WODKM. WODKM uses the concepts of attribute significance and rough entropy from rough set theory to calculate the significance of each attribute. When the weighted overlap distance between two objects is calculated, each attribute is given a weight according to its significance, so that the distance reflects the differing importance of the attributes.

(3) A subspace ensemble clustering algorithm for categorical high-dimensional data, called SPECCH, is proposed. SPECCH first uses ARGDE to construct multiple feature subspaces. It then uses WODKM to cluster the data in each constructed feature subspace, which generates multiple clustering results. Finally, it combines these clustering results via weighted voting. In experiments on several UCI data sets, the proposed algorithm obtains better experimental results than the traditional algorithms.
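The abstract does not give the formula for the weighted overlap distance, so the following Python sketch shows only one plausible reading: the distance between two categorical objects is the sum of the weights of the attributes on which they disagree. The function names, the example weights, and the example modes are all hypothetical. In the thesis the weights are derived from attribute significance and rough entropy, which this sketch does not compute.

```python
from typing import Hashable, Sequence


def weighted_overlap_distance(
    x: Sequence[Hashable], y: Sequence[Hashable], weights: Sequence[float]
) -> float:
    """Sum the weights of the attributes on which x and y disagree.

    Assumption: this is an illustrative definition, not the thesis's formula.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def nearest_mode(
    obj: Sequence[Hashable],
    modes: Sequence[Sequence[Hashable]],
    weights: Sequence[float],
) -> int:
    """Return the index of the cluster mode closest to obj (K-modes assignment step)."""
    distances = [weighted_overlap_distance(obj, m, weights) for m in modes]
    return distances.index(min(distances))


# Hypothetical attribute weights; larger means more significant.
weights = [0.5, 0.3, 0.2]
# Hypothetical cluster modes over three categorical attributes.
modes = [("red", "small", "round"), ("blue", "large", "square")]
obj = ("red", "large", "round")

# obj differs from the first mode only on the second attribute,
# and from the second mode on the first and third attributes.
assigned = nearest_mode(obj, modes, weights)
print(assigned)
```

With uniform weights this reduces to the plain overlap (simple matching) distance used by standard K-modes, so the weighting is what lets more significant attributes dominate the cluster assignment.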

Keywords/Search Tags:

rough set, granularity decision entropy, attribute reduction, subspace clustering, high-dimensional data, categorical data

Related items

1	Research And Application Of Data Reduction Algorithms Based On Rough Entropy
2	Research And Application Of High Efficient Attribute Reduction For High Dimensional Data Based On Rough Sets
3	The Research Of Clustering Based On Rough Set Theory
4	Study On Attribute Reduction Criteria And Information Loss Of Attribute Reduction Based On Rough Sets
5	Research On Attribute Reduction Algorithm Based On Decision Tree And Information Entropy
6	Research On Approaches Of Dynamic Attribute Reduction Based On Knowledge Granularity
7	ESCHCD: Entropy-based Algorithm For Subspace Clustering With High Dimensional Categorical Datasets
8	A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction
9	Study And Application Of Attribute Reduction Algorithms Based On Rough Sets
10	Rough Set Theory In The Decision Tree