
Study On Granularity Clustering

Posted on: 2014-05-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhu
Full Text: PDF
GTID: 1268330392465072
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is one of the important means of knowledge discovery in pattern recognition and artificial intelligence. Traditional clustering is a kind of hard partitioning. With the arrival of the big-data era, high-dimensional, incomplete, complex, vague, and massive data are being produced. The volume and high dimensionality of these data leave traditional data analysis methods inadequate. Granular computing is an important tool for processing uncertain information, and it is also a new method for simulating human thinking and solving complex problems in the field of computational intelligence. The rise of granular computing extends clustering into soft computing, which further increases its practical value and brings the theoretical significance of clustering closer to reality. By changing the granularity, clustering analysis can be performed at different levels and from different angles, so that "either the one or the other" clustering gains both a research foundation and a practical method. This may make up for the shortcomings of traditional clustering and help solve the problem.

This paper focuses on granularity clustering by combining granular computing with clustering analysis. The idea of granularity runs through both data preprocessing and clustering analysis. At the same time, clustering is a main method of attribute granulation and sample granulation. The paper describes different levels and angles of granulation through the objective function and the parameter values of clustering. It mainly covers the following aspects:

To reduce the time and space complexity of attribute reduction in the clustering process, we granulate attributes through clustering methods in parallel.
Attribute granulation based on attribute discernibility and AP clustering first calculates the similarity of attributes according to attribute discernibility, and then clusters the attributes into several groups with the affinity propagation (AP) clustering algorithm. Finally, representative attributes are produced through some algorithms to form a coarser attribute granularity. For large data sets, the method is more efficient than traditional attribute reduction algorithms. It has obvious advantages when the required precision of attribute granularity is less strict. A parallel attribute reduction algorithm based on affinity propagation clustering improves the efficiency of attribute reduction while maintaining the same classification ability. However, it is limited on large-scale data sets because a traditional attribute reduction algorithm is used within the parallel reduction.

The granular computing model can be applied to clustering methods in order to combine the two, but the clustering results cannot be translated freely. Because all clustering algorithms can be unified under the idea of granularity, this paper presents a new twice clustering method based on variable granularity and a clustering network (VGTC). VGTC combines two clustering algorithms through granular computing in order to perform better than either single method. The aim of the first clustering is not to complete the clustering task for the whole data set, but to find an appropriate clustering layer. On this basis, the secondary clustering completes the clustering operation for the domain. Variable granularity twice clustering based on the K-means algorithm and hierarchical clustering (an example of VGTC) can cluster non-spherical data sets correctly, and avoids some problems of K-means (such as sensitivity to the initial cluster centers) and of hierarchical clustering (such as low efficiency). Furthermore, the algorithm can improve the accuracy and efficiency of clustering.
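The K-means plus hierarchical variant described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names, the evenly spaced K-means initialization, the fixed number of granules (standing in for the search for an appropriate clustering layer), and the single-linkage merging of granule centroids are all choices made for this example.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's K-means. Returns (centroids, labels)."""
    n = len(points)
    # Assumption: deterministic, evenly spaced initial centers for reproducibility.
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def merge_granules(centroids, n_clusters):
    """Single-linkage agglomerative merging of granule centroids.

    Returns a dict mapping granule index -> final cluster id.
    """
    groups = [{i} for i in range(len(centroids))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(math.dist(centroids[i], centroids[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = groups.pop(b)  # b > a, so index a is unaffected
        groups[a] |= merged
    return {i: cid for cid, members in enumerate(groups) for i in members}


def two_stage_cluster(points, n_granules, n_clusters):
    """First clustering: fine K-means granules. Second: merge the granules."""
    centroids, granule_of = kmeans(points, n_granules)
    cluster_of_granule = merge_granules(centroids, n_clusters)
    return [cluster_of_granule[g] for g in granule_of]


# Two elongated, parallel clusters (hypothetical toy data).
pts = [(float(x), 0.0) for x in range(20)] + [(float(x), 8.0) for x in range(20)]
labels = two_stage_cluster(pts, n_granules=8, n_clusters=2)
```

On this toy data a direct 2-means partition has a lower squared-error objective when it cuts both lines in half than when it separates them, which is the kind of non-spherical case the second, hierarchical stage is meant to handle.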
Another variable-granularity twice clustering method, based on AP and hierarchical clustering, selects the AP algorithm as the first clustering method, so the granulation of the clustering is finer and the result is stable and highly accurate. The time needed to search for an appropriate granulation is shorter than with K-means.

The AP algorithm is not appropriate for subspace clustering. To solve this problem, two improved AP algorithms are put forward. An entropy weighting AP algorithm for subspace clustering based on asynchronous granulation of attributes and samples removes the redundant attributes first; a step that modifies the attribute weights is then added to the clustering procedure in order to obtain exact weight values. At the end of clustering, an accurate attribute granularity result is produced. The other method is an AP subspace clustering algorithm based on an attribute relation matrix. It is an asynchronous soft subspace clustering algorithm. This algorithm filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation of each pair of non-redundant attributes, the relation matrix is constructed from two-dimensional united Gini coefficients. The candidates for all interesting subspaces are obtained by looking for the maximum sub-matrices that contain only 1s. Finally, all subspace clusters are obtained by AP clustering on the interesting subspaces. The method obtains interesting subspaces correctly and reduces time and space complexity at the same time. It keeps the advantages of AP clustering and overcomes its shortcomings.

Research on the granularity of parallel programs is also carried out in this paper. Under the guidance of fine-grain parallelism, an AP clustering algorithm based on improved attribute reduction and fine-grain parallelism is proposed.
First, the idea of granularity is introduced into parallel computing and the granularity principle is applied. Second, the data set is preprocessed by an improved attribute reduction algorithm in which the elements of the discernibility matrix are calculated and selected in parallel, in order to reduce time and space complexity. Finally, the data set is clustered by means of a parallel AP algorithm. The whole task can be divided into multiple threads that are processed simultaneously.
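The parallel computation of discernibility matrix elements mentioned above might look like the sketch below. It uses the standard rough-set definition (an entry lists the condition attributes on which two objects with different decisions differ) and splits the work by row across a thread pool. The function names, the toy decision table, and the use of core attributes (singleton entries) as the "selection" step are assumptions for illustration; the abstract does not state the dissertation's actual selection rule. In CPython, threads show the task decomposition rather than guarantee a speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def discern_row(i, rows, decisions):
    """Entries (i, j) with j > i: the attributes on which objects i and j
    differ, recorded only when their decision values differ."""
    out = {}
    for j in range(i + 1, len(rows)):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a, (u, v) in enumerate(zip(rows[i], rows[j]))
                             if u != v)
            if diff:
                out[(i, j)] = diff
    return out


def discernibility_matrix(rows, decisions, workers=4):
    """Compute the non-empty upper-triangle entries, one task per row."""
    matrix = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = pool.map(lambda i: discern_row(i, rows, decisions),
                         range(len(rows)))
        for part in tasks:
            matrix.update(part)
    return matrix


def core_attributes(matrix):
    """Attributes that appear as singleton entries (the rough-set core)."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}


# Hypothetical decision table: three condition attributes, one decision.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0)]
decisions = [0, 1, 0, 1]
matrix = discernibility_matrix(rows, decisions)
core = core_attributes(matrix)
```

Splitting by row keeps each task independent, so no locking is needed until the partial results are merged in the calling thread.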
Keywords/Search Tags: Clustering Analysis, Granular Computing, Attribute reduction, Subspace clustering, Parallel Computing