
A Study Of Fast Clustering Methods Based On Co-evolution And Spectral Clustering For Large Data

Posted on: 2015-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2308330464466893
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of the Internet and the widespread use of sensors, data has become much easier to obtain. The amount of data being acquired is growing geometrically, which has brought about a new era: the era of big data. Data mining techniques are therefore more important than ever. Clustering is a basic tool of data mining, but existing clustering methods suffer from drawbacks such as low accuracy and low efficiency. To address these problems, this thesis proposes three fast clustering methods:

1. An Immune Clustering algorithm based on Co-Evolution (ICCE) is proposed. First, clonal selection is used to drive competition within each population: the individuals with high fitness values are selected to reconstruct the population, which completes the internal evolution of each population. Second, a co-evolution operation exchanges information among the populations, in keeping with the process of biological evolution. Finally, under an elitist preservation strategy, the evolved results are compared with the global best individual to find the individual with the highest fitness value, which is taken as the clustering result. The algorithms are tested and analyzed on eight UCI datasets and eight artificial datasets. The results show that the proposed algorithm performs much better than the algorithm it is compared with.

2. A Fast Clonal Selection Clustering algorithm based on Sparse Affinity Propagation Sampling is proposed. First, the neighbors of every data point are found and used to construct a sparse similarity matrix. Second, this matrix is used as the input to the Affinity Propagation clustering algorithm, which finds representatives that can stand in for the other data points. Third, these representatives are used as the input to the clonal selection clustering method, which clusters them.
Finally, following the result of the Affinity Propagation clustering algorithm, each remaining data point is assigned to the cluster of its representative. The algorithm is tested on three UCI datasets and five artificial datasets. The experiments show that the actual run time of the proposed method is shorter than that of the compared algorithm on all datasets, and that its mean accuracy is also higher.

3. A Fast Clustering Algorithm based on Sparse Affinity Propagation is proposed. First, the dense similarity matrix is simplified by constructing a sparse graph, which turns the similarity matrix into a sparse matrix. Second, the Affinity Propagation algorithm is used to select the representatives. Because the input is a sparse matrix, the time complexity of the Affinity Propagation algorithm is reduced. Finally, given the representatives, the LSC method is used to determine the cluster of each data point. The experiments show that the proposed algorithm is much better than the compared algorithms in both clustering quality and time complexity. In addition, the run time of the proposed algorithm grows linearly with the number of data points.
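
The second and third methods share two steps: building a sparse similarity matrix from each point's neighbors, and assigning every point to the cluster of its representative. The sketch below illustrates only those two steps and is not the thesis's implementation. The function names, the choice of k nearest neighbors, and the use of negative squared Euclidean distance as the similarity are all assumptions made for illustration. The Affinity Propagation step that selects the representatives is not shown, so the representative indices in the usage example are supplied by hand.

```python
import heapq


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sparse_knn_similarity(points, k):
    """Build a sparse similarity structure {i: {j: s_ij}}.

    Only the k nearest neighbours of each point are kept (assumption:
    the abstract does not say how neighbours are chosen). Similarity is
    the negative squared distance (also an assumption). The neighbour
    search here is brute force, so it is quadratic in the number of
    points; a spatial index would be needed for genuinely large data.
    """
    sim = {}
    for i, p in enumerate(points):
        candidates = ((sq_dist(p, q), j)
                      for j, q in enumerate(points) if j != i)
        sim[i] = {j: -d for d, j in heapq.nsmallest(k, candidates)}
    return sim


def assign_to_representatives(representative_of, representative_label):
    """Give every point the cluster label of its representative.

    representative_of[i] is the index of the representative chosen for
    point i; representative_label maps a representative's index to the
    cluster label it received when the representatives were clustered.
    """
    return [representative_label[r] for r in representative_of]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    sim = sparse_knn_similarity(pts, k=2)
    print(sorted(sim[0]))  # neighbours kept for point 0

    # Hypothetical output of the representative-selection step:
    # points 0-2 are represented by point 0, points 3-5 by point 3.
    labels = assign_to_representatives([0, 0, 0, 3, 3, 3], {0: 0, 3: 1})
    print(labels)
```

Storing only k entries per point cuts the similarity structure from n² values to n·k, which is what allows a message-passing step such as Affinity Propagation to run over far fewer pairs.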
Keywords/Search Tags: Large Dataset, Clustering, Affinity Propagation Algorithm, Spectral Clustering, Co-Evolution