A Study of Evolutionary Algorithms for High-Dimensional Data Clustering

Posted on: 2015-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: D J Yang
Full Text: PDF
GTID: 2308330464468657
Subject: Circuits and Systems
Abstract/Summary:
Clustering analysis is of great value in data mining and is widely used in applications such as web document analysis, automatic news classification, information filtering, and bioinformatics. With the development of information science, the data we collect has become more complex and higher-dimensional: business data, gene expression datasets, and document data may have several thousand or even ten thousand dimensions. Because such data has great potential value, clustering analysis of high-dimensional data is very important.

Compared with data in a low-dimensional space, high-dimensional data is inherently sparse. Its distribution over the whole space is sparse, which makes clustering difficult. In general, however, the clusters exist in subspaces formed by a subset of relevant dimensions. If the relevant dimensions can be identified, good clustering performance can be achieved, but identifying them is hard. The central difficulty of high-dimensional data clustering is therefore how to find the relevant dimensions.

Based on an analysis of the advantages and disadvantages of existing soft subspace clustering algorithms, differential evolution is introduced into soft subspace clustering to improve the clustering result, and multi-objective clustering is applied to soft subspace clustering to overcome the defect that the number of clusters must be set manually. The main achievements of this paper are as follows:

1. Traditional soft subspace clustering algorithms are all based on the k-means model. As a result, they are sensitive to the initial cluster centers, and the dimension weights they produce are not very accurate. To overcome these drawbacks, a soft subspace clustering algorithm using differential evolution is proposed. The algorithm is based on the soft subspace clustering model and uses differential evolution to optimize the dimension weights. It achieves more stable and better results on synthetic datasets, UCI datasets, and cancer expression datasets.

2. A new multi-objective evolutionary soft subspace clustering method is proposed to overcome two drawbacks of traditional soft subspace clustering algorithms: they have only one objective function, and the number of clusters must be set manually. First, intra-cluster compactness and inter-cluster separation are treated as the objective functions. Then the NSGA-II algorithm is used to optimize this multi-objective problem, yielding a set of Pareto solutions. Finally, a semi-supervised method is used to select a preferred solution as the optimal solution. Experiments on synthetic datasets, UCI datasets, and cancer expression datasets show that the algorithm performs similarly to the soft subspace clustering algorithm using differential evolution, but does not need the number of clusters to be set in advance.

3. An analysis of document clustering shows that document data is high-dimensional and is concentrated in subspaces, so soft subspace clustering is well suited to this problem. In this paper, the soft subspace clustering algorithm using differential evolution is applied to document clustering and obtains good results.

This research is supported by the Program for New Century Excellent Talents in University (No. NCET-12-0920), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-45), the National Natural Science Foundation of China (Nos. 61272279, 61001202 and 61203303), the Fundamental Research Funds for the Central Universities (Nos. K5051302049, K5051302023, K5051302002 and K5051302028) and the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048).
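As an illustration of the first contribution, the idea of using differential evolution to tune per-cluster dimension weights can be sketched as below. This is only a minimal sketch under stated assumptions, not the thesis's algorithm: the abstract does not give the objective function, the encoding, or the DE variant. The sketch assumes fixed cluster centers, a weighted squared-distance objective with a weight exponent `beta`, and the common DE/rand/1/bin scheme; all function and parameter names are hypothetical.

```python
import numpy as np

def normalize_weights(flat, k, d):
    """Map a raw DE vector to k rows of non-negative weights that each sum to 1."""
    w = np.abs(flat).reshape(k, d) + 1e-12
    return w / w.sum(axis=1, keepdims=True)

def weighted_objective(X, centers, W, beta=2.0):
    """Assign each point to its closest center under per-cluster dimension
    weights; return (total weighted dispersion, labels).
    The exponent beta is an assumed regularizer, not taken from the thesis."""
    # dist[i, c] = sum_j W[c, j]**beta * (X[i, j] - centers[c, j])**2
    diff2 = (X[:, None, :] - centers[None, :, :]) ** 2
    dist = (diff2 * (W ** beta)[None, :, :]).sum(axis=2)
    labels = dist.argmin(axis=1)
    return dist[np.arange(len(X)), labels].sum(), labels

def de_optimize_weights(X, centers, pop_size=20, F=0.5, CR=0.9, gens=50, seed=0):
    """DE/rand/1/bin search over the flattened k x d weight matrix."""
    rng = np.random.default_rng(seed)
    k, d = centers.shape
    dim = k * d
    pop = rng.random((pop_size, dim))
    fit = np.array([
        weighted_objective(X, centers, normalize_weights(p, k, d))[0] for p in pop
    ])
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [idx for idx in range(pop_size) if idx != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + F * (b - c)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial only if it is no worse.
            f_trial = weighted_objective(X, centers, normalize_weights(trial, k, d))[0]
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = fit.argmin()
    return normalize_weights(pop[best], k, d), fit[best]
```

Under these assumptions, a dimension in which a cluster's points are widely scattered contributes heavily to the objective, so the search tends to drive its weight down and concentrate weight on the dimensions where the cluster is compact. A fuller version would presumably alternate weight optimization with center updates, which this sketch omits.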
Keywords/Search Tags: High-dimensional Data, Evolutionary Algorithm, Multi-Objective Optimization, Document Clustering, Differential Evolution