Research On High Dimensional Data Clustering Based On Improved Evolutionary Algorithm

Posted on: 2019-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: L N Huo
Full Text: PDF
GTID: 2428330623469013
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, the data we acquire has become increasingly complex: it covers more and more domains, and its dimensionality keeps growing. For example, transaction data, gene expression data, and Web usage data can have hundreds or even thousands of dimensions. Cluster analysis is an effective data mining method, but the curse of dimensionality and data sparsity make high-dimensional data a major challenge for current clustering algorithms. In high-dimensional space, cluster structure often exists in subspaces rather than in the full space. Among existing subspace clustering methods, soft subspace clustering is an important topic. Existing soft subspace clustering algorithms optimize one objective function, so they easily fall into local optima during clustering and depend on the initial cluster centers, among other issues. To address these problems, this thesis first improves a multi-objective evolutionary algorithm and then establishes a multi-objective optimization model for high-dimensional data clustering. With the improved evolutionary algorithm as the optimization framework, it proposes a soft subspace clustering algorithm based on multi-objective evolution that improves the stability and quality of the clustering results and removes the need to supply the number of clusters in advance.

The innovations and main work of the thesis are:

(1) An improved evolutionary algorithm, GLEA, is proposed. To improve the global optimization ability of multi-objective evolutionary algorithms and to reduce the effect of large-scale decision variables on optimization performance, GLEA builds on the multi-objective evolutionary algorithm LMEA framework and improves it in two respects. First, the variable decomposition process is optimized through random sampling and non-dominated sorting. Second, a Lévy mutation strategy is adopted to generate offspring during optimization, which improves the global optimization ability of the algorithm. Compared with more advanced current multi-objective evolutionary algorithms, GLEA better maintains the diversity and convergence of the solutions.

(2) A multi-objective soft subspace clustering algorithm based on GLEA is proposed. Three objective functions are established, related to intra-class distance, inter-class distance, and normalized mutual information (NMI), and GLEA is used as the optimization framework and fused with soft subspace clustering to solve the clustering problem for high-dimensional data. Experiments were performed on artificial datasets, UCI datasets, and gene expression datasets, with the Rand index (RI), adjusted Rand index (ARI), and normalized mutual information (NMI) as evaluation indicators. Compared with other algorithms, the results show that the proposed algorithm achieves better clustering on high-dimensional data and does not require the number of clusters to be determined in advance.
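
The abstract does not say how the Lévy mutation in GLEA is implemented. As a rough illustration only, the Python sketch below draws heavy-tailed steps with Mantegna's algorithm, a common way to approximate Lévy-distributed step lengths, and uses them to perturb a parent vector. The function names, the stability index `beta = 1.5`, the step `scale`, and the clipping to box bounds are all assumptions made for this example, not details taken from the thesis.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero (extremely unlikely)
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_mutate(parent, lower, upper, beta=1.5, scale=0.01, rng=None):
    """Return an offspring by adding a scaled Levy step to every decision variable.

    beta, scale, and the clipping rule are hypothetical choices for illustration.
    """
    if rng is None:
        rng = random.Random()
    child = []
    for x, lo, hi in zip(parent, lower, upper):
        moved = x + scale * (hi - lo) * levy_step(beta, rng)
        child.append(min(hi, max(lo, moved)))  # keep the variable inside its bounds
    return child


if __name__ == "__main__":
    rng = random.Random(42)
    parent = [0.5, 0.2, 0.9]
    print(levy_mutate(parent, [0.0] * 3, [1.0] * 3, rng=rng))
```

Most steps drawn this way are small, but the heavy tail occasionally produces a long jump, which is the usual motivation for using Lévy-type mutation to help a search escape local optima.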
Keywords/Search Tags: High-dimensional Data, Soft Subspace Clustering, Multi-objective Evolutionary Algorithm, Variable Decomposition, Lévy Mutation Strategy