An Automatic Data Clustering Method Based On The Evolutionary Computation

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: J X Chen
Full Text: PDF
GTID: 2428330611466933
Subject: Computer Science and Technology
Abstract/Summary:
With the development of science and technology, enormous amounts of data are being generated and accumulated in real life, which encourages people to mine the information and value behind them. Through clustering analysis, the data distribution and the underlying structure help people better understand and solve real-world problems. In many practical applications, it is crucial to perform automatic data clustering when the number of clusters is unknown. The evolutionary computation paradigm is well suited to this task, but existing algorithms suffer from several deficiencies. In this paper, we propose a novel elastic differential evolution algorithm, E-DE, to solve the automatic data clustering problem. Unlike traditional methods, the proposed algorithm evolves each clustering layout as a whole. We adopt a variable-length encoding scheme that encodes only the cluster centroids that take effect during the clustering process. The encoding contains no redundancy, which improves search efficiency. To enable individuals of different lengths to exchange information properly, we develop a two-phase mutation operator and a subspace crossover. The mutation first determines the cluster number from the differential information among different individuals. Then, a Gaussian perturbation is applied to fine-tune the cluster centroids. In the crossover, a selected chromosome segment constructs the subspace in which the target and mutant vectors exchange information to generate a new trial vector. The operators employ the basic method of differential evolution and, in addition, consider the spatial information of cluster layouts when generating offspring solutions. In particular, each dimension of the parameter vector interacts only with its correlated dimensions, which both adapts the cluster number and avoids cross-dimension learning errors. The experimental results show that our algorithm outperforms state-of-the-art algorithms on most real and synthetic datasets. It is able to identify the correct number of clusters and obtain a good cluster validation value. Through sensitivity analysis, we validate the parameter settings, the design of the genetic operators, and the handling of empty clusters. The results demonstrate the effectiveness and robustness of the proposed algorithm.
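The variable-length encoding and the two-phase mutation described in the abstract can be illustrated with a minimal Python sketch. This is not the E-DE implementation: the abstract gives no formulas, so the function names, the length-difference rule for the cluster number, and the parameters `f`, `k_min`, `k_max` and `sigma` are all assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]
Layout = List[Point]  # variable-length individual: one centroid per active cluster


def nearest(point: Point, layout: Layout) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(layout)), key=lambda i: math.dist(point, layout[i]))


def prune_empty(layout: Layout, data: List[Point]) -> Layout:
    """Keep only centroids that attract at least one data point, so the
    encoding holds no redundant (empty) clusters."""
    used = {nearest(p, layout) for p in data}
    return [c for i, c in enumerate(layout) if i in used]


def mutated_cluster_count(k_target: int, k_r1: int, k_r2: int,
                          f: float = 0.5, k_min: int = 2, k_max: int = 10) -> int:
    """Assumed phase 1: shift the cluster number by the scaled difference
    in length between two other individuals, then clamp to [k_min, k_max]."""
    k = k_target + round(f * (k_r1 - k_r2))
    return max(k_min, min(k_max, k))


def gaussian_perturb(layout: Layout, sigma: float, rng: random.Random) -> Layout:
    """Assumed phase 2: add zero-mean Gaussian noise to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in c) for c in layout]


# Toy usage: the third centroid attracts no points and is dropped.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
layout = [(0.0, 0.0), (5.0, 5.0), (100.0, 100.0)]
pruned = prune_empty(layout, data)
tuned = gaussian_perturb(pruned, sigma=0.05, rng=random.Random(0))
```

Storing only the active centroids keeps every gene meaningful, which is one plausible reading of the abstract's claim that the encoding has no redundancy. The subspace crossover is omitted here because the abstract does not specify how the chromosome segment is selected.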
Keywords/Search Tags: clustering, evolutionary computation, variable length encoding scheme, subspace