
Research On Clustering Based On Biogeography-Based Optimization

Posted on: 2016-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: X Q Wen | Full Text: PDF
GTID: 2308330473959921 | Subject: Computer application technology
Abstract/Summary:
Data mining aims to discover valuable knowledge hidden in massive data. As an important task of data mining, cluster analysis groups samples into classes, or clusters, by similarity. A good clustering makes similarity within each cluster as high as possible and similarity between clusters as low as possible. By comparing the similarity or difference of data attributes or structure, it reveals the internal characteristics and distribution of the data objects and gives a deeper understanding of the data. The NP (Non-deterministic Polynomial) hardness of the clustering problem has inspired a large number of researchers to apply meta-heuristic optimization techniques to data clustering. A series of evolutionary clustering algorithms has emerged, based on methods such as the genetic algorithm and the ant colony algorithm. As a swarm intelligence optimization method, Biogeography-Based Optimization (BBO) has attracted much attention in the data clustering domain for advantages such as ease of operation, a fast convergence rate and excellent global search performance.
In this thesis, we investigate the BBO algorithm and find that random initialization of population individuals may degrade its convergence. In addition, the roulette-wheel migration strategy used by BBO reduces habitat diversity and increases the risk of the algorithm becoming trapped in a local optimum. Based on an analysis of these two problems, we propose improvement strategies in the following three respects. First, to address the weakness of the roulette-wheel migration strategy, the algorithm integrates a new migration operator built on gradient-descent local search, and uses the value of the clustering objective function as the individual fitness to optimize the implicit cluster structures in datasets.
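To make the baseline concrete, the fitness and roulette-wheel migration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each habitat is assumed to encode a list of cluster centroids, the clustering objective is assumed to be the sum of squared distances to the nearest centroid, rates are assumed linear in rank, and the function names (`sse_fitness`, `roulette_pick`, `migrate`) are hypothetical. The gradient-descent migration operator proposed in the thesis is not reproduced here, because the abstract does not specify it.

```python
import random

def sse_fitness(habitat, samples):
    """Assumed clustering objective: sum of squared distances from each
    sample to its nearest centroid (lower is better)."""
    total = 0.0
    for x in samples:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in habitat)
    return total

def roulette_pick(weights, rng):
    """Return an index chosen with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)
    running = 0.0
    for idx, w in enumerate(weights):
        running += w
        if running >= threshold:
            return idx
    return len(weights) - 1

def migrate(population, samples, rng):
    """One pass of standard roulette-wheel BBO migration.

    Each centroid is treated as one migrating feature (an assumption).
    Returns a new population ordered from best to worst parent habitat.
    """
    n = len(population)
    ranked = sorted(population, key=lambda h: sse_fitness(h, samples))
    # Linear rates by rank: the best habitat emigrates most, immigrates least.
    mu = [(n - r) / n for r in range(n)]   # emigration rates
    lam = [1.0 - m for m in mu]            # immigration rates
    new_population = []
    for i, habitat in enumerate(ranked):
        child = [list(c) for c in habitat]
        for j in range(len(child)):
            if rng.random() < lam[i]:
                # Roulette wheel over emigration rates selects the source habitat.
                src = roulette_pick(mu, rng)
                child[j] = list(ranked[src][j])
        new_population.append(child)
    return new_population
```

Because good habitats dominate the roulette wheel, their centroids are copied into many other habitats, which illustrates how this scheme can shrink habitat diversity over generations.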
Experimental results on four benchmark datasets (Iris, Wine, Glass and Diabetes) show that the improved algorithm achieves better clustering validity and convergence, and obtains higher-quality cluster structures for the datasets.
Second, because BBO initializes population individuals randomly, we propose using the SOM (Self-Organizing Map) algorithm to initialize the BBO population and then executing the improved clustering BBO algorithm. Traditionally, SOM network connection weights are initialized randomly, which can seriously affect cluster structure optimization performance, so the SOM connection weights are instead initialized with a sample mean method. The result of SOM is then used to initialize the BBO population individuals. Experimental results on the same four benchmark datasets (Iris, Wine, Glass and Diabetes) show that the algorithm achieves better clustering validity and convergence, which verifies its performance.
Third, we propose applying the hybrid BBO algorithm to text data. The first step is to delete the stop words from the test datasets; singular value decomposition (SVD) is then used to reduce the dimensionality. Finally, the clustering result is obtained from the individual optimization search process (encoding the individuals, computing the fitness, migration and mutation), which tests the practicability of the algorithm.
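The text preprocessing steps named above (stop-word removal followed by SVD-based dimension reduction) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the tiny stop-word list, the raw term-count weighting, and the helper names (`tokenize`, `term_matrix`, `top_right_singular_vectors`, `project`) do not come from the thesis. To stay dependency-free, the top singular vectors are approximated by power iteration with deflation on the Gram matrix; in practice a library SVD routine would normally be used instead.

```python
import math
import random
import re

# Tiny illustrative stop-word list (assumption; the thesis's list is not given).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase, split into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def term_matrix(docs):
    """Build a documents-by-terms count matrix and its sorted vocabulary."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    column = {w: i for i, w in enumerate(vocab)}
    rows = []
    for doc in tokens:
        row = [0.0] * len(vocab)
        for w in doc:
            row[column[w]] += 1.0
        rows.append(row)
    return rows, vocab

def top_right_singular_vectors(rows, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of `rows`
    by power iteration with deflation."""
    rng = random.Random(seed)
    m = len(rows[0])
    # Gram matrix A^T A: its eigenvectors are the right singular vectors of A.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    vectors = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no remaining variance to extract
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
        eigenvalue = sum(a * b for a, b in zip(v, gv))
        vectors.append(v)
        # Deflate: remove the component just found before the next pass.
        for i in range(m):
            for j in range(m):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return vectors

def project(rows, vectors):
    """Map each document row onto the reduced k-dimensional space."""
    return [[sum(a * b for a, b in zip(r, v)) for v in vectors] for r in rows]
```

The reduced coordinates returned by `project` would then serve as the samples whose cluster centers the BBO individuals encode.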
Keywords/Search Tags: data mining, clustering analysis, Biogeography-Based Optimization algorithm