Font Size: a A A

Global Convergence Of A Novel Hybrid Clustering Algorithm

Posted on:2017-08-09Degree:MasterType:Thesis
Country:ChinaCandidate:Y GongFull Text:PDF
GTID:2348330488458868Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
Cluster analysis aims to group data sets, so that data objects in the same group have the greatest similarity, but have the greatest dissimilarity in different groups. In many areas, data objects in the same group can be treated as one. In the era of big data, it's very important to use the increasing amount of data appropriately. Cluster analysis is significant in data mining area, which is used in many fields. In business, it can help to find customers with the similar buying patterns, then dig up their characteristics, finally end with increasing profit. In biology, it is widely used to classify genes and proteins for researching their inner structures. In this way, there are some significant improvements in biomedical, life science, etc. In the financial sector, isolated point is discovered thanks to cluster analysis, which can help to identify fraud and financial crime. Nowadays, cluster analysis becomes a very active research topic, and many scholars has made remarkable achievements in this respect.There is no training process of clustering other than classification. Before clustering, it's not clear for the number of classes. There are some common algorithms like systemic cluster approach and ant colony clustering algorithm based on ant heap. However, during the actual operation, the number of classes can be estimated by assumption or priori knowledge. K-Means is the most classical clustering method, but it's easy to fall into local optimum and sensitive to the initial points. XK-Means improves K-Means by adding stochastic exploratory factor to cluster center, but unfortunately, empty clusters may be produced, which makes the process break off. For this purpose, an improved method is proposed in this paper called IXK-Means with filling empty classes. But it's not enough and IXK-Means still may fall into local optimum. Considering that genetic mechanism can help, so we mix IXK-Means to genetic algorithm instead of crossover called GEXK algorithm, which has faster global convergence rate compared with another global algorithm called genetic K-Means. However it still needs the number of classes before clustering. To overcome this weakness, we propose to use ant colony algorithm to determine the number of classes before GIXK algorithm. GIXK algorithm not only has fast convergence rate, but also can be used for any data sets. Through the experiment and theory proving, GIXK algorithm can obtain better clusters compared with other relative algorithms and converge to the global optimum solution.
Keywords/Search Tags:Clustering algorithm, Random disturbance, Filling empty classes, Global convergence
PDF Full Text Request
Related items