Determining the Number of Clusters in Cluster Analysis

Posted on: 2002-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2208360062996421
Subject: Computer application technology
Abstract/Summary:
Clustering distinguishes and classifies objects according to their similarity, without any supervision; it is therefore a form of unsupervised classification. Cluster analysis uses mathematical methods to study the classification of given data and the degree of closeness and separation between clusters. It is a tool for analyzing data without prior hypotheses. In artificial intelligence and pattern recognition it is also called "learning without a priori knowledge", and it is an important step in knowledge acquisition for machine learning. As the saying goes, things cluster by kind and people gather in groups. Clustering is an ancient problem that has deepened with the emergence and development of human society: anyone who wants to understand the world must distinguish different things and recognize the similarities among them. Clustering has been widely applied in engineering and science, in fields such as psychology, biology, medicine, communications, and remote sensing.

Various clustering methods have been proposed to meet the needs of different fields. Among them, objective-function-based clustering is the most popular. These methods partition the data into clusters according to their attributes and then optimize the cluster centers and memberships, all under the assumption that the number of clusters is given. This resembles system identification, where the system structure is assumed to be known before the parameters are estimated. On the question of how to determine the number of clusters, existing methods either give no answer or resort to enumeration. The open question is therefore whether the number of clusters can be determined directly, without such assumptions. This is the problem addressed in this thesis.

For iterative optimization methods, the most important issues are the choice of a suitable clustering criterion and of the similarity measure between clusters. Much work has already been done in this area, so this thesis builds on it to choose a suitable clustering criterion with which the number of clusters can be determined directly, without assumptions. Iterative optimization is a kind of hill climbing, so it converges easily to a local extremum. How to overcome this is also discussed in this thesis.

This thesis considers a criterion function, not reproduced in this abstract, that includes the classical within-group sum of squares (WGSS). It proves that, as a function of the number of clusters, this optimization problem has a single peak, so the optimum can be found by golden section search; the inner-level optimization can be implemented with the K-means algorithm and a genetic algorithm. The two are combined because K-means is a hill-climbing method that converges relatively quickly but is prone to local extrema and sensitive to initial conditions, whereas the genetic algorithm is a random search method that finds the global extremum with fairly high probability and is insensitive to initial conditions but converges relatively slowly.

Simulations indicate that the proposed criterion function and its implementation give better results.
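The two-level search described above can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not give the actual criterion function, so the outer search simply takes any criterion as a callable and assumes it is strictly unimodal in the cluster number. The names `kmeans_wgss` and `golden_section_int` are hypothetical, the inner step is plain K-means only (the genetic algorithm is omitted), and the integer variant of golden section search is one possible adaptation.

```python
import math
import random


def kmeans_wgss(points, k, iters=50, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of tuples; returns the WGSS.

    Like any hill-climbing method, the result depends on the random
    initial centers and may be a local minimum only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers

    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sqdist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep empty ones fixed.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sqdist(p, c) for c in centers) for p in points)


def golden_section_int(criterion, lo, hi):
    """Return the integer k in [lo, hi] that maximizes `criterion`.

    Assumes `criterion` is strictly unimodal (a single peak) on [lo, hi].
    Probe points are placed at roughly the golden ratio of the interval.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    cache = {}

    def g(k):
        if k not in cache:  # each k may require a costly clustering run
            cache[k] = criterion(k)
        return cache[k]

    while hi - lo > 2:
        d = hi - lo
        step = max(round(inv_phi * d), d // 2 + 1)  # guarantees x1 < x2
        x1, x2 = hi - step, lo + step
        if g(x1) < g(x2):
            lo = x1 + 1  # the peak lies to the right of x1
        else:
            hi = x2 - 1  # the peak lies to the left of x2
    return max(range(lo, hi + 1), key=g)
```

In use, `criterion` would wrap the inner optimization, for example a function of `k` that calls `kmeans_wgss` and combines the result with whatever additional terms the chosen criterion requires. Caching matters here because each evaluation is a full clustering run.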
Keywords/Search Tags: clustering analysis, K-means algorithm, genetic algorithm, multi-objective optimization method, golden section method