Research On Fuzzy Clustering Validity

Posted on: 2011-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: M H Tang
GTID: 2178360305960697
Subject: Computer application technology
Abstract/Summary:
With the rapid development of database technology and the wide application of database management systems, large amounts of data have accumulated. Data plays an important role as the main carrier of information in the information society. It is hoped that computers can help us extract useful information from this mass of data and make decisions based on it. Data mining is such a technology and is widely studied.

As one of the most important parts of data mining, clustering analysis follows the principle "like attracts like": it divides objects into several clusters so that objects within the same cluster are highly similar, while objects in different clusters differ markedly. Traditional clustering analysis focuses on "hard" partitions, in which each object belongs to exactly one cluster. However, many objects have fuzzy boundaries, and it is better to group them softly. Fuzzy set theory, applied to clustering, provides a strong theoretical basis for "soft" partitions and gives rise to fuzzy clustering analysis. Clustering is unsupervised classification and requires parameters such as the number of clusters c and the fuzzy weight exponent m. Evaluating whether a clustering result is reasonable is called cluster validity analysis. For fuzzy clustering, a further goal of cluster validity analysis is often to find the optimal number of clusters.

The classic validity index Vxie has two shortcomings. Several researchers have proposed validity functions based on Vxie, but these do not measure clustering quality well. This thesis studies their advantages and disadvantages, builds on their ideas, and shows that the fuzzy weight exponent m affects both the FCM algorithm and the discriminating ability of the validity function.
Therefore, taking into account changes in the fuzzy weight exponent m and the number of clusters c, two penalty functions are introduced to overcome the two shortcomings of Vxie, yielding an improved clustering validity function Vnew. The validity of Vnew is proved theoretically using limits of single-variable and multivariable functions from mathematical analysis, and analysis of its expression shows that its time complexity is low.

Wu et al. were the first to combine compactness, separation and overlap; they proposed the fuzzy validity index VCSO and gave a definition of overlap. This thesis argues that the definition of overlap in VCSO is somewhat subjective, which can substantially affect the stability of the index and the accuracy of the validity assessment. We therefore study the geometric meaning of an arbitrary element of the membership matrix U and redefine overlap. An example shows that the new definition is feasible. In addition, by taking the fuzzy weight exponent m into account and building on the idea behind VCSO, an improved fuzzy clustering validity index VCSO-new is proposed.
Keywords/Search Tags:clustering analysis, fuzzy clustering, fuzzy clustering validity, fuzzy weight exponent, overlap
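
The abstract gives no formulas, so the following is only an illustrative sketch of the baseline it builds on, not the thesis's Vnew or VCSO-new. It assumes that Vxie denotes the well-known Xie-Beni index, and it uses a common generalization with exponent m (the original index corresponds to m = 2). The function names, the toy data, and the deterministic center initialization (evenly spaced data points instead of random memberships) are all assumptions made for this example. It runs standard FCM for several values of c and picks the c with the smallest index value.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fcm(points, c, m=2.0, iters=200, tol=1e-9):
    """Standard fuzzy c-means. Returns (centers, u), u[i][j] = membership of point j in cluster i.
    Assumption: centers start at evenly spaced data points (a simplification)."""
    n, dim = len(points), len(points[0])
    centers = [list(points[(i * n) // c]) for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)), normalized per point.
        u = [[0.0] * n for _ in range(c)]
        for j, p in enumerate(points):
            inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
            total = sum(inv)
            for i in range(c):
                u[i][j] = inv[i] / total
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

def xie_beni(points, centers, u, m=2.0):
    """Compactness divided by (n * minimum squared separation); smaller is better."""
    n, c = len(points), len(centers)
    compact = sum(
        u[i][j] ** m * sq_dist(points[j], centers[i]) for i in range(c) for j in range(n)
    )
    sep = min(sq_dist(centers[i], centers[k]) for i in range(c) for k in range(i + 1, c))
    return float("inf") if sep == 0 else compact / (n * sep)

def make_blobs():
    """Toy data (hypothetical): three 3x3 grids of points around well-separated centers."""
    offsets = (-0.5, 0.0, 0.5)
    return [
        (cx + dx, cy + dy)
        for cx, cy in ((0.0, 0.0), (10.0, 0.0), (5.0, 9.0))
        for dx in offsets
        for dy in offsets
    ]

points = make_blobs()
scores = {}
for c in range(2, 6):
    centers, u = fcm(points, c)
    scores[c] = xie_beni(points, centers, u)
best_c = min(scores, key=scores.get)  # candidate optimal number of clusters
print(best_c, scores)
```

On data this cleanly separated the scan should favor the true number of groups. The difficulties the abstract refers to arise in harder settings, for example when c grows large or m varies, which this sketch does not attempt to reproduce.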