Research And Application Of New Methods In Symbolic Clustering

Posted on: 2009-12-19  Degree: Master  Type: Thesis
Country: China  Candidate: X X Xie  Full Text: PDF
GTID: 2178360272957426  Subject: Computer application technology
Abstract/Summary:
Clustering is an important data analysis technique. Given a metric (a similarity measure, a dissimilarity measure, or a distance), clustering divides a set of individuals into subsets so that, under a chosen criterion, individuals in the same subset are more similar to each other than to individuals in different subsets. Its purpose is to mine information from a dataset. Common clustering methods currently include hierarchical clustering, partitional clustering, model-based clustering, density-based clustering, and grid-based clustering. Clustering has been widely applied in taxonomy, bioinformatics, business, medicine, image processing, and other fields.

Traditional clustering techniques process continuous numerical data (referred to here as traditional data, which includes fuzzy data). In many cases, however, information cannot be represented appropriately by traditional data. Examples include color (different numbers may be used to represent different colors, but those numbers are color codes rather than values in the traditional sense), customer feedback, and the temperature range of an area over a period of time. Such data are not as ordered, single-valued, and continuous as traditional data, and sometimes there are relations between different features of the same individual. This kind of data is called symbolic data. As more and more symbolic data have emerged, a dedicated field for analyzing and processing them, Symbolic Data Analysis (SDA), has been established, and clustering for symbolic data is an essential branch of it. The purpose of clustering analysis for symbolic data is to apply traditional clustering techniques to symbolic data and to develop new clustering theories and methods that are consistent with the characteristics of symbolic data.
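To make the color example concrete, here is a minimal Python sketch. The color codes and both helper functions are hypothetical, introduced only for illustration, and are not taken from the dissertation. The sketch contrasts treating a code as a number with treating it as a label.

```python
# Hypothetical color codes: the numbers are labels, not magnitudes.
COLOR_CODES = {"red": 1, "green": 2, "blue": 3}


def numeric_difference(a, b):
    """Treat the codes as ordinary numbers (misleading for nominal data)."""
    return abs(COLOR_CODES[a] - COLOR_CODES[b])


def nominal_mismatch(a, b):
    """Treat the codes as labels: 0 if the colors match, 1 otherwise."""
    return 0 if COLOR_CODES[a] == COLOR_CODES[b] else 1


# Numerically, red looks "further" from blue than from green,
# although nothing about the colors justifies that ordering.
far = numeric_difference("red", "blue")
near = numeric_difference("red", "green")

# As labels, both pairs are simply "different".
same_far = nominal_mismatch("red", "blue")
same_near = nominal_mismatch("red", "green")
```

With numeric treatment the result depends on an arbitrary choice of codes, which is one reason nominal features need their own dissimilarity measures.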
With this purpose in mind, and building on previous work, this dissertation studies and improves clustering for three common kinds of symbolic data: nominal data, interval data, and mixed data (which contain traditional features as well as symbolic features).

For nominal data, Hamming distance is usually used, but it is too coarse to fully mine the information hidden in the data. Particle Swarm Optimization (PSO), an intelligent optimization algorithm, is used to obtain, through training, a distance suited to the given dataset. Hierarchical clustering experiments show that the distance obtained through PSO training outperforms Hamming distance for nominal data.

For interval data, the dissertation adopts the concept of Mutual Distance and proposes a Mutual Distance metric suitable for interval data. Based on this metric, a new clustering method, Affinity Propagation Clustering (APC), is introduced to avoid the difficulty of representing cluster centers when clustering symbolic data. The experiments show that the proposed algorithm performs better than C-means (CM) based on Euclidean distance.

For mixed data, previous clustering methods did not consider feature weights, so the dissertation takes feature weights into account and obtains a Fuzzy C-means algorithm with feature weights for mixed data. The experiments indicate that considering feature weights is reasonable and necessary.
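As a rough illustration of the Hamming distance baseline for nominal data, the following Python sketch counts mismatching features between two records. The weighted variant is an assumption added here only to show how per-feature weights can change a distance. Its weights are picked by hand, and it is not the PSO-trained distance or the feature-weighted Fuzzy C-means of the dissertation, whose exact forms the abstract does not give. The function names and sample records are hypothetical.

```python
def hamming_distance(x, y):
    """Count the features on which two nominal records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of features")
    return sum(1 for a, b in zip(x, y) if a != b)


def weighted_hamming_distance(x, y, weights):
    """Sum the weights of mismatching features (illustrative assumption)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("records and weights must have the same length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


# Hypothetical nominal records: (color, size, shape).
r1 = ("red", "small", "round")
r2 = ("red", "large", "square")

plain = hamming_distance(r1, r2)
# Hand-picked weights; a trained method would learn these from data.
weighted = weighted_hamming_distance(r1, r2, [0.5, 2.0, 1.0])
```

Plain Hamming distance treats every mismatch as equally important, which is the coarseness the abstract refers to. Any scheme that adapts the distance to the dataset has to relax that assumption in some way.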
Keywords/Search Tags: Clustering, symbolic data, Symbolic Data Analysis, Clustering for Symbolic, nominal data, Hierarchical Clustering, Particle Swarm Optimization, interval data, Mutual Distance, Affinity Propagation Clustering, mixed data, feature weight