Research And Application Of New Methods In Symbolic Clustering

Posted on:2009-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:X X Xie

GTID:2178360272957426

Subject:Computer application technology

Abstract/Summary:

Clustering is an important data analysis technique. Given a metric (a similarity measure, a dissimilarity measure, or a distance), clustering divides a set of individuals into subsets so that, according to a chosen criterion, individuals in the same subset are more similar to one another than to individuals in different subsets. Its purpose is to mine information from a dataset. At present, the common clustering methods include Hierarchical Clustering, Partitional Clustering, Model-based Clustering, Density-based Clustering, and Grid-based Clustering. Clustering has been widely applied in taxonomy, bioinformatics, business, medicine, image processing, and other fields.
The objects processed by traditional clustering techniques are continuous numerical data (called traditional data here, which includes fuzzy data). In many cases, however, some information cannot be represented appropriately by traditional data. Examples include color (different numbers may be used to represent different colors, but those numbers are codes for colors rather than values in the traditional sense), customer feedback, and the temperature range of an area over a period of time. Such data are not as ordered, single-valued, and continuous as traditional data, and sometimes there are relations between different features of the same individual. This kind of data is called symbolic data. As more and more symbolic data emerged, a dedicated field for analyzing and processing it, Symbolic Data Analysis (SDA), was established, and clustering of symbolic data is an essential branch of that field. The purpose of cluster analysis for symbolic data is to apply traditional clustering techniques to symbolic data and to develop new clustering theories and methods that are consistent with the characteristics of symbolic data and meet current needs.
In line with this purpose, and building on earlier work, the dissertation studies and improves clustering for three common kinds of symbolic data: nominal data, interval data, and mixed data (data with both traditional and symbolic features).
For nominal data, the Hamming distance is usually used, but it is too coarse to fully mine the information hidden in the data. Particle Swarm Optimization (PSO), an intelligent optimization algorithm, is used to learn through training a distance suited to the given dataset. Hierarchical Clustering experiments show that the distance obtained through PSO training outperforms the Hamming distance on nominal data.
For interval data, the dissertation adopts the concept of Mutual Distance and proposes a Mutual Distance metric suitable for interval data. Based on this metric, a new clustering method, Affinity Propagation Clustering (APC), is introduced to avoid the difficulty of representing cluster centers when clustering symbolic data. The experiments show that the proposed algorithm performs better than C-means (CM) based on the Euclidean distance.
For mixed data, earlier clustering methods did not consider feature weights, so the dissertation takes feature weights into account to obtain a Fuzzy C-means algorithm with feature weights for mixed data. The experiments indicate that considering feature weights is reasonable and necessary.
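
The remark that the Hamming distance is too coarse for nominal data can be illustrated with a small sketch. It compares the plain Hamming distance with a per-feature weighted mismatch distance. The function names, the example records, and the weights are all hypothetical and hand-picked for illustration; they are not taken from the dissertation, and the PSO training step that would learn such a distance is not implemented here.

```python
from typing import Sequence


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the features on which two nominal records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(x != y for x, y in zip(a, b))


def weighted_mismatch_distance(
    a: Sequence[str], b: Sequence[str], weights: Sequence[float]
) -> float:
    """Sum the weights of the features on which two records differ.

    With all weights equal to 1 this reduces to the Hamming distance.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("records and weights must have the same length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical records with three nominal features: color, shape, size.
r1 = ("red", "round", "small")
r2 = ("red", "square", "small")   # differs from r1 in shape only
r3 = ("blue", "round", "small")   # differs from r1 in color only

# Placeholder weights (an assumption): color matters more than shape.
weights = (2.0, 0.5, 1.0)

# Hamming treats both mismatches as equally important.
print(hamming_distance(r1, r2), hamming_distance(r1, r3))
# The weighted distance separates the two cases.
print(weighted_mismatch_distance(r1, r2, weights),
      weighted_mismatch_distance(r1, r3, weights))
```

Under the Hamming distance, r2 and r3 are equally far from r1, while the weighted variant ranks them differently. A distance trained for a specific dataset can exploit this kind of distinction, whatever form the learned distance actually takes.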

Keywords/Search Tags:

Clustering, symbolic data, Symbolic Data Analysis, Clustering for Symbolic Data, nominal data, Hierarchical Clustering, Particle Swarm Optimization, interval data, Mutual Distance, Affinity Propagation Clustering, mixed data, feature weight

Related items

1	Clustering Of Generally Distributed Interval Symbolic Data Using Self-organizing Map (SOM) Algorithm
2	The Dynamic Analysis Of Generally Distributed Histogram-Valued Symbolic Data And Interval Symbolic Data
3	Research On Clustering Algorithm Of Mixed Data
4	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Interval Analysis
5	Research On Nominal Data Clustering/Classification Algorithms With Their Applications In Anomaly Detection
6	Research On Feature Selection Methods For Symbolic Interval Data And Their Application
7	Research And Application Of Complex Data Stream Clustering Algorithm
8	The Research Of Text Data Streams Clustering Algorithm Based On Affinity Propagation
9	Research Of New Fuzzy Clustering Algorithms Based On Objective Function And Its Applications
10	Research On Heterogeneous Data Clustering Algorithm