
Clustering of Generally Distributed Interval Symbolic Data Using the Self-Organizing Map (SOM) Algorithm

Posted on: 2015-03-02 | Degree: Master | Type: Thesis
Country: China | Candidate: M L Wang
GTID: 2348330485493777 | Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet and the accelerating informatization of every sector of society, the volume of data is growing explosively, which poses new challenges to traditional data mining and analysis techniques. Building on the idea of "data packaging", symbolic data analysis (SDA) provides a set of methods and theories for discovering knowledge in massive data. Interval symbolic data are the most common form of data in SDA and also have the widest range of applications. Cluster analysis, a useful tool for analysing complex relationships in data without prior knowledge, is an important branch of data mining research and has been widely applied in symbolic data analysis. Originating from neural networks, the Self-Organizing Map (SOM) method has distinctive advantages in cluster analysis because it preserves topological order and supports visualization. Existing clustering methods for interval data assume that the data are uniformly distributed within each interval; in practice, however, this does not always hold. Taking this into account, this dissertation studies SOM clustering of interval data under the assumption of a general distribution.
First, the basic concepts of symbolic data and the descriptive statistics of interval symbolic data are introduced. Under the general distribution assumption, generally distributed interval symbolic data are defined, and their differences from uniformly distributed interval symbolic data are illustrated. Next, based on the assumption that the intervals are generally distributed, the dissertation proposes a new representation of such intervals and applies the traditional city-block distance to this representation. On this basis, an SOM clustering algorithm for generally distributed interval symbolic data is presented. A simulation experiment is conducted to evaluate the validity of the method.
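The abstract does not spell out the new interval representation, so the following is only an illustrative sketch under stated assumptions, not the dissertation's method. It assumes each interval variable is summarized by the quantiles of its within-interval distribution at a few fixed probability levels (the hypothetical `LEVELS`), so that a uniform interval [a, b] and a non-uniform distribution on the same [a, b] give different vectors. The city-block (L1) distance is then taken between these vectors and compared with the interval Hausdorff distance max(|a1 - a2|, |b1 - b2|), summed over variables (one common aggregation, assumed here). The triangular density is a hypothetical stand-in for a "general" distribution, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical probability levels at which each interval's distribution is summarized.
LEVELS: Tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0)

Interval = Tuple[float, float]
Quantile = Callable[[float], float]


def uniform_quantile(a: float, b: float) -> Quantile:
    """Quantile function of U(a, b), i.e. the classical uniform assumption."""
    return lambda p: a + p * (b - a)


def triangular_quantile(a: float, c: float, b: float) -> Quantile:
    """Quantile function of a triangular density on [a, b] (a < b) with mode c.

    Stands in for one possible non-uniform ('general') distribution.
    """
    fc = (c - a) / (b - a)  # CDF value at the mode

    def q(p: float) -> float:
        if p < fc:
            return a + math.sqrt(p * (b - a) * (c - a))
        return b - math.sqrt((1.0 - p) * (b - a) * (b - c))

    return q


def quantile_vector(qfuncs: Sequence[Quantile]) -> List[float]:
    """Concatenate the quantiles at LEVELS of every interval variable."""
    return [q(p) for q in qfuncs for p in LEVELS]


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 (city-block) distance between two equal-length vectors."""
    return sum(abs(u - v) for u, v in zip(x, y))


def hausdorff(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Sum over variables of the interval Hausdorff distance max(|a1-a2|, |b1-b2|)."""
    return sum(max(abs(a1 - a2), abs(b1 - b2)) for (a1, b1), (a2, b2) in zip(x, y))


if __name__ == "__main__":
    # One interval variable, same endpoints [0, 4], different internal distributions.
    u = quantile_vector([uniform_quantile(0.0, 4.0)])
    t = quantile_vector([triangular_quantile(0.0, 2.0, 4.0)])
    print(hausdorff([(0.0, 4.0)], [(0.0, 4.0)]))  # 0.0
    print(round(city_block(u, t), 4))             # 0.8284
```

Because the two intervals share their endpoints, the Hausdorff distance cannot tell them apart, whereas the distance between the quantile vectors is about 0.83, reflecting their different internal distributions. This is the kind of information an endpoint-only, uniform-assumption comparison discards.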
The simulation results show that, under the conditions designed in the experiment, the proposed SOM clustering algorithm, which combines the general distribution assumption with the new interval representation, is more effective than the SOM clustering algorithm based on the uniform distribution assumption and the traditional Hausdorff distance. Finally, the method is applied to a real meteorological dataset, which demonstrates its advantages in practical applications.
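The abstract does not give the training rules of the proposed SOM algorithm. As a minimal sketch, assuming each object has already been turned into a fixed-length real vector (for example, some vector representation of its intervals), a standard online SOM can use the city-block distance to select the best-matching unit. The grid size, linear decay schedules, Gaussian neighborhood, diagonal initialization, toy data, and the names `train_som` and `assign` are illustrative assumptions, not details taken from the dissertation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 (city-block) distance used to pick the best-matching unit."""
    return sum(abs(u - v) for u, v in zip(x, y))


def train_som(data: Sequence[Sequence[float]], rows: int, cols: int,
              epochs: int = 50, lr0: float = 0.5, sigma0: float = 1.0,
              seed: int = 0) -> List[Vector]:
    """Online SOM on a rows x cols grid; returns one prototype per node (row-major)."""
    dim = len(data[0])
    n_nodes = rows * cols
    # Deterministic initialization (an assumption of this sketch): spread prototypes
    # evenly between the component-wise minimum and maximum of the data.
    lo = [min(x[j] for x in data) for j in range(dim)]
    hi = [max(x[j] for x in data) for j in range(dim)]
    protos: List[Vector] = []
    for k in range(n_nodes):
        t = k / (n_nodes - 1) if n_nodes > 1 else 0.5
        protos.append([lo[j] + t * (hi[j] - lo[j]) for j in range(dim)])
    grid = [(k // cols, k % cols) for k in range(n_nodes)]
    rng = random.Random(seed)
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighborhood radius
        rng.shuffle(order)
        for i in order:
            x = data[i]
            bmu = min(range(n_nodes), key=lambda k: city_block(x, protos[k]))
            br, bc = grid[bmu]
            for k, (r, c) in enumerate(grid):
                d2 = (r - br) ** 2 + (c - bc) ** 2      # squared distance on the grid
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood weight
                w = protos[k]
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])      # move prototype toward input
    return protos


def assign(data: Sequence[Sequence[float]], protos: Sequence[Vector]) -> List[int]:
    """Cluster label of each item = index of its best-matching unit."""
    return [min(range(len(protos)), key=lambda k: city_block(x, protos[k]))
            for x in data]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for interval representations.
    data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    protos = train_som(data, rows=1, cols=2)
    print(assign(data, protos))
```

The prototype update is the usual move toward the input even though the winner is chosen with the L1 distance. This pairing is common in practice, but a batch SOM that updates prototypes with neighborhood-weighted medians would match the L1 criterion more closely, since the median minimizes the sum of absolute deviations.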
Keywords/Search Tags: Interval symbolic data, General distribution, Symbolic data analysis, Clustering analysis, Self-organizing map