The Dynamic Analysis Of Generally Distributed Histogram-Valued Symbolic Data And Interval Symbolic Data

Posted on: 2013-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2268330392970519
Subject: Information management and information systems
Abstract/Summary:
The rapid development of Internet technology has produced enormous volumes of data, and traditional clustering analysis methods are limited when faced with such a huge sample space. Symbolic data analysis (SDA), introduced in the 1980s, is a method for extracting useful knowledge and uncovering regularities in samples from large datasets. Clustering, an exploratory procedure that helps in understanding data with complex structure and multivariate relationships, is widely used in SDA. Most existing clustering methods for symbolic data assume that the data are uniformly distributed across the interval. However, this assumption does not always hold in practice. Taking this into account, this paper studies dynamic clustering methods for histogram-valued data and interval data with a general distribution.

First, two commonly used kinds of symbolic data are defined: histogram-valued data and interval data. Several aspects of clustering analysis for histogram-valued data are then studied, including how such data are formed, the distance between histogram-valued data with a general distribution, and the corresponding dynamic clustering method. Taking the Iris dataset as an example, the data are grouped into symbolic objects, and dynamic clustering analysis is performed on them to illustrate that the method is practical.

On the basis of the Hausdorff distance, the paper puts forward a new distance for interval data that takes into account the point data contained in the intervals. Based on this distance, we present an algorithm for the dynamic clustering of p-dimensional interval symbolic data with a general distribution. A simulation experiment is conducted to evaluate the validity of our method. The results show that, compared with analysis methods for uniformly distributed interval symbolic data, the analysis methods for generally distributed interval symbolic data are more effective under all the conditions designed in our experiment. Finally, the method is illustrated with a real-case dataset, which shows the advantages of our method in practical application.

The focus of this work is to provide a dynamic algorithm for establishing clusters of p-dimensional histogram-valued data and interval data, with cluster validity indexes used to judge the results. The results show that, compared with traditional methods for uniformly distributed symbolic data, the dynamic clustering analysis methods for generally distributed symbolic data are more effective and more objective, for both histogram-valued data and interval data.
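
As a point of reference for the Hausdorff distance mentioned in the abstract, the sketch below computes the classical Hausdorff distance between two closed intervals, which reduces to the larger of the two endpoint differences. It does not reproduce the new distance proposed in the thesis, whose formula is not given in the abstract. The function names and the choice to sum the per-dimension distances for p-dimensional data are illustrative assumptions, not the author's definitions.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def hausdorff_interval(a: Interval, b: Interval) -> float:
    """Classical Hausdorff distance between closed intervals a and b.

    For intervals on the real line this equals
    max(|a_lower - b_lower|, |a_upper - b_upper|).
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def hausdorff_p_dim(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Distance between two p-dimensional interval objects.

    Assumption (not from the thesis): the per-dimension Hausdorff
    distances are simply summed.
    """
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of dimensions")
    return sum(hausdorff_interval(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    # Two hypothetical 2-dimensional interval objects.
    x = [(1.0, 3.0), (0.0, 2.0)]
    y = [(2.0, 5.0), (0.5, 2.5)]
    print(hausdorff_p_dim(x, y))
```

Because this classical distance depends only on the interval endpoints, it ignores how the underlying points are spread inside each interval, which is the limitation the abstract says the proposed distance is meant to address.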
Keywords/Search Tags: Histogram-valued symbolic data, Interval symbolic data, General distribution, Symbolic data analysis, Clustering analysis