Validation Research On Principal Component Analysis And Cluster Analysis Of Interval Symbolic Data

Posted on: 2011-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: D Deng
GTID: 2178330338481602
Subject: Information management and information systems
Abstract/Summary:
Symbolic data analysis (SDA) is an approach to analyzing massive data sets and extracting useful information and knowledge from them. As information explosion has become routine, SDA is now widely used and is attracting attention from a growing number of experts and scholars. Because interval-valued data is a commonly used kind of symbolic data, its multivariate statistical analysis, especially principal component analysis (PCA) and cluster analysis, has long been a research focus in SDA, and many new methods have been proposed. This thesis studies the validation of existing analysis methods for interval symbolic data.

Existing SDA methods for interval-valued data usually assume that interval variables follow a uniform distribution, but non-uniform distributions (e.g., the normal distribution) are more common in practice. The thesis therefore examines the validity of analysis methods for generally distributed interval data alongside that of methods for uniformly distributed interval data. It first summarizes and briefly reviews existing PCA and cluster analysis methods in the symbolic framework. Based on the principle of PCA, a validity index is defined, and a simulation study is carried out to compare Vertices-PCA, Centers-PCA, and PCA of generally distributed interval symbolic data. For cluster analysis of interval symbolic data, one external and three internal validity indices are proposed. Simulation experiments are carried out to compare clustering methods for uniformly distributed interval symbolic data with methods for generally distributed interval symbolic data, and a real application is then considered.

The results show that, compared with analysis methods for uniformly distributed interval symbolic data, the analysis methods for generally distributed interval symbolic data are more effective and lead to more objective results in both principal component analysis and cluster analysis.
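
As a rough illustration of the methods named in the abstract, the sketch below builds the input matrices commonly used for Centers-PCA (interval midpoints) and Vertices-PCA (all corners of each observation's hyperrectangle), and contrasts the within-interval variance implied by a uniform assumption with that implied by a normal assumption. The abstract does not give the thesis's own formulation, so the function names, the sample data, and the convention that an interval spans the mean plus or minus three standard deviations are all assumptions made only for this example.

```python
from itertools import product


def centers_matrix(intervals):
    """Midpoint of every interval: the n x p matrix fed to Centers-PCA."""
    return [[(lo + hi) / 2 for lo, hi in row] for row in intervals]


def vertices_matrix(intervals):
    """All 2**p corners of each observation's hyperrectangle:
    the (n * 2**p) x p matrix fed to Vertices-PCA."""
    rows = []
    for row in intervals:
        rows.extend(list(corner) for corner in product(*row))
    return rows


def uniform_variance(lo, hi):
    """Variance of a value uniformly distributed on [lo, hi]."""
    return (hi - lo) ** 2 / 12


def normal_variance(lo, hi, k=3.0):
    """Variance under a normal model, ASSUMING (for illustration only)
    that the interval spans mean +/- k standard deviations."""
    return ((hi - lo) / (2 * k)) ** 2


# Hypothetical data: 3 observations, 2 interval-valued variables.
data = [
    [(1.0, 3.0), (10.0, 14.0)],
    [(2.0, 6.0), (8.0, 9.0)],
    [(0.0, 2.0), (11.0, 15.0)],
]

centers = centers_matrix(data)    # 3 rows, one per observation
vertices = vertices_matrix(data)  # 3 * 2**2 = 12 rows

for lo, hi in data[0]:
    print((lo, hi), uniform_variance(lo, hi), normal_variance(lo, hi))
```

Under these assumptions the uniform model attributes three times as much internal variance to an interval as the normal model does, which is one way the distributional assumption can change the covariance structure that PCA and clustering work from. Ordinary PCA would then be run on `centers` or `vertices`; that step is omitted here.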
Keywords/Search Tags: Interval-valued Symbolic Data, Principal Component Analysis, Cluster Analysis, Validation, General Distribution