
Cluster analysis of high dimensional data and dimension reduction for regression

Posted on: 2006-03-22
Degree: Ph.D.
Type: Dissertation
University: The Pennsylvania State University
Candidate: Bai, Steven Gang
Full Text: PDF
GTID: 1458390005996393
Subject: Statistics
Abstract/Summary:
The purpose of this dissertation is twofold. First, we introduce and develop a new procedure for identifying clusters in high-dimensional data by projecting the data onto a low-dimensional subspace along which the clusters are best separated. The projection criterion is the determinant of certain moment matrices, which takes small values when the data are clustered. We also develop a statistical procedure to determine the number of clusters, as well as a scheme to fine-tune the cluster assignments after an initial clustering is completed.

Second, we propose a new approach to dimension reduction in regression analysis, based on the properties of the Conditional Simultaneous Expectations of the predictors. This method estimates the dimension reduction space more comprehensively than the classical methods, because its accuracy depends on neither a monotone trend nor a U-shaped trend. The simultaneous conditional expectation is a novel and promising concept that tackles a major difficulty in dimension reduction. We believe that it will play an important role in future research on dimension reduction.
Keywords/Search Tags: Dimension reduction, Data, Clusters