
Variable Selection For Gaussian Mixture Model-Based Clustering And Its Application

Posted on: 2017-05-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y W Chen
GTID: 2348330503461381  Subject: Applied statistics
Abstract/Summary:
In high-dimensional cluster analysis, traditional methods lose effectiveness as the dimension of the data grows. The primary problem in high-dimensional clustering is therefore to find an appropriate way to reduce the dimension of the data. This thesis combines dimension reduction through variable selection with Gaussian mixture model (GMM)-based clustering, and implements and applies the resulting penalized clustering analysis. A penalized GMM can identify the variables that carry important information in high-dimensional data. First, we propose an L_∞-penalized GMM that selects the variables informative for clustering by shrinking the maximum mean parameters; a modified Bayesian information criterion (MBIC) is used to select the penalty parameter λ and the number of clusters K. Second, we put forward an adaptive L_∞-penalized GMM that adjusts the penalty parameters so that unimportant variables are shrunk more heavily and important variables more lightly, which compensates for the excessive penalization of informative variables by the L_∞-GMM. Finally, the adaptive L_∞-GMM is applied to bioinformatics data. The results show that GMM clustering with the penalty term yields effective clustering of the high-dimensional data and identifies the informative variables among the mouse protein gene expression levels.
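The shrinkage step described in the abstract can be illustrated with a minimal sketch. It assumes the penalty is the L∞ norm (the largest absolute value) of one variable's cluster means, which the abstract's phrase about compressing the maximum mean parameters suggests. It also assumes equal cluster weights and unit variance, so the update reduces to the standard proximal operator of the L∞ norm. The function name `prox_linf` and this simplification are illustrative assumptions, not the thesis's actual M-step.

```python
import math


def prox_linf(v, lam):
    """Proximal operator of lam * max_k |v_k| (illustrative, equal-weight case).

    v   : unpenalized cluster means of ONE variable, one entry per cluster
    lam : penalty parameter (hypothetical name for the abstract's penalty)
    Returns the shrunken means; all zeros means the variable is dropped.
    """
    if lam <= 0:
        return list(v)
    abs_v = [abs(x) for x in v]
    # If the penalty outweighs the total magnitude, every mean shrinks to zero,
    # i.e. the variable no longer separates the clusters.
    if sum(abs_v) <= lam:
        return [0.0 for _ in v]
    # Otherwise find the clipping level theta such that the total amount
    # removed from the magnitudes above theta equals lam.
    theta = 0.0
    cumsum = 0.0
    for j, u_j in enumerate(sorted(abs_v, reverse=True), start=1):
        cumsum += u_j
        t = (cumsum - lam) / j
        if u_j - t > 0:
            theta = t
    # Clip each magnitude at theta, keeping its sign.
    return [math.copysign(min(abs(x), theta), x) for x in v]


print(prox_linf([3.0, 1.0, -2.0], 1.0))  # only the largest mean is pulled down
print(prox_linf([0.3, -0.2, 0.1], 1.0))  # weak variable: all means become zero
```

Under these assumptions, only the largest-magnitude means are pulled toward the others, and a variable whose means are all small is zeroed out entirely, which is how the penalty performs variable selection. An adaptive variant would presumably use a different `lam` for each variable, smaller for variables whose unpenalized means are large, but the exact weighting used in the thesis is not given here.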
Keywords/Search Tags: Variable Selection, L_∞-GMM, Adaptive L_∞-GMM, EM Algorithm, High-dimensional Clustering Analysis