
The Research Of Application And Optimization Of Gaussian Mixture Model In Data Clustering

Posted on: 2016-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2308330452968986
Subject: Computer application technology
Abstract/Summary:
In model-based clustering, a mixture of distributions is fitted to the dataset and each sample is assigned to one of the component distributions. The Gaussian mixture model (GMM) is widely applied in practice. The Expectation Maximization (EM) algorithm is usually used to estimate the parameters of a GMM, but it has drawbacks that can affect the estimate. This thesis proposes a comprehensive method in which the standard EM is replaced with the Component-Wise EM algorithm and the Minimum Description Length (MDL) criterion is used for model selection. Because data may naturally reside on or close to an underlying submanifold, a local consistency regularization term is added to the log-likelihood to improve the estimation. Experiments on simulated and real data show that the approach works well.

The Gaussian mixture model with the standard EM algorithm is widely applicable in scientific computing and research. Truncation and censoring occur frequently in scientific data collection, which motivates interest in handling truncated or censored data. This thesis studies a truncated and censored EM algorithm for fitting multivariate Gaussian mixture models to data that are truncated and censored simultaneously. A split-and-merge operation is proposed to reduce the algorithm's sensitivity to initialization. The method is verified on synthetic and real data sets, with good results.

Full Bayesian inference is important in parametric learning. Although new methodology, such as the Reversible Jump MCMC (RJ-MCMC) algorithm for fully Bayesian mixture analysis, has been developed, applying RJ-MCMC to full Bayesian inference in multivariate Gaussian mixtures remains intractable. This thesis adapts the RJ-MCMC algorithm to multivariate Gaussian mixtures, with some adjustments to the moves that change the number of components, and then constructs a hierarchical model for fitting mixtures. The effectiveness of the proposed method is verified by experiments on synthetic and real data.
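To make the first part of the abstract concrete, the following is a minimal sketch of fitting a Gaussian mixture by EM and choosing the number of components with a description-length penalty. It is not the thesis's method: it uses the standard EM rather than Component-Wise EM, works on one-dimensional data, omits the local consistency regularization, and assumes a BIC-style two-part MDL score (negative log-likelihood plus half the number of free parameters times log n). The function names `fit_gmm_em`, `mdl_score`, and `select_num_components` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def _log_joint(x, weights, means, variances):
    """Return log(weight_j * N(x_i | mean_j, var_j)) as an (n, k) array."""
    return (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi * variances)
        - 0.5 * (x[:, None] - means) ** 2 / variances
    )


def _log_sum_exp(a):
    """Row-wise log-sum-exp of an (n, k) array, returned with shape (n, 1)."""
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))


def fit_gmm_em(x, k, n_iter=500, tol=1e-8):
    """Fit a 1-D Gaussian mixture with k components by standard EM.

    Returns (weights, means, variances, log_likelihood).
    """
    # Deterministic initialisation (an assumption of this sketch):
    # means at evenly spaced quantiles, equal weights, shared variance.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() / k**2 + 1e-6)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        log_p = _log_joint(x, weights, means, variances)
        log_norm = _log_sum_exp(log_p)
        resp = np.exp(log_p - log_norm)
        ll = float(log_norm.sum())
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        # Small floor keeps a component from collapsing to zero variance.
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Log-likelihood of the final parameters.
    ll = float(_log_sum_exp(_log_joint(x, weights, means, variances)).sum())
    return weights, means, variances, ll


def mdl_score(log_likelihood, k, n):
    """Two-part description length (BIC-style form, assumed here).

    A 1-D mixture with k components has 3k - 1 free parameters:
    k means, k variances and k - 1 independent weights.
    """
    n_params = 3 * k - 1
    return -log_likelihood + 0.5 * n_params * np.log(n)


def select_num_components(x, k_max):
    """Fit mixtures with 1..k_max components; keep the lowest MDL score."""
    best_k, best_fit, best_score = None, None, np.inf
    for k in range(1, k_max + 1):
        fit = fit_gmm_em(x, k)
        score = mdl_score(fit[3], k, x.shape[0])
        if score < best_score:
            best_k, best_fit, best_score = k, fit, score
    return best_k, best_fit
```

Calling `select_num_components(x, k_max=4)` on a one-dimensional NumPy array `x` returns the chosen number of components together with the fitted weights, means, variances, and log-likelihood. Each sample can then be assigned to the component with the largest responsibility, which is the clustering step the abstract refers to.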
Keywords/Search Tags: data clustering, Gaussian mixture model, EM, maximum likelihood, model selection, truncated data, censored data, Bayesian inference, RJ-MCMC, hierarchical model