
Cluster Analysis Based On The Mixture Gaussian Models

Posted on: 2015-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhang
GTID: 2308330452456949
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Data clustering is a static data analysis technique that is widely used in machine learning, data mining, pattern recognition, image analysis and bioinformatics. The statistical distribution of real data is often random and complex, but it can be approximated arbitrarily closely by a Gaussian mixture model; this paper therefore studies cluster analysis based on Gaussian mixture models.
We study two types of Gaussian mixture models. The first is the finite Gaussian mixture model, which provides a probabilistic approach to clustering. Its parameters are usually estimated with the EM algorithm, which requires no prior knowledge and learns the model parameters automatically from the data; its weakness is that it is sensitive to the initial cluster centers. The paper uses three different methods to verify that the EM algorithm is affected by its initial values. To overcome this defect, the paper studies a modified EM algorithm based on a penalized likelihood function. Intuitively, if some mixture weights (mixing probabilities) converge to zero, the corresponding components are screened out and the appropriate components are retained. An advantage of this approach is that, for multi-dimensional Gaussian mixture models, it does not require the prior assumption that all components share the same covariance matrix. Compared with the traditional EM algorithm, the experimental results show that this method clusters better.
The second type is the infinite Gaussian mixture model. Finite Gaussian mixture models require the number of clusters to be specified in advance for high-dimensional data, which affects the accuracy and generalization of the clustering.
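Returning to the finite model, the EM iteration and the weight-pruning idea can be sketched in a few lines of Python with NumPy. This is a minimal one-dimensional sketch under stated assumptions: the function name `penalized_em`, the constant `penalty` subtracted from each component's expected count, and the toy data are all hypothetical, and the pruning rule is a simple illustration in the spirit of the penalized likelihood idea described above, not the penalty function used in the thesis. With `penalty=0.0` it reduces to ordinary EM, and each component keeps its own variance, mirroring the point that a shared covariance is not assumed.

```python
import numpy as np


def penalized_em(x, init_means, penalty=0.0, n_iter=200, min_var=1e-6):
    """EM for a 1-D Gaussian mixture with an optional pruning penalty on the weights.

    x          : 1-D array of observations.
    init_means : initial component means; EM only reaches a local optimum,
                 so the result can depend on this choice.
    penalty    : hypothetical constant subtracted from each component's expected
                 count in the weight update. penalty=0.0 gives ordinary EM; a
                 component whose penalized count reaches zero is removed.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    k = means.size
    variances = np.full(k, x.var())          # every component has its own variance
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step in the log domain: log(pi_j * N(x_i | mu_j, s2_j)), shape (n, k).
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        counts = resp.sum(axis=0)            # expected number of points per component

        # Penalized weight update: shrink the counts and drop components that hit zero.
        shrunk = np.maximum(counts - penalty, 0.0)
        keep = shrunk > 0.0
        if not keep.any():
            raise ValueError("penalty removed every component")
        weights = shrunk[keep] / shrunk[keep].sum()
        resp, counts = resp[:, keep], counts[keep]

        # M-step for the means and the component-specific variances.
        means = resp.T @ x / counts
        variances = np.maximum(
            (resp * (x[:, None] - means) ** 2).sum(axis=0) / counts, min_var)
    return weights, means, variances


# Toy data: two well-separated groups; the third starting mean is deliberately far away.
x = np.concatenate([np.linspace(-6.0, -4.0, 300), np.linspace(4.0, 6.0, 300)])
weights, means, variances = penalized_em(x, init_means=[-5.0, 5.0, 20.0], penalty=50.0)
```

In this toy call the component started at 20.0 collects only a few expected points in the first E-step, falls below the penalty and is removed, leaving two components near -5 and 5; with `penalty=0.0` all three components survive. Because `init_means` is passed explicitly, rerunning with different starting means is a quick way to observe the initialization sensitivity of EM.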
This paper studies cluster analysis based on the infinite Gaussian mixture model, whose core is the Dirichlet process, used as the prior on the mixture weights for high-dimensional data. The advantage is that the number of clusters is inferred automatically by the model instead of being fixed in advance, so the model can fit the data accurately with strong flexibility and robustness.
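How a Dirichlet process prior lets the number of clusters come out of the data can be illustrated with a small collapsed Gibbs sampler based on the Chinese restaurant process view of the DP. This sketch rests on simplifying assumptions that are not taken from the thesis: one-dimensional data, a known component variance `sigma2`, and a conjugate normal prior N(`mu0`, `tau2`) on the component means. The function name `dp_gmm_gibbs` and all default values are hypothetical, and the abstract does not state which inference scheme the thesis uses, so this should not be read as its algorithm.

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def dp_gmm_gibbs(x, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, n_sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a 1-D DP mixture of Gaussians with known variance.

    Returns one cluster label per point; the number of distinct labels is
    inferred by the sampler, not fixed in advance.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    z = np.zeros(len(x), dtype=int)        # start with all points in cluster 0
    counts = {0: len(x)}                   # cluster sizes
    sums = {0: float(x.sum())}             # per-cluster sums of x
    for _ in range(n_sweeps):
        for i in range(len(x)):
            # Remove x_i from its current cluster, deleting the cluster if it empties.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            labels = list(counts)
            log_w = []
            for c in labels:
                # Posterior of the cluster mean given its other members, then the
                # predictive density of x_i; weight proportional to cluster size.
                prec = 1.0 / tau2 + counts[c] / sigma2
                post_mean = (mu0 / tau2 + sums[c] / sigma2) / prec
                log_w.append(np.log(counts[c])
                             + normal_logpdf(x[i], post_mean, 1.0 / prec + sigma2))
            # Opening a new cluster: weight alpha times the prior predictive density.
            log_w.append(np.log(alpha) + normal_logpdf(x[i], mu0, tau2 + sigma2))
            log_w = np.array(log_w)
            probs = np.exp(log_w - log_w.max())
            probs /= probs.sum()
            choice = rng.choice(len(probs), p=probs)
            if choice == len(labels):
                k = max(labels, default=-1) + 1   # fresh label for the new cluster
                counts[k], sums[k] = 0, 0.0
            else:
                k = labels[choice]
            z[i] = k
            counts[k] += 1
            sums[k] += x[i]
    return z


x = np.concatenate([np.linspace(-6.0, -4.0, 100), np.linspace(4.0, 6.0, 100)])
z = dp_gmm_gibbs(x, seed=1)
```

Each point joins an existing cluster with probability proportional to that cluster's size times the predictive density, or opens a new cluster with probability proportional to `alpha` times the prior predictive density, so the number of occupied clusters is an output of the sampler rather than an input. On the toy data the labels should concentrate on two large clusters, possibly with an occasional stray singleton.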
Keywords/Search Tags: Gaussian Mixture Models, EM algorithm, Modified EM algorithm, Initialization methods, Dirichlet Process