
A Bayesian Finite Multivariate Mixture Erlang Model With Feature Selection

Posted on: 2022-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhang
GTID: 2530306323470314
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Finite mixture models play an important role in clustering and classification and can be applied to data sets with mixed types of variables. Applications based on mixture models are extensive, but some problems remain. First, parameter estimation typically relies on the EM algorithm, which can be very sensitive to the choice of initial values. Second, traditional clustering methods such as K-means and hierarchical clustering cannot quantify the importance of clustering features, which can be valuable information in clustering. To address these challenges, we propose a finite mixture model that conducts variable selection simultaneously with clustering: we put a spike-and-slab prior on each variable to obtain variable importance. The spike-and-slab prior was originally proposed for variable selection in linear regression, but it can be applied naturally in mixture models for unsupervised clustering. We take a Bayesian approach to obtain the parameter estimates and the cluster memberships, bypassing the limitation of the EM algorithm. The posterior distribution and the degree of feature importance are obtained by MCMC (Markov chain Monte Carlo) sampling.

The mixed Erlang model has been proved to be dense in the space of nonnegative continuous multivariate distributions. As a special case of the mixed gamma model, it is suitable for nonnegative multivariate data arising in fields such as actuarial science, medical experiments and industrial production. In this thesis, the CMM-MCMC algorithm is applied to the finite mixed Erlang model with a common rate parameter. First, we use the CMM (Clustered Method of Moments) algorithm, which combines K-means clustering with moment estimation, to determine the initial values of the mixed Erlang model parameters. Second, MCMC is used to obtain Bayesian posterior inference for the parameters under reasonable prior distributions. Finally, CMM-MCMC is combined with the handling of label switching to estimate the shape parameters. To assess the feasibility and effectiveness of the algorithm, empirical analysis is carried out on simulated data and on real example data, and the parameter estimation results are tested.
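
The CMM initialization step described in the abstract (K-means clustering followed by moment estimation) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function names `kmeans` and `cmm_init` are hypothetical, coordinates are treated as independent within each component, and the common rate is obtained by pooling cluster means and variances, using the Erlang identities mean = shape / rate and variance = shape / rate².

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def cmm_init(X, k, seed=0):
    """Hypothetical CMM-style starting values for a mixed Erlang model.

    Assumes one common rate shared by all components and dimensions.
    Returns (weights, integer shapes per cluster and dimension, rate).
    """
    labels = kmeans(X, k, seed=seed)
    # Skip clusters too small to give a variance estimate.
    used = [c for c in range(k) if np.sum(labels == c) > 1]
    means = np.array([X[labels == c].mean(axis=0) for c in used])
    variances = np.array([X[labels == c].var(axis=0) for c in used])
    weights = np.array([np.mean(labels == c) for c in used])
    weights = weights / weights.sum()
    # mean = shape / rate and variance = shape / rate**2, so the ratio of
    # pooled means to pooled variances is a moment estimate of the rate.
    rate = means.sum() / variances.sum()
    # Erlang shapes are positive integers: round and floor at 1.
    shapes = np.maximum(1, np.rint(means * rate)).astype(int)
    return weights, shapes, rate
```

Calling `cmm_init(X, k)` on an `(n, d)` array of positive observations returns mixing weights, a table of integer shape parameters and one rate. These values are meant only as a starting point for the MCMC stage, so rough moment estimates are acceptable here.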
Keywords/Search Tags: Mixed Erlang, Bayesian Parameter Estimation, Clustering, Feature Selection, MCMC Algorithm