
Testing The Homogeneity Of Mixture Models Under Small Or Moderate Sample Sizes

Posted on: 2022-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P C Ren
Full Text: PDF
GTID: 1487306722471494
Subject: Statistics
Abstract/Summary:
Large amounts of heterogeneous data exist in biology, medicine, economics, and other fields, but the heterogeneity is always unobserved. Finite mixture models are among the most effective tools for describing this type of data. Before modeling data with a mixture model, however, it is necessary to check whether the data come from a homogeneous population; if they do, the mixture structure could overfit the data. Testing the homogeneity of mixture models is therefore an important first step in fitting them. Homogeneity testing for mixture models has a long history in the literature, but previous studies focused mainly on the large-sample properties of the testing methods. Under small or moderate sample sizes, the limiting distributions of the test statistics approximate the corresponding true distributions poorly, which leads to inaccurate statistical inference. This thesis focuses on testing the homogeneity of different mixture models under small or moderate sample sizes and is organized as follows.

In Chapter 2, we study the homogeneity test of a three-sample mixture model, which plays an important role in analyzing data from two different distributions as well as from a mixture of them. Most previous work on this model has focused on estimating the mixing proportion, and the methods used were based on the assumption that the model is heterogeneous. Testing the homogeneity of the mixture structure is a necessary pilot study for the estimation work, yet it has received little attention to date. Assuming that the two densities come from the same location-scale family, we begin by considering the likelihood ratio test (LRT) and investigate its limiting distributions under the null and local alternative hypotheses. Then, because the LRT cannot control type I errors well under small or moderate sample sizes, we consider generalized fiducial inference and propose four types of generalized p-values based on the Gibbs algorithm. A simulation study shows that the generalized fiducial methods control type I errors better than the LRT and are more powerful than the LRT when the heterogeneity occurs in locations and scales simultaneously. Furthermore, in the real-data example, the Gibbs algorithm is more efficient than the existing Metropolis-Hastings algorithm, and all five methods successfully detect the heterogeneity of the data.

In Chapter 3, we focus on quantitative trait locus (QTL) interval mapping in genetics. We consider only backcross designs without double recombination between two-marker-QTL intervals. In the backcross population, individuals without gene recombination follow two different distributions, while those with gene recombination follow a mixture of these two distributions. We assume that the two distributions are from the same location-scale family and consider two cases: (i) the component distributions differ in locations and scales; (ii) the component distributions have an identical but unknown scale. For both cases, we construct four generalized p-values with the Gibbs algorithm. A simulation study shows that, compared with the LRT, the four generalized p-values control type I errors better while retaining comparable power under small or moderate sample sizes. The four generalized fiducial methods suit different scenarios: two of them are more aggressive and powerful, whereas the other two appear more conservative and robust. At the end of the chapter, a real-data example involving mouse blood pressure illustrates the generalized fiducial methods.

In Chapter 4, we focus mainly on testing the homogeneity of finite location-scale mixture models with a structural parameter. The most common location-scale mixture distributions include the normal mixture distribution, the t mixture distribution, the logistic mixture distribution, and the extreme-value mixture distribution. However, testing the homogeneity of the latter three non-normal mixture models has received little attention to date. In this chapter, we use the expectation-maximization test (EM-test) to test the homogeneity of these three location-scale mixture models with a structural parameter. The limiting distribution of the EM-test statistic is $0.5\chi_0^2 + 0.5\chi_1^2$. However, this distribution approximates the true distribution of the EM-test statistic poorly for small or moderate sample sizes. To obtain accurate type I errors and reasonable power, we use computer experiments to obtain Bartlett-adjusted limiting distributions and adaptive tuning parameters. A simulation study shows that the adjusted limiting distributions and adaptive tuning parameters achieve this goal. In addition, the EM-test is faster than the modified LRT. A real-data example involving data on the lifetime of electronic appliances illustrates the proposed EM-test.
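As an illustration of the limiting distribution quoted for Chapter 4, the following minimal sketch computes an asymptotic p-value under the mixture $0.5\chi_0^2 + 0.5\chi_1^2$, where $\chi_0^2$ is a point mass at zero. The function name `mixture_pvalue` and the sample statistic values are hypothetical and not taken from the thesis, and the sketch does not include the Bartlett adjustment or the adaptive tuning parameters that the thesis proposes for small samples.

```python
import math


def mixture_pvalue(stat: float) -> float:
    """Asymptotic p-value P(T >= stat) for T ~ 0.5*chi2_0 + 0.5*chi2_1.

    Hypothetical helper for illustration only; it uses the unadjusted
    limiting distribution, not the thesis's Bartlett-adjusted version.
    """
    if stat <= 0.0:
        # chi2_0 is a point mass at 0, so every outcome is >= 0.
        return 1.0
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    # Only the chi2_1 half of the mixture can exceed a positive value.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Made-up statistic values, shown only to exercise the function.
    for stat in (0.0, 1.0, 2.706, 5.0):
        print(f"statistic = {stat:5.3f}  p-value = {mixture_pvalue(stat):.4f}")
```

Because half of the probability mass sits at zero, the 5% critical value of this mixture equals the 90% quantile of $\chi_1^2$ (about 2.71) rather than the 95% quantile (about 3.84).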
Keywords/Search Tags:Mixture model, Generalized fiducial inference, EM-test, Likelihood ratio test, Location-scale family