| The exponential family is an important class of distributions in statistics. However, many real-world data sets cannot be fitted adequately by exponential family models. To meet the need for more complex data analysis, statisticians have proposed a class of distributions broader than the exponential family, called the reproductive dispersion model. Among collected data points, some have little influence on statistical inference, and removing a few of them does not change the diagnostic result. Other data points affect the inferred results much more strongly than the rest, and some deviate markedly in their characteristics from the other points in the data set. Such points are usually called outliers or influential points. Because they can distort inference, they need to be detected and corrected, which makes outlier diagnosis very important. This paper focuses on two topics. First, the Pena distance is used to study statistical diagnostics under the reproductive dispersion model. The expression of the Pena distance under this model is derived and its properties are discussed, yielding a method for identifying high-leverage outliers. In addition, a comparison of the Pena distance with the Cook distance shows that the Pena distance performs better than the Cook distance under certain conditions. The model and method are illustrated by simulation studies and a real-data example. Second, for heterogeneous population data, mixtures of regression models are an important tool in statistical data analysis. For mixture data from reproductive dispersion distributions, a mixture of generalized linear reproductive dispersion models is proposed. The EM algorithm is used to compute the maximum likelihood estimates of the model parameters. The Pena distance and the Cook distance are used to study the statistical diagnostic problem, and the two distances are compared. Finally, the mixture population data and the mixture sub-clusters are compared through simulation studies and a real-data example, which further shows that the theory and method are reasonable and effective. |