| With the rapid development of science and technology, the scale of data is growing geometrically, and large volume has become the primary feature of big data. Subsampling is one of the most effective ways to reduce computing costs while still estimating population parameters efficiently from a small number of important samples, and it is an important application of statistical inference theory in the era of big data. Among subsampling methods, sampling with replacement is the most commonly used in big data subsampling: it is simple to operate and fast, and it has the desirable property that the draws are independent and identically distributed. Almost all estimation methods based on sampling with replacement in the literature are inverse probability weighted (IPW) methods, so the current optimal schemes for subsampling with replacement are derived from the asymptotic properties of the IPW estimator. However, the IPW estimator is unstable and cannot incorporate auxiliary information. To overcome these two shortcomings, this paper applies the empirical likelihood weighting (ELW) method to parameter estimation under subsampling with replacement, which fundamentally resolves the instability of the estimator and improves estimation efficiency. Starting from the asymptotic properties of the ELW method, this paper uses a series of approximation techniques to minimize the upper bound of the mean square error under the L-criterion. An equivalent formulation of this minimization problem is also introduced in order to design a simpler algorithm. This achieves a good balance between accuracy and computational simplicity and yields an approximately optimal subsampling scheme. Then, building on the ELW estimation method and exploiting the characteristics of sampling with replacement, the empirical likelihood weighted estimation method with auxiliary information (ELWA method) is proposed under the Z-estimation framework, which further improves estimation efficiency. We also consider the special case in which the ELWA estimation method has no non-degenerate solution and give a procedure for handling it. The consistency and asymptotic normality of the ELWA estimator are proved. As with the ELW method, we design an approximately optimal subsampling scheme for the new ELWA estimation method by approximately minimizing the upper bound of the mean square error under the L-criterion. We give the solution steps for practical application of all the sampling and estimation methods. Finally, an application to bike-sharing data shows that the ELWA estimation method and the two sampling schemes proposed in this paper perform well in practice. In conclusion, this paper further improves subsampling methods for big data and advances the theory of parameter estimation based on sampling with replacement. |