Uniform Designs And Applications In Big Data Subsampling

Posted on:2022-05-27

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M Zhang

Full Text:PDF

GTID:1520306551486814

Subject:Applied Statistics

Abstract/Summary:

Scientific experiments are not only an effective way to understand nature but also a necessary link in the development of high technology. As an important branch of statistics, experimental design has been widely used in product quality monitoring, drug research and development, aerospace, scientific computing, artificial intelligence, and other important scientific and technological fields. The explosive growth of data in science, technology, and everyday life has brought severe challenges to data storage and computing capacity. Experimenters often know little about the relationships between the factors and the objective functions (responses), so designing experiments effectively is a very challenging topic, and applying uniform designs to big data subsampling is a significant line of research.

The measure of uniformity plays an important role in the theory of uniform design. For the cases of purely quantitative factors and purely qualitative factors, there are many reasonable and effective measurement criteria. Experimental designs with both qualitative and quantitative factors have received growing attention in recent years, but few studies have addressed uniformity criteria for such designs. There is an established theoretical system for regular experimental domains, such as the multidimensional rectangle, sphere, and simplex. However, constructing uniform designs is a difficult problem for experimental domains of arbitrary shape (such as a map with an extremely irregular boundary). In big data subsampling, model-robust methods are likewise of great significance.

1. Uniformity criterion for designs with both qualitative and quantitative factors. In practical applications, we often need to design a system that contains both qualitative and quantitative factors. Many construction methods for designs of this kind, such as marginally coupled designs, have been proposed to pursue good space-filling structures. Uniformity is an important space-filling property. In this thesis, a new uniformity criterion, the qualitative-quantitative discrepancy (QQD), is proposed to measure the uniformity of designs with both qualitative and quantitative factors. The uniformity of designs with these two types of factors is then analyzed, and the formula for the discrepancy is derived. Besides marginally coupled designs and the corresponding surface designs, QQD can measure the uniformity of any design that includes qualitative and quantitative factors. The lower bound of QQD can be used to judge whether a design is uniform.

2. Construction of uniform designs on arbitrary domains by inverse Rosenblatt transformation. There are established theories and methods for constructing uniform designs on hypercube domains, while the construction of uniform designs on arbitrary domains remains a challenging problem. In this thesis, we propose a deterministic construction method based on the inverse Rosenblatt transformation, which is a general approach for converting uniformly designed points from the unit hypercube to arbitrary domains. To evaluate the constructed designs on irregular domains, we employ the central composite discrepancy as a uniformity measure. The proposed method is demonstrated on a class of flexible regions, on constrained and manifold domains, and on a geographical domain with a very irregular boundary. The new constructions are shown to be competitive with the traditional stochastic representation and acceptance-rejection methods.

3. Model-free subsampling based on uniform designs. Subsampling, or subdata selection, is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend heavily on the model assumption. In this thesis, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well a subdata set represents the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a low-GEFD, data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method enjoys the model-free property and is superior to random sampling. In practice, such a model-free property is more appealing than model-based subsampling, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies. In addition, the subsampling methods proposed in this thesis support parallel computation when data storage is decentralized, through the use of sliced uniform designs.
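
To make the idea of design-based, model-free subsampling concrete, the following is a minimal Python sketch and not the procedure of the thesis. It assumes a good lattice point set with generators (1, 13) on 21 runs as a stand-in for a uniform design, maps each design point to the data scale through the marginal empirical quantiles (a simplification that ignores dependence between variables and does not compute GEFD), and keeps the nearest unused data row. The function names `glp_design` and `uniform_design_subsample`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import random


def glp_design(n, generators):
    """Good lattice point set: n points in (0, 1)^s, one per generator column.

    Each column is a permutation of the cell centers (k + 0.5) / n when the
    generator is coprime with n. Used here as a stand-in for a uniform design.
    """
    return [[((i * h) % n + 0.5) / n for h in generators] for i in range(1, n + 1)]


def uniform_design_subsample(data, design):
    """Pick one distinct data row per design point (illustrative only).

    Assumption: each coordinate of a design point is mapped to the data scale
    by the marginal empirical quantile, then the nearest unused row is kept.
    Distances are plain Euclidean, with no rescaling of the variables.
    """
    n_rows, dims = len(data), len(data[0])
    sorted_cols = [sorted(row[j] for row in data) for j in range(dims)]
    chosen, used = [], set()
    for u in design:
        # Inverse marginal empirical CDF evaluated at each coordinate of u.
        target = [sorted_cols[j][min(int(u[j] * n_rows), n_rows - 1)]
                  for j in range(dims)]
        # Nearest data row that has not been selected yet.
        best = min(
            (k for k in range(n_rows) if k not in used),
            key=lambda k: sum((data[k][j] - target[j]) ** 2 for j in range(dims)),
        )
        used.add(best)
        chosen.append(best)
    return chosen


# Simulated full data: no response model is used anywhere in the selection.
random.seed(0)
full_data = [[random.gauss(0, 1), random.expovariate(1.0)] for _ in range(2000)]
design = glp_design(21, (1, 13))
subsample_idx = uniform_design_subsample(full_data, design)
print(len(subsample_idx), "rows selected out of", len(full_data))
```

Because the selection depends only on the covariates and the design points, the same subdata can be reused for any model fitted afterwards, which is the sense in which such a scheme is model-free.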

Keywords/Search Tags:

Space-filling property, Uniform design, Qualitative-quantitative discrepancy, Lower bounds, Inverse Rosenblatt transform, Generalized empirical F-discrepancy, Generalized L2-discrepancy, Koksma-Hlawka inequality, Model-free subsampling

Related items

1	The Study Of Uniform Property Of Several Classes Of Complex Designs
2	New Lower Bounds For Lee Discrepancy And Wrap-around L₂-discrepancy On Two And Three Mixed Levels Factorials
3	New Lower Bounds For The Symmetric L₂-discrepancy And Their Application
4	Projection Weighted Symmetric Discrepancy
5	Symmetric Discrepancy And Its Improvement
6	Uniform Design With Prior Information Of Factors Under Weighted Wrap-around L₂-discrepancy
7	Level Permutation Method For Constructing Uniform Designs Under The Wrap-around L₂-discrepancy
8	Level Permutation Method For Constructing Mixed Level Uniform Designs Under The Wrap-around L₂-discrepancy
9	Uniformity In Terms Of The Symmetric L₂-discrepancy Of Double Designs
10	Constructions Of Mixed-level Uniform Design With Weighted Discrete Discrepancy