
Research On Calculation Of Statistical Depth Function Under Big Data

Posted on: 2019-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Jiang
GTID: 2417330545481014
Subject: Statistics

Abstract/Summary:
In the past ten years, technologies for data acquisition, storage and analysis have developed rapidly and greatly reduced the cost of storing and processing data, and an era of big data has gradually emerged. Big data has since permeated every field of the economy and society and has had a profound impact on social management, economic development and personal life. In traditional data analysis, the analysis of one-dimensional data is now very mature. However, as information technology grows and data become larger in volume and more complex in structure, multivariate data lack a natural ordering comparable to the ordering of one-dimensional data. In 1975, Tukey [47] first proposed the concept of a statistical depth function, which offers an effective, if partial, solution to this problem. Statistical depth functions can be used to order and analyze high-dimensional data and also to detect outliers. When estimating location parameters, a depth function assigns a depth to each data point and takes the deepest point as the location estimate. This limits the influence of outliers and makes the estimate robust. Because they resist outliers while still using the information in the whole sample, statistical depth functions have proved very effective. This paper introduces several commonly used depth functions and summarizes the existing algorithms for computing them. Note that these algorithms are all exact, so they are practical only when the data volume is small. For computation in the big-data setting, Chapter 4 of this paper illustrates the calculation of the projection depth function and the simplicial depth function. As with other robust estimators, parameter estimation based on depth functions becomes very demanding when the data are large and high-dimensional, so solving these computational problems is essential if robust estimators are to be practical.

Since the era of big data began, data have become larger, higher-dimensional and more complex in structure. Faced with such data, existing statistical depth functions place a heavy computational burden on researchers because their computation is complex. To reduce this burden, this paper improves existing algorithms for computing statistical depth functions. As an earlier improvement, Chen and Ouyang (2001) [3] proposed two faster algorithms for three-dimensional and four-dimensional data, respectively. Unfortunately, these algorithms are infeasible for higher-dimensional data, because their extensions are more complex than the naïve algorithms when p ≥ 5. More importantly, no code implementing their ideas is available. At present, there is no detailed research on algorithms faster than the naïve ones when the dimension p > 2. The algorithms proposed by Liu and Zuo (2014) [27] for the projection depth function when p > 2 are computationally intensive. As a trade-off, this paper uses only a small number of random direction vectors, chosen according to predetermined plans; this sacrifices some accuracy, but the time savings are still considerable. For simplicial depth, the computational problem is examined further and two algorithms are introduced. The first is exact, and various empirical examples show that the proposed algorithms run much faster than the naïve one. However, when both n and p are arbitrary, computing simplicial depth exactly is essentially NP-hard, although a polynomial-time algorithm exists for fixed p. Reasonable approximations are therefore needed, so an approximate algorithm based on resampling is proposed, and its procedure is described in detail in Chapter 4. Finally, this paper proposes a new approximate algorithm for p ≥ 3 that is affine invariant and computationally more efficient than comparable algorithms.
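The random-direction approximation of projection depth described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis's algorithm: it assumes the common definition PD(x) = 1 / (1 + O(x)), where the outlyingness O(x) is the supremum over unit vectors u of |uᵀx − Med(uᵀX)| / MAD(uᵀX), and it draws directions uniformly on the sphere instead of following the thesis's predetermined plans. The function name `approx_projection_depth` and its parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def approx_projection_depth(points, data, k=500, seed=0):
    """Approximate projection depth of each row of `points` w.r.t. `data`.

    Hypothetical helper: the supremum over all unit directions is replaced
    by a maximum over k directions drawn uniformly at random (an assumption,
    not the thesis's predetermined direction plans).
    """
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    u = rng.standard_normal((k, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # k unit directions
    proj = data @ u.T                                # (n, k) projected sample
    med = np.median(proj, axis=0)                    # median per direction
    mad = np.median(np.abs(proj - med), axis=0)      # MAD per direction
    mad = np.maximum(mad, 1e-12)                     # guard against zero spread
    z = np.abs(points @ u.T - med) / mad             # (m, k) scaled deviations
    outlyingness = z.max(axis=1)                     # max over sampled directions
    return 1.0 / (1.0 + outlyingness)


# Usage: the deepest sample point serves as a robust location estimate,
# and the least deep points are outlier candidates.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
X[:3] += 8.0                                         # plant three gross outliers
d = approx_projection_depth(X, X, k=1000)
center = X[np.argmax(d)]
flagged = np.argsort(d)[:3]
```

Because the maximum runs over only k of infinitely many directions, the approximate outlyingness can never exceed the exact one, so approximate depths are biased upward; raising `k` buys accuracy with computation time, which is the trade-off the abstract describes.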
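The contrast between exact and resampling-based computation of simplicial depth can be illustrated in the plane. The sketch assumes Liu's definition, under which the depth of x is the fraction of triangles with vertices at three distinct sample points that contain x. The exact version enumerates all C(n, 3) triangles, which costs O(n³) point-in-triangle tests; the approximate version is plain Monte Carlo over `b` randomly drawn triangles. Both helpers (`simplicial_depth_exact`, `simplicial_depth_mc`) are hypothetical, handle p = 2 only, and are not the thesis's exact or resampling algorithms.

```python
import itertools

import numpy as np


def _orient(p, q, x):
    """Cross product (q - p) x (x - p) per row; its sign tells on which
    side of the directed edge p -> q the point x lies."""
    return (q[:, 0] - p[:, 0]) * (x[1] - p[:, 1]) - (q[:, 1] - p[:, 1]) * (x[0] - p[:, 0])


def _in_triangles(x, a, b, c):
    """True where x lies in the closed triangle (a[i], b[i], c[i])."""
    d1, d2, d3 = _orient(a, b, x), _orient(b, c, x), _orient(c, a, x)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    return ~(has_neg & has_pos)


def simplicial_depth_exact(x, data):
    """Hypothetical exact 2-D simplicial depth: test all C(n, 3) triangles."""
    idx = np.array(list(itertools.combinations(range(len(data)), 3)))
    inside = _in_triangles(x, data[idx[:, 0]], data[idx[:, 1]], data[idx[:, 2]])
    return inside.mean()


def simplicial_depth_mc(x, data, b=20000, seed=0):
    """Hypothetical Monte Carlo 2-D simplicial depth from b random triangles."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Draw three distinct indices per triangle without a Python loop:
    # j skips i, and k skips both the smaller and the larger of (i, j).
    i = rng.integers(0, n, size=b)
    j = rng.integers(0, n - 1, size=b)
    j += j >= i
    k = rng.integers(0, n - 2, size=b)
    lo, hi = np.minimum(i, j), np.maximum(i, j)
    k += k >= lo
    k += k >= hi
    return _in_triangles(x, data[i], data[j], data[k]).mean()


# Usage: compare both versions at the center and far outside the data cloud.
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 2))
center, far = np.zeros(2), np.array([10.0, 10.0])
exact = simplicial_depth_exact(center, X)
approx = simplicial_depth_mc(center, X, b=20000)
```

The Monte Carlo error shrinks like 1/√b regardless of n, while the exact count grows as n³, which is why resampling becomes attractive as the sample grows.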
Keywords/Search Tags: Big Data, Depth Function, Projection Depth, Simplicial Depth, Outlier Detection