In recent years, aggregation queries have become an important means of big data analytics and play an important role in many fields, such as e-commerce, financial analytics, web search, and medical services. However, as data volumes grow, users' requirements for real-time response and reliability pose a great challenge to aggregation queries over massive data. In this paper, we focus on sampling for aggregation queries in the massive data environment and obtain the following research results: (1) Because an exact aggregation query in a big data environment must traverse all the data, approximate aggregation, also known as error-bounded approximate aggregation query, has become the most popular approach; it is achieved by sampling. However, the sampling techniques that have been applied to aggregation queries perform poorly in the big data environment, especially on high-dimensional data. In this paper, we apply stratified sampling to aggregation. In particular, we use the KMeans algorithm on two-dimensional data to reduce the sample size required to meet an error bound. This can halve the sample size compared with the previous method. (2) Sparse data, in which the range of values is large relative to the number of data items, is relatively common at present. Building on the existing sparse data sampling method, this paper proposes a queue-based heuristic sampling method and a further optimization of the stratification scheme. The two methods have the same time complexity, but they reduce the sample size by 20% and 30%, respectively, compared with the existing technique.
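As a rough illustration of error-bounded approximate aggregation by stratified sampling, the sketch below estimates a mean from per-stratum random samples and reports a confidence half-width. It is the textbook stratified estimator, not the method proposed in this paper: the function name `stratified_mean`, its parameters, and the fixed normal quantile `z = 1.96` are assumptions made for the example, and the strata are assumed to be given in advance (for instance by a clustering step such as KMeans).

```python
import math
import random
import statistics


def stratified_mean(strata, sample_sizes, z=1.96, seed=0):
    """Estimate the overall mean from simple random samples of each stratum.

    strata       -- list of lists of numbers (the data, already partitioned)
    sample_sizes -- rows to draw from each stratum (2 <= n <= stratum size)
    Returns (estimate, half_width) of an approximate confidence interval.
    """
    rng = random.Random(seed)
    total = sum(len(s) for s in strata)
    estimate = 0.0
    variance = 0.0
    for stratum, n in zip(strata, sample_sizes):
        if not 2 <= n <= len(stratum):
            raise ValueError("each sample size must be in [2, stratum size]")
        weight = len(stratum) / total          # share of the population
        sample = rng.sample(stratum, n)        # sampling without replacement
        estimate += weight * statistics.fmean(sample)
        fpc = 1 - n / len(stratum)             # finite population correction
        variance += weight ** 2 * fpc * statistics.variance(sample) / n
    return estimate, z * math.sqrt(variance)


if __name__ == "__main__":
    # Two hypothetical strata with very different value ranges.
    low = [float(i % 10) for i in range(5000)]
    high = [1000.0 + (i % 50) for i in range(5000)]
    est, half = stratified_mean([low, high], [100, 100])
    print(f"mean is approximately {est:.2f} +/- {half:.2f}")
```

Because values within each stratum are close together, the per-stratum variances are small, so far fewer rows are needed to reach a given half-width than with uniform sampling over the whole data set. A sample size could be chosen by increasing the per-stratum counts until the half-width falls below the required error bound.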