
Research and Implementation on Sampling of Approximate Aggregation Query under the Big Data Environment

Posted on: 2017-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330569998821
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, aggregation queries have become an important tool in big data analytics, playing a significant role in fields such as e-commerce, financial analytics, web search, and medical services. However, as data volumes grow, users' requirements for real-time response and reliability pose a great challenge to aggregation queries over massive data. This thesis focuses on sampling for aggregation queries in the massive data environment and obtains the following research results:

(1) Because an exact aggregation query in a big data environment must traverse all of the data, approximate aggregation, also known as error-bounded approximate aggregation, has become the most popular approach; it is achieved by sampling. However, the sampling techniques applied to aggregation queries perform poorly in the big data environment, especially on high-dimensional data. This thesis applies stratified sampling to aggregation, in particular to reduce the sample size required under a given error bound for the KMeans algorithm on two-dimensional data. The method can halve the sample size compared with the previous method.

(2) Sparse data, in which the range of values is large compared with the number of data points, is relatively common at present. Building on an existing sparse-data sampling method, this thesis proposes a queue-based heuristic sampling algorithm and further optimizes the stratification scheme. The two methods have the same time complexity, while reducing the sample size by 20% and 30%, respectively, compared with the existing technique.
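
The general idea of error-bounded approximate aggregation through stratified sampling can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is the textbook stratified estimator for SUM with a normal-approximation confidence interval, and the function name `stratified_sum_estimate`, its parameters, and the fixed `z = 1.96` value are assumptions chosen for illustration only.

```python
import math
import random
import statistics


def stratified_sum_estimate(strata, sample_sizes, z=1.96, seed=0):
    """Estimate SUM over a population that has been split into strata.

    strata       -- list of lists of numbers (hypothetical in-memory data)
    sample_sizes -- how many values to draw from each stratum
    z            -- normal quantile for the interval (1.96 is roughly 95%)

    Returns (estimate, half_width): the estimated sum and the half-width
    of an approximate confidence interval around it.
    """
    rng = random.Random(seed)
    estimate = 0.0
    variance = 0.0
    for values, n in zip(strata, sample_sizes):
        N = len(values)
        if N == 0:
            continue  # an empty stratum contributes nothing
        n = max(1, min(n, N))  # draw at least one value, at most all of them
        sample = rng.sample(values, n)  # sampling without replacement
        estimate += N * statistics.fmean(sample)
        if n > 1:
            # Sample variance with the finite population correction (1 - n/N).
            # A stratum sampled with n == 1 adds no variance term here, so
            # the interval is too narrow in that case.
            s2 = statistics.variance(sample)
            variance += N * N * (1 - n / N) * s2 / n
    return estimate, z * math.sqrt(variance)


if __name__ == "__main__":
    # Two strata: one constant, one spread out. The constant stratum needs
    # few samples, which is the intuition behind stratifying skewed data.
    data = [[5] * 100, list(range(1000))]
    est, half_width = stratified_sum_estimate(data, [10, 200])
    print(f"estimated sum = {est:.1f} +/- {half_width:.1f}")
    print(f"exact sum     = {sum(map(sum, data))}")
```

The half-width plays the role of the error bound: a query can keep enlarging the per-stratum samples until the half-width falls below the tolerance the user asked for. How the strata are chosen and how the sample budget is split among them is where stratification schemes differ, and this sketch leaves both to the caller.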
Keywords/Search Tags:massive data, error-bounded, aggregation query, stratified sampling, sparse data