Research And Implementation On Sampling Of Approximate Aggregation Query Under The Big Data Environment

Posted on:2017-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

Full Text:PDF

GTID:2428330569998821

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, aggregation queries have become an important means of big data analytics, playing an important role in fields such as e-commerce, financial analytics, web search, and medical services. However, as data volumes grow, users' requirements for real-time response and reliability in massive-data environments pose a great challenge to aggregation queries. This paper focuses on sampling for aggregation queries in massive-data environments and obtains the following research results:

(1) Because an exact aggregation query in a big data environment must traverse all the data, approximate aggregation, also known as error-bounded approximate aggregation query, has become the most popular approach to aggregation queries; it is implemented through sampling. However, the sampling techniques applied to aggregation queries perform poorly in big data environments, especially on high-dimensional data. This paper applies stratified sampling to aggregation, in particular to reduce the error-bounded sample size with the KMeans algorithm on two-dimensional data. It can halve the sample size compared with the previous method.

(2) Sparse data is a relatively common kind of data at present; that is, the range of the data is large compared with the population size. Building on the existing sparse-data sampling method, this paper proposes a queue-based heuristic sampling method and a further optimization of the stratification scheme. The two methods have the same time complexity, but the sample size can be reduced by 20% and 30%, respectively, compared with the existing technique.
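
As a rough illustration of the general idea of error-bounded approximate aggregation via stratified sampling, the sketch below estimates a mean from per-stratum samples and reports an approximate error bound. It is not the thesis's algorithm: the function name `stratified_mean_estimate`, the proportional allocation, and the normal-approximation bound (`z = 1.96`) are all assumptions chosen for the example, and the strata are assumed to be given in advance (for instance by a clustering step).

```python
import math
import random
from statistics import mean, variance


def stratified_mean_estimate(strata, sample_fraction, z=1.96, seed=0):
    """Estimate a population mean from a stratified sample.

    strata: list of lists of numbers (hypothetical pre-partitioned data).
    sample_fraction: share of each stratum to sample (proportional allocation).
    Returns (estimate, half_width); half_width is an approximate error
    bound based on a normal approximation, not a strict guarantee.
    """
    rng = random.Random(seed)
    total = sum(len(s) for s in strata)
    estimate = 0.0
    var_of_estimate = 0.0
    for stratum in strata:
        n_h = len(stratum)
        # Sample at least 2 items per stratum when possible, never more than n_h.
        k = min(n_h, max(2, round(n_h * sample_fraction)))
        sample = rng.sample(stratum, k)
        weight = n_h / total
        estimate += weight * mean(sample)
        # Sample variance needs at least two points; treat a singleton as 0.
        s2 = variance(sample) if k > 1 else 0.0
        # Finite population correction: no error when the stratum is fully read.
        fpc = 1 - k / n_h
        var_of_estimate += weight ** 2 * fpc * s2 / k
    return estimate, z * math.sqrt(var_of_estimate)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Two hypothetical strata with very different value ranges.
    low = [data_rng.uniform(0, 10) for _ in range(1000)]
    high = [data_rng.uniform(1000, 1100) for _ in range(200)]
    est, bound = stratified_mean_estimate([low, high], sample_fraction=0.1)
    print(f"estimated mean = {est:.2f} +/- {bound:.2f}")
```

Because each stratum is internally homogeneous, the per-stratum variances are small, so a given error bound can be met with fewer samples than uniform sampling over the whole data set would need.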

Keywords/Search Tags:

massive data, error-bounded, aggregation query, stratified sampling, sparse data

Related items

1	Research And Implementation Of Sampling-Based Aggregation Query System On Big Data
2	Research On Sampling Based Aggregate Query Method Of Power Quality Data
3	Research On Approximate Query Algorithm For Real-time Analysis Of Massive Data
4	Research And Implementation On Aggregation Query Optimization Under The Big Data Environment
5	Studies On Efficient Query Scheduling And Data Acquiring Techniques Of WEB Data
6	Queries with Bounded Errors & Bounded Response Times on Very Large Data
7	Approximate Aggregation Of Time-Varying Data In P2P Networks Based On Uniform Sampling
8	Research On Deep Web Data Analysis Based On Stratified Sampling
9	Online Aggregation Optimization For BIG Data In Cloud
10	Research On Techniques Of Data Aggregation And Query With Privacy-Preservation In Wireless Sensor Networks