Research On Online Aggregation Query Processing Based On Hadoop

Posted on:2016-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:J H Hu

Full Text:PDF

GTID:2348330542975726

Subject:Software engineering

Abstract/Summary:

Since the start of the information era, the volume of information has grown explosively, and with it the amount of data to be processed. Extracting useful information from massive data has become increasingly urgent. Data has long been stored and processed in relational database management systems (RDBMS), where the aggregation query is an important operation in statistical analysis. As the amount of data to be processed grows rapidly, users must wait a long time for an exact aggregation result, because traditional relational databases evaluate aggregation queries in batch mode. Online aggregation instead keeps returning approximate results while the query runs, computed from the sample data processed so far; once all data has been processed, the final result is obtained. When the result reaches the precision the user wants, the user can stop the query, saving both the user's time and system resources. The development of Hadoop has made processing large volumes of data more efficient. However, data is "limitless", while computational and storage resources are limited. At present this problem is hard to solve fundamentally, but specific solutions can still be proposed for specific applications. We propose a Hadoop-based approximate aggregation query processing method using iterative sampling, which combines Hadoop's strength in processing large volumes of data with the online aggregation query mode. The user's desired precision for the approximate aggregate result is met with two sampling iterations. From the user's desired precision and the sample drawn in the first iteration, we compute the sample size needed to meet that precision. The samples from both iterations are then used to return the approximate aggregate result to the user. To avoid the effects of data bias, the thesis proposes a "layered sampling" method to ensure that the approximate aggregate result is statistically meaningful. Finally, in the experiments, we compare and analyze the effects of various sampling methods on the aggregation query result. The results show that our "layered sampling" method takes time efficiency into account, letting the user trade off processing time against the precision of the result, and also takes into account the usage of the computational and storage resources of the Hadoop cluster. We also compare the efficiency of our iterative aggregation query method with that of the newest Hadoop-based online aggregation method; the experimental results indicate that our method is more efficient.
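The two-iteration idea described above can be illustrated with a minimal sketch. The abstract does not give the thesis's estimator, so this example assumes the textbook normal-approximation sample-size formula n = (z * s / e)^2 for an AVG query with an absolute error bound e. The function names `required_sample_size` and `two_phase_avg` are hypothetical, and an in-memory list stands in for data that would actually be stored on a Hadoop cluster; it is not the author's implementation.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def required_sample_size(pilot, abs_error, confidence=0.95):
    """Estimate how many rows are needed so that the AVG estimate falls
    within abs_error of the true mean at the given confidence level.

    Assumption: normal approximation, n = (z * s / e)^2, with s taken
    from the first-iteration (pilot) sample.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    s = stdev(pilot)  # sample standard deviation of the pilot
    return max(len(pilot), math.ceil((z * s / abs_error) ** 2))


def two_phase_avg(data, pilot_size, abs_error, confidence=0.95, seed=0):
    """Approximate AVG(data) with two sampling iterations.

    Iteration 1 draws a pilot sample and sizes the total sample.
    Iteration 2 draws the remaining rows; both samples are combined.
    Returns (estimate, confidence-interval half width, rows used).
    """
    rng = random.Random(seed)
    # A random permutation of row ids gives sampling without replacement.
    order = rng.sample(range(len(data)), len(data))
    pilot = [data[i] for i in order[:pilot_size]]

    n = min(len(data), required_sample_size(pilot, abs_error, confidence))
    sample = [data[i] for i in order[:n]]  # pilot rows plus the second draw

    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(sample) / math.sqrt(n)
    return mean(sample), half_width, n


if __name__ == "__main__":
    rows = [float(i % 100) for i in range(100_000)]  # synthetic column
    estimate, half_width, used = two_phase_avg(rows, pilot_size=500, abs_error=1.0)
    print(f"AVG ~ {estimate:.2f} +/- {half_width:.2f} using {used} of {len(rows)} rows")
```

Tightening `abs_error` increases the number of rows read roughly quadratically, which is the trade-off between processing time and precision that the abstract refers to. A stratified ("layered") variant would apply the same sizing within each stratum; that step is not shown here.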

Keywords/Search Tags:

Online Aggregation, MapReduce, iterative sampling, sample size estimation

Related items

1	Sample size estimation with nonparametric methods for one sample location tests under clustered data
2	Precision-based sample size reduction for Bayesian experimentation using Markov chain simulation
3	Adequate sample sizes for viable 2-level hierarchical linear modeling analysis: A study on sample size requirement in HLM in relation to different intraclass correlations
4	A multi-faceted model of the consequences of sample size choice in usability testing
5	Study Of Face Recognition With Small Sample Size
6	Biometric Recognition Based On Samll Sample Size Problem
7	An Online Big Data Analytic System Leveraging Uncertain Query Processing
8	Enhanced CBET To Optimize Sample Size And Precision Analysis
9	Novel Cheminformatics Methods for Modeling Biomolecular Data in High Dimension Low Sample Size (HDLSS) Chemistry Space
10	Diverse sample analysis and sample preparation studies utilizing AP-MALDI-TOF-MS