Method For Calculating Approximate Results Based On Resampling

Posted on:2017-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:J F Li

Full Text:PDF

GTID:2348330509957111

Subject:Computer technology

Abstract/Summary:

With the rapid development of Internet technology, the volume of data produced across many fields is growing explosively. Big data analysis, however, tends to consume large amounts of computing resources and time. In many cases users prefer approximate results that are accurate enough and available quickly over exact results that are slow to compute. For approximate big data analysis, sampling is almost the only technique that reduces both computing resources and running time. Yet when applied to big data, simple random sampling tends to produce so many samples that sampling loses its benefit, and hardly any other sampling method fully supports mainstream distributed computing architectures (e.g., MapReduce). Moreover, even for the same query on the same data set, different users may have different accuracy requirements for the approximate result. How to provide different users with approximate results of different accuracy has therefore become a problem to be solved.

We propose and implement an accuracy-controllable method that provides approximate results for big data analysis, based on sampling and the MapReduce computing architecture, to solve this problem. We control the precision of the approximate results by changing the sampling frequency on the big data set. We also modified the kernel code of the Hadoop system so that the sampling method runs efficiently on a distributed cluster. In addition, we analyze in detail the relationship between the accuracy of the approximate result and the resampling frequency.

Finally, through a series of experiments on a variety of data sets, we verify that the system not only reduces running time and computing resources but also provides approximate results that are accurate enough. The experiments demonstrate the validity and usability of the system and the advantages of a computation scheme that provides accuracy-controllable approximate results.
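The general idea that a sampling rate trades accuracy for cost can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `approximate_mean`, the use of Bernoulli sampling, and the normal-approximation error bound are all assumptions made for illustration, and the code runs on a single machine rather than on Hadoop.

```python
import math
import random


def approximate_mean(values, rate, seed=0):
    """Estimate the mean of `values` from a Bernoulli sample.

    Each record is kept independently with probability `rate`, which is one
    way a mapper could sample its own split without coordination (an
    assumption of this sketch, not a detail taken from the thesis).

    Returns (estimate, half_width, sample_size), where half_width is an
    approximate 95% confidence half-width under a normal approximation.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < rate]
    n = len(sample)
    if n < 2:
        raise ValueError("sample too small; increase rate")
    mean = sum(sample) / n
    # Unbiased sample variance, used to estimate the standard error.
    variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, half_width, n


if __name__ == "__main__":
    # Synthetic data, generated only for this demonstration.
    data_rng = random.Random(42)
    data = [data_rng.expovariate(1.0) for _ in range(200_000)]
    exact = sum(data) / len(data)
    for rate in (0.001, 0.01, 0.1):
        est, hw, n = approximate_mean(data, rate, seed=1)
        print(f"rate={rate} n={n} estimate={est:.4f} +/- {hw:.4f} (exact {exact:.4f})")
```

In this sketch the error bound shrinks roughly with the square root of the sample size, so a user who needs a tighter result would request a higher rate and accept a longer run.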

Keywords/Search Tags:

big data, resampling, accuracy controllable computing, approximate result

Related items

1	Accuracy Controllable BCNN For Common Word Recognition Computational Structure And Implementation
2	A Low Power RNN Accelerator Based On Dynamic Accuracy Approximate Computing
3	Accuracy Configurable FFT Processor Based On Approximate Computing
4	Generalization Improvement Of Rough Set Approximate Accuracy
5	Scalable And Energy-efficient CNN Accelerator Design Based On Dynamic Accuracy
6	Research On Approximate Aggregate Query Processing Over Low Usability Data
7	Research On Energy-efficient Circuit Design Technology Based On Approximate Computing
8	Research On Approximate Computing And Quality Assurance Strategies In Large-scale Stream Data Processing
9	RSP: A New Approach For Approximate Big Data Analysis
10	A Research On Key Technologies Of Deep Web Data Integration Based On Result Pattern