
Research On Leverage-based Approximate Aggregation Algorithms On Big Data

Posted on: 2018-09-05    Degree: Master    Type: Thesis
Country: China    Candidate: S S Han    Full Text: PDF
GTID: 2348330536981918    Subject: Computer Science and Technology
Abstract/Summary:
The current explosion of data poses great challenges to approximate aggregation in terms of efficiency and accuracy. To increase both, we introduce the leverage mechanism into data management. In the traditional leverage mechanism, leverages reflect the individual differences of samples as well as their contributions to the overall aggregate. We inherit this idea and introduce leverages into data management, thereby increasing accuracy and efficiency. Considering the individual differences of data, we propose a novel leverage strategy that takes the nature of the data into account, divides the data into different regions, and handles each region differently. Based on such leverages, which reflect the individual differences of data, we propose a leverage-based iteration scheme and construct objective functions from the leverages and samples. Owing to this mechanism, our approach is insensitive to the sampling sequence. Besides using leverages to reflect the individual differences of samples, we also extend leverages to data blocks. To achieve a tradeoff between accuracy and efficiency, we calculate a leverage for each block; such block leverages reflect the differences among blocks and are used to generate different sampling rates for the blocks. This thesis focuses on three problems: AVG approximate aggregation on i.i.d. data, AVG approximate aggregation on non-i.i.d. data, and extreme value aggregation.

For AVG approximate aggregation on i.i.d. data, we propose a novel methodology for high-precision estimation: two estimators are generated using different methods and are iteratively refined under constraints according to the actual conditions of the data to obtain high-quality estimates. We introduce a leverage-based iteration scheme in which leverages reflect the individual differences of samples and iteration increases precision. Based on this mechanism, a high-quality approximate aggregation answer is obtained.

For AVG approximate aggregation on non-i.i.d. data, we build on the i.i.d. method and consider the differences among blocks to generate different sampling rates for them. In the sampling stage, blocks are assigned different leverages, which produce different sampling rates. To balance accuracy and efficiency, we consider the standard deviation of each block. Blocks with larger standard deviations have more complex data distributions, so we assign them larger sampling rates to capture enough information; blocks with smaller standard deviations contain more homogeneous data, so a smaller sample suffices and we apply smaller sampling rates. This mechanism achieves a balance between accuracy and efficiency.

For extreme value aggregation, we inherit the sampling-rate scheme from AVG approximate aggregation on non-i.i.d. data. Because the MAX/MIN value is more likely to lie in blocks with larger/smaller averages, we also take the block averages into account when calculating block leverages. Based on these sampling rates, we propose an approach that estimates the extreme value from samples without depending on existing models of the data distribution or of the extreme values. Instead, it uses the samples and the sampling process itself to infer the data distribution, which makes the approach highly flexible.
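The abstract describes the block-level mechanism only at a high level. The following is a minimal Python sketch of the general idea, not the thesis's actual algorithm: the function names, the pilot-sample size of 30, the base_rate parameter, and the inverse-sampling-rate weighting are illustrative assumptions. It assigns larger sampling rates to blocks with larger pilot standard deviations (optionally favouring blocks with larger means when estimating a MAX) and combines the block samples into an AVG estimate.

```python
import numpy as np

def block_sampling_rates(blocks, base_rate=0.05, favor_large_means=False):
    """Assign a per-block sampling rate from a small pilot sample of each block.

    Blocks with larger standard deviation (more heterogeneous data) receive
    larger rates; homogeneous blocks receive smaller ones. When estimating a
    MAX, blocks with larger pilot means are additionally favoured.
    """
    pilots = [np.random.choice(b, size=min(30, len(b)), replace=False) for b in blocks]
    stds = np.array([p.std(ddof=1) for p in pilots])
    scores = stds / stds.sum() if stds.sum() > 0 else np.full(len(blocks), 1.0 / len(blocks))
    if favor_large_means:
        means = np.array([p.mean() for p in pilots])
        shifted = means - means.min() + 1e-9          # keep scores positive
        scores = 0.5 * scores + 0.5 * shifted / shifted.sum()
    # Scale so the average rate is roughly base_rate, capped at 1.0.
    return np.clip(base_rate * len(blocks) * scores, 0.0, 1.0)

def weighted_avg_estimate(blocks, rates):
    """Estimate the global AVG from block samples drawn with the given rates.

    Each sample is weighted by the inverse of its block's sampling rate
    (a simple proxy for a per-sample weight), so heavily sampled blocks do
    not dominate the estimate.
    """
    weighted_sum, total_weight = 0.0, 0.0
    for block, rate in zip(blocks, rates):
        n = max(1, int(round(rate * len(block))))
        sample = np.random.choice(block, size=n, replace=False)
        w = len(block) / n                             # inverse-inclusion weight
        weighted_sum += w * sample.sum()
        total_weight += w * n
    return weighted_sum / total_weight

# Example: three blocks with very different spreads.
rng = np.random.default_rng(0)
blocks = [rng.normal(10, 0.5, 100_000),
          rng.normal(10, 5.0, 100_000),
          rng.normal(12, 0.1, 100_000)]
rates = block_sampling_rates(blocks, base_rate=0.02)
print("sampling rates:", np.round(rates, 4))
print("approximate AVG:", round(weighted_avg_estimate(blocks, rates), 3))
```

In this sketch the high-variance block automatically receives the largest sampling rate, while the near-constant block is sampled lightly; how the thesis actually defines the block leverages and the iterative refinement of the estimators is only summarized in the abstract above.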
Keywords/Search Tags:aggregation, leverage, high accuracy, big data