
Research On Leverage-based Approximate Aggregation Algorithms On Big Data

Posted on: 2018-09-05    Degree: Master    Type: Thesis
Country: China    Candidate: S S Han    Full Text: PDF
GTID: 2348330536981918    Subject: Computer Science and Technology
Abstract/Summary:
The current explosion of data poses great challenges to approximate aggregation in terms of efficiency and accuracy. To increase both, we introduce the leverage mechanism into data management. In the traditional leverage mechanism, leverages reflect the individual differences of samples as well as their contributions to the overall aggregate. We inherit this idea and introduce leverages into data management, thereby increasing accuracy and efficiency. Considering the individual differences of data, we propose a novel leverage strategy that takes the nature of the data into account, divides the data into different regions, and handles each region differently. Based on such leverages, which reflect the individual differences of data, we propose a leverage-based iteration scheme and construct objective functions from the leverages and samples. Owing to this mechanism, our approach is insensitive to the sampling sequence. Besides using leverages to reflect the individual differences of samples, we also extend leverages to data blocks. To achieve a tradeoff between accuracy and efficiency, we calculate a leverage for each block; such block leverages reflect the differences among blocks and are used to generate different sampling rates for the blocks. This thesis focuses on three problems: AVG approximate aggregation on i.i.d. data, AVG approximate aggregation on non-i.i.d. data, and extreme value aggregation.

For AVG approximate aggregation on i.i.d. data, we propose a novel methodology for high-precision estimation: two estimators are generated using different methods and are iteratively refined under constraints according to the actual conditions of the data to obtain high-quality estimates. We introduce a leverage-based iteration scheme in which leverages reflect the individual differences of samples and iteration increases precision. Based on this mechanism, a high-quality approximate aggregation answer is obtained.

For AVG approximate aggregation on non-i.i.d. data, we build on the i.i.d. method and consider the differences among blocks to generate different sampling rates for them. In the sampling stage, blocks are assigned different leverages, which produce different sampling rates. To balance accuracy and efficiency, we consider the standard deviation of each block. Blocks with larger standard deviations have more complex data distributions, so we assign them larger sampling rates to capture enough information; blocks with smaller standard deviations contain more homogeneous data, so a smaller sample suffices and we apply smaller sampling rates. This mechanism achieves a balance between accuracy and efficiency.

For extreme value aggregation, we inherit the sampling-rate scheme from AVG approximate aggregation on non-i.i.d. data. Because the MAX/MIN value is more likely to lie in blocks with larger/smaller averages, we also take the block averages into account when calculating block leverages. Based on these sampling rates, we propose an approach that estimates the extreme value from samples without depending on existing models of the data distribution or of the extreme values. Instead, it uses the samples and the sampling process itself to infer the data distribution, which makes the approach highly flexible.
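The abstract describes the block-level mechanism only at a high level. The following is a minimal Python sketch of the general idea, not the thesis's actual algorithm: the function names, the pilot-sample size of 30, the base_rate parameter, and the inverse-sampling-rate weighting are illustrative assumptions. It assigns larger sampling rates to blocks with larger pilot standard deviations (optionally favouring blocks with larger means when estimating a MAX) and combines the block samples into an AVG estimate.

```python
import numpy as np

def block_sampling_rates(blocks, base_rate=0.05, favor_large_means=False):
    """Assign a per-block sampling rate from a small pilot sample of each block.

    Blocks with larger standard deviation (more heterogeneous data) receive
    larger rates; homogeneous blocks receive smaller ones. When estimating a
    MAX, blocks with larger pilot means are additionally favoured.
    """
    pilots = [np.random.choice(b, size=min(30, len(b)), replace=False) for b in blocks]
    stds = np.array([p.std(ddof=1) for p in pilots])
    scores = stds / stds.sum() if stds.sum() > 0 else np.full(len(blocks), 1.0 / len(blocks))
    if favor_large_means:
        means = np.array([p.mean() for p in pilots])
        shifted = means - means.min() + 1e-9          # keep scores positive
        scores = 0.5 * scores + 0.5 * shifted / shifted.sum()
    # Scale so the average rate is roughly base_rate, capped at 1.0.
    return np.clip(base_rate * len(blocks) * scores, 0.0, 1.0)

def weighted_avg_estimate(blocks, rates):
    """Estimate the global AVG from block samples drawn with the given rates.

    Each sample is weighted by the inverse of its block's sampling rate
    (a simple proxy for a per-sample weight), so heavily sampled blocks do
    not dominate the estimate.
    """
    weighted_sum, total_weight = 0.0, 0.0
    for block, rate in zip(blocks, rates):
        n = max(1, int(round(rate * len(block))))
        sample = np.random.choice(block, size=n, replace=False)
        w = len(block) / n                             # inverse-inclusion weight
        weighted_sum += w * sample.sum()
        total_weight += w * n
    return weighted_sum / total_weight

# Example: three blocks with very different spreads.
rng = np.random.default_rng(0)
blocks = [rng.normal(10, 0.5, 100_000),
          rng.normal(10, 5.0, 100_000),
          rng.normal(12, 0.1, 100_000)]
rates = block_sampling_rates(blocks, base_rate=0.02)
print("sampling rates:", np.round(rates, 4))
print("approximate AVG:", round(weighted_avg_estimate(blocks, rates), 3))
```

In this sketch the high-variance block automatically receives the largest sampling rate, while the near-constant block is sampled lightly; how the thesis actually defines the block leverages and the iterative refinement of the estimators is only summarized in the abstract above.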
Keywords/Search Tags:aggregation, leverage, high accuracy, big data