
Machine Learning Based Approximate Query Processing

Posted on: 2022-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: C B Lin
Full Text: PDF
GTID: 2518306479993299
Subject: Software engineering
Abstract/Summary:
In the era of big data, the ability to query and analyze large datasets is one of the fundamental functions of a database. Aggregate functions and window functions have been important features of modern databases for decades, and with the growing demand for data analysis they are now widely used. Aggregate functions handle simple analytical requirements, while window functions can efficiently process more complex queries with their concise but expressive syntax. However, as data scale increases, the traditional implementations of aggregate functions and window functions in databases cannot meet the real-time requirements of data analysis.

To solve this problem, much prior work has abandoned the traditional approach based on exact calculation and adopted approximate query processing (AQP) to handle aggregate functions and window functions, which offers users a flexible trade-off between accuracy and efficiency. However, these works have several defects, such as limited improvement in query response time, a small number of supported query types, and large storage overhead. To address these defects, this thesis proposes two machine learning based AQP frameworks for window functions and aggregate functions: WFApprox and DeepAQP. WFApprox uses density estimators and regression models to efficiently provide approximate answers for multiple window functions and window function syntaxes. DeepAQP uses the Masked Autoencoder, a deep learning model, to fit the data distribution of database tables, and then approximately processes aggregate queries based on the models. The main work of this thesis includes the following aspects:

Approximate Window Functions Processing. Traditional implementations of window functions produce a large amount of disk I/O, which leads to low efficiency in large-scale dataset analysis. In this method, a density estimator is used to capture the distribution of columns in the database table, and a regression model is used to capture the mapping between columns, so that multiple window function queries can be processed efficiently. Experimental results show that, at similar query error, the proposed method improves query response time over the comparison method by about 2 times at minimum and about 100 times at maximum across different datasets.

Distribution-aware model integral technique. In DeepAQP, in order to efficiently obtain the data distribution after predicate filtering from the Masked Autoencoder, this thesis proposes a model integral technique. By using the distribution-awareness of the Masked Autoencoder, the sampling points are concentrated in the regions where most of the probability mass lies, which reduces the sampling cost and improves the sampling accuracy. The sampling results are then used to approximate the real distribution.

Approximate Aggregate Functions Processing. Existing work on approximate aggregate function processing is based either on sampling or on machine learning. However, the sampling based methods leave a lot to be desired in storage cost and query response time, while the machine learning based methods support only a limited number of query types. This method accurately represents the distribution of database tables with the help of the Masked Autoencoder. Only one model is needed per database table, and it provides efficient and accurate answers for multi-predicate queries. In addition, for multi-table join queries, an efficient sampling method is used to provide approximate answers. Experimental results show that neither this method nor the comparison method is consistently more accurate across predicate conditions, and their query errors are close overall; however, query response time is improved by at least 10 times, and storage space is reduced by more than 10 times compared with the comparison method.
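The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of answering an aggregate from a density estimator plus a regression model instead of scanning the table. Everything in it is an assumption made for illustration: the synthetic two-column table, the use of a NumPy histogram as the density estimator, a linear fit as the regression model, and the hypothetical helper approx_count_avg. It handles a simple range-predicate COUNT and AVG, not the window function syntax that WFApprox targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "table" with two columns (hypothetical data, not from the thesis).
n_rows = 200_000
x = rng.normal(50.0, 10.0, n_rows)
y = 3.0 * x + rng.normal(0.0, 5.0, n_rows)

# Offline step: fit both models on a small uniform sample of the table.
sample = rng.choice(n_rows, size=5_000, replace=False)
# Density estimator for column x (assumed here to be a normalized histogram).
density, edges = np.histogram(x[sample], bins=64, density=True)
# Regression model for the mapping x -> y (assumed here to be a linear fit).
coef = np.polyfit(x[sample], y[sample], deg=1)


def approx_count_avg(lo, hi, points=2_000):
    """Approximate COUNT(*) and AVG(y) for lo <= x <= hi using only the models."""
    step = (hi - lo) / points
    grid = lo + step * (np.arange(points) + 0.5)  # midpoints of equal sub-intervals
    bins = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, len(density) - 1)
    inside = (grid >= edges[0]) & (grid <= edges[-1])
    p = np.where(inside, density[bins], 0.0)  # estimated density p(x), zero off-range
    r = np.polyval(coef, grid)                # predicted y at each grid point
    mass = p.sum() * step                     # midpoint-rule integral of p(x) on [lo, hi]
    count = n_rows * mass                     # COUNT ~ N * integral of p(x)
    avg = (p * r).sum() / p.sum()             # AVG ~ density-weighted mean of predictions
    return count, avg


lo, hi = 40.0, 60.0
approx_count, approx_avg = approx_count_avg(lo, hi)

# Exact answers from a full scan, for comparison only.
mask = (x >= lo) & (x <= hi)
exact_count, exact_avg = int(mask.sum()), float(y[mask].mean())

print("COUNT relative error:", abs(approx_count - exact_count) / exact_count)
print("AVG relative error:", abs(approx_avg - exact_avg) / abs(exact_avg))
```

At query time the sketch touches only the 64 histogram bins and two regression coefficients, which is the source of the response-time and storage savings that model-based AQP aims for; the price is an approximation error that depends on how well the models fit the data.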
Keywords/Search Tags: Aggregate functions, Window functions, Approximate query processing, Machine learning, Deep learning