
Research On Database Query Time Prediction Algorithm Based On Deep Learning

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J X Ni
GTID: 2428330623468513
Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of the big data industry, the database, as the center of the big data era, has penetrated almost every industry and become an important factor of production. As industries demand and depend on data more and more, the traditional database management system (DBMS) has gradually evolved into a distributed data platform that can handle massive data. For such a platform, load management and performance tuning remain perennial topics, and the query execution time prediction model, i.e., the cost prediction model, is the key to improving its efficiency. Query time prediction can also be applied to many database scenarios, such as query task admission control, query task scheduling, query progress monitoring, and customizing the scale of a database system. Query execution time prediction therefore has broad application prospects and research value. The main contents of this thesis are as follows:

1) This thesis reviews the research progress and current state of query execution time prediction. In a distributed environment, part of the database's statistical information is difficult to obtain and the estimates of intermediate results are inaccurate, which makes it difficult for the traditional cost model to estimate the execution time of a query task accurately. To solve these problems, this thesis proposes a query execution time prediction scheme for distributed database systems. The scheme avoids predicting intermediate results and makes full use of the historical query task data accumulated by a commercial distributed data platform, which includes the historical task list together with its resource allocation, completion status, and other log data. Database and deep learning techniques are used to establish an end-to-end query time prediction model that estimates the expected completion time of a query task under given resource constraints.

2) To verify the feasibility of the proposed scheme, a large volume of historical query log data from a commercial distributed data platform is collected as experimental data. The log data undergoes extensive preprocessing, and features are selected and mined to suit the scheme. The tree-structured query execution plan is transformed, by topological sorting and one-hot encoding, into a feature vector that can be fed into the model. Finally, a mature and stable machine learning model is selected for the experiment, and the feasibility of the scheme is demonstrated by the experimental results and the model's learning curve.

3) To further improve the accuracy of the query time prediction model, this thesis designs a jointly trained model based on deep learning, addressing the problem that the machine learning model cannot learn the sequential information in the data. The model consists of a sequence module and a deep module, and adopts LSTM, residual networks, and batch normalization to improve its performance. Finally, extensive comparative experiments on real data demonstrate the superiority of the deep learning model in query execution time prediction.
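The plan encoding step mentioned in the abstract (topological sorting followed by one-hot encoding) could look roughly like the sketch below. This is an illustration only, not the thesis's implementation: the operator vocabulary `OPERATORS`, the plan representation as a dictionary of child lists, the fixed `max_nodes` padding, and all function names are assumptions made for the example.

```python
from collections import deque

# Hypothetical operator vocabulary; the abstract does not list the operators used.
OPERATORS = ["Scan", "Filter", "HashJoin", "Aggregate", "Sort"]


def topological_order(children):
    """Kahn's algorithm over a plan tree given as {node: [child nodes]}.

    Every node, including leaves, must appear as a key. Inputs (children)
    are emitted before the operators that consume them.
    """
    # pending[n] counts the inputs of n that have not been emitted yet.
    pending = {node: len(kids) for node, kids in children.items()}
    parent = {kid: node for node, kids in children.items() for kid in kids}
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        p = parent.get(node)
        if p is not None:
            pending[p] -= 1
            if pending[p] == 0:
                ready.append(p)
    if len(order) != len(children):
        raise ValueError("plan contains a cycle or a missing node")
    return order


def one_hot(op):
    """Return a 0/1 list with a single 1 at the operator's vocabulary index."""
    vec = [0] * len(OPERATORS)
    vec[OPERATORS.index(op)] = 1
    return vec


def encode_plan(children, ops, max_nodes):
    """Flatten a plan into a fixed-length vector, zero-padded to max_nodes."""
    order = topological_order(children)
    if len(order) > max_nodes:
        raise ValueError("plan has more nodes than max_nodes")
    vec = []
    for node in order:
        vec.extend(one_hot(ops[node]))
    vec.extend([0] * (len(OPERATORS) * (max_nodes - len(order))))
    return vec


# Toy plan: Aggregate(HashJoin(Filter(Scan), Scan)), node ids are arbitrary.
plan_children = {0: [1], 1: [2, 3], 2: [4], 3: [], 4: []}
plan_ops = {0: "Aggregate", 1: "HashJoin", 2: "Filter", 3: "Scan", 4: "Scan"}
features = encode_plan(plan_children, plan_ops, max_nodes=6)
```

Padding to a fixed `max_nodes` is one simple way to give every plan the same vector length; a real feature set would presumably also carry resource allocation and other log fields described in the abstract.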
Keywords/Search Tags:query time prediction, distributed database system, query optimization, deep learning, machine learning