Quantifying Query Interaction And Modeling Query Response Time

Posted on: 2019-11-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Zhang
GTID: 1368330596982301
Subject: Computer application technology

Abstract/Summary:
In the performance management of database systems, models that predict query response time play an important role and are widely used in tasks such as query scheduling, resource allocation, load balancing, and performance tuning. Query interaction, the phenomenon in which concurrent queries speed up or slow down a query's response time, has to be taken into account when building such models. Current performance models fall into two categories: analytical models and statistical models. Analytical models predict query response time by studying the query execution process. Describing query interactions this way requires complex modeling methods, which results in extremely complex models with poor usability. Statistical models, in contrast, use machine learning methods to model query response time. Although this approach balances model complexity against usability, it suffers from high sampling cost, cannot adequately describe query interactions, and yields a static model. This thesis focuses on statistical models of query response time: it quantifies query interactions, selects high-quality training samples to reduce the sampling cost, and develops online, dynamic statistical modeling methods for the management of database systems.

To reduce the sampling cost, this thesis selects high-quality samples by clustering the query mix space. Unlike traditional random sampling, cluster sampling selects samples according to their quality, so a small number of samples suffices to train a statistical model with reasonable prediction performance, which lowers the cost of statistical modeling. To cluster the query mix space, this thesis proposes QueryRating, a measure that quantifies the interaction between two queries, and uses it to construct a query interaction feature vector for each query. A query mix is then mapped into a two-dimensional space to form its feature vector, which is used for clustering query mixes.

To address the inability of statistical models to describe query interactions, this thesis proposes a similarity model that predicts the response time of a query from the similarity between query mixes, measured using the feature vectors of the mixes. Because the model is built on quantified query interaction, it predicts directly from the similarity between query mixes, avoids the complex linear fitting step, and emphasizes query interactions to improve prediction accuracy.

To make the statistical model dynamic, this thesis proposes an online algorithm for updating the similarity model. The algorithm uses the parameters of query mixes collected by the database system to update the sample set. Compared with the traditional linear model, the prediction performance of the similarity model is not restricted by the initial samples, and the model is highly dynamic.

As an application and verification of the similarity model, this thesis builds an online query scheduler on top of it to speed up the execution of a batch of queries. The scheduler finds the desired query mixes by solving a linear programming problem that minimizes the quantity of query interaction. Whenever a query exits, the scheduler selects an appropriate query to form a desired query mix with the queries still running. Compared with existing schedulers, the proposed scheduler has a finer scheduling granularity and can pick the most appropriate query to execute on the fly, which reduces the total turnaround time and improves the efficiency of the database system.

The contributions of this thesis are as follows.

(1) A method for quantifying query interaction is proposed. Query interaction is the primary factor behind changes in query response time, yet existing research offers no method for quantifying it. The interaction between two queries is the simplest form of query interaction. This thesis proposes QueryRating, the set of ratios of the response time of a query running in isolation to its response time when running with another query. QueryRating quantifies this simplest form and serves as the basis for measuring more complex forms of query interaction. From it, the query interaction vector and the query mix interaction vector are derived, which form the basis for the rest of the study.

(2) A cluster sampling method is proposed. The method clusters query mixes in Euclidean space based on their query mix interaction vectors and selects high-quality samples from each cluster, which significantly reduces the number of samples required while preserving prediction performance, and thus reduces the cost of modeling.

(3) A similarity model that predicts query response time by leveraging the similarity between query mixes is proposed. The model computes the similarity between query mixes by quantifying the query interaction within them, and predicts the response time of a query in a mix from its response time in similar query mixes. The model performs better than traditional linear models.

(4) An online update method is proposed to make the similarity model dynamic. The method uses the performance parameters of query mixes to update the sample set, so that model accuracy is no longer restricted by the initial samples and improves dynamically.

(5) A query scheduler that minimizes the query interaction of query mixes is proposed. The scheduler selects an appropriate query to execute by solving a linear programming problem that minimizes the quantity of query interaction, which effectively reduces the search space during online scheduling and lets the scheduler admit the appropriate query into the system, shortening the total turnaround time of a workload and improving efficiency.
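
The QueryRating ratio and the idea of predicting from similar query mixes can be illustrated with a minimal Python sketch. Only the ratio itself (isolated response time divided by response time alongside one other query) comes from the abstract. Everything else is an assumption made for illustration: the function names, the use of Euclidean distance as the similarity measure, the inverse-distance weighting over the k nearest sampled mixes, and all timings and feature vectors, which are made up. It is not the thesis's actual model.

```python
import math


def query_rating(isolated_time, paired_time):
    """Ratio of a query's isolated response time to its response time
    when run alongside one other query (definition from the abstract).
    A value below 1 means the concurrent query slowed it down."""
    return isolated_time / paired_time


def predict_response_time(target_vec, samples, k=2):
    """Hypothetical similarity-based predictor (assumed, not from the thesis).

    samples: list of (mix_feature_vector, observed_response_time).
    Averages the response times of the k sampled mixes closest to
    target_vec, weighted by inverse Euclidean distance.
    """
    nearest = sorted(
        (math.dist(target_vec, vec), rt) for vec, rt in samples
    )[:k]
    # Small epsilon avoids division by zero for an exact match.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * rt for w, (_, rt) in zip(weights, nearest)) / sum(weights)


# Made-up timings in seconds: the query takes twice as long when paired.
rating = query_rating(isolated_time=10.0, paired_time=20.0)

# Made-up two-dimensional mix feature vectors with observed response times.
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0), ((10.0, 10.0), 100.0)]
prediction = predict_response_time((1.0, 0.0), samples, k=2)
```

In this toy setup the target mix sits halfway between the two nearest samples, so the prediction falls midway between their response times; adding newly observed mixes to `samples` is one simple way to picture the online update of the sample set.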
Keywords/Search Tags: query interaction quantification, similarity of query mixes, query scheduling, performance prediction model, statistical modeling