
SQL Synthesis Based On Program By Example

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2568307058472614
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of databases and the growing popularity of data analysis, more and more non-expert users without a professional background need to run SQL queries. These users cannot write SQL themselves, so synthesizing SQL queries efficiently and accurately has become increasingly important and has attracted wide research attention. Programming by example (PBE) is a method that uses input-output examples provided by the user as the specification for program synthesis. Because it is simple and easy to use, it has been widely applied in fields such as tensor transformation and SQL synthesis. In the database field, synthesizing queries from examples (i.e., input-output tables) is also known as query reverse engineering (QRE), and it has long been an important approach to SQL synthesis. However, QRE faces two problems: a complex program space and diverse user intent. The complex program space makes it harder for the synthesis system to find the target program, so the program space must be pruned and heuristic search methods used to improve search efficiency. Diverse user intent leads to ambiguous synthesized queries, which satisfy the input-output tables but do not match the user's true intent. QRE research can therefore be divided into three dimensions: program space, search techniques, and user intent. The goal of this thesis is to improve the efficiency of SQL query synthesis and to reduce ambiguity in the synthesized queries. Drawing on deep learning, the thesis studies three key technical issues in depth: pruning the program space, improving search techniques, and identifying user intent. The main work covers the following three aspects:

(1) To address the complex program space in SQL synthesis, an SQL synthesis method based on program space reduction is proposed. The program space contains many programs that are irrelevant to the target query; they interfere with the search and reduce synthesis efficiency. The proposed method uses deep neural networks to predict how relevant each production rule of the domain-specific language (DSL) is to the input-output tables, and deletes production rules with low relevance. This removes irrelevant programs, shrinks the program space, and thereby improves synthesis efficiency. Experimental results show that, compared with SQUARES, the best overall SQL synthesis system to date, the proposed method raises the success rate from 80% to 89.1% and reduces the average synthesis time from 251 s to 130 s.

(2) To address low search efficiency in the SQL synthesis process, an SQL synthesis method that adjusts the search process is proposed. Because the program space is undecidable, SQL query synthesis must either search the program space for the target program or predict the target program with deep learning. The success rate of deep-learning-based program synthesis is lower than that of search-based methods, so search-based SQL synthesis is currently the main research direction in QRE. Enumerative search is the most commonly used search technique in QRE: it starts by enumerating programs at enumeration round 1 and continues with higher rounds until the target program is found. In this process, enumerating and verifying irrelevant programs at low enumeration rounds reduces synthesis efficiency. The proposed method therefore uses neural networks to predict the enumeration round of the target program and moves the starting point of the enumerative search accordingly, so that candidate programs whose enumeration round is below the predicted round are no longer enumerated or verified. Experimental results show that, compared with SQUARES, the proposed method raises the success rate from 80% to 82.3% and reduces the average synthesis time from 251 s to 75 s.

(3) To address diverse user intent in SQL synthesis, an SQL synthesis method based on user intent recognition is proposed. The diversity of user intent lowers the accuracy with which a QRE system recognizes that intent. Even with efficient search techniques, SQL synthesis systems may still fail to synthesize queries that meet the user's true intent. Input-output tables let non-expert users express their intent in a concise and user-friendly way, but they are ambiguous: a synthesized SQL query may satisfy the input-output tables without reflecting what the user actually wants. The proposed method uses grammar templates to convert SQL queries into natural language that users can understand, and interacts with the user to confirm whether the synthesized query meets their true intent, which improves the system's ability to recognize user intent. Experimental results show that SQUARES has a probability of less than 50% of synthesizing SQL queries that meet user intent, whereas the proposed method reaches 84%. In addition, most users need to confirm their intent fewer than two times.
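
The first two contributions can be pictured with a small sketch. The following is not the thesis's implementation and does not use SQUARES; it is a toy illustration that assumes a hypothetical four-rule DSL over lists of tuples, hard-coded relevance scores standing in for the neural network's predictions, and a hand-picked starting round standing in for the predicted enumeration round. Every name in it (`RULES`, `prune_rules`, `synthesize`, the scores, the threshold) is an assumption made for illustration.

```python
from itertools import product

# Hypothetical toy DSL: each production rule is a table-to-table function.
# Tables are lists of tuples; input rows are (name, age).
RULES = {
    "filter_age_gt_30": lambda t: [r for r in t if r[1] > 30],
    "sort_by_name": lambda t: sorted(t, key=lambda r: r[0]),
    "project_name": lambda t: [(r[0],) for r in t],
    "distinct": lambda t: list(dict.fromkeys(t)),
}

def prune_rules(rules, relevance, threshold):
    """Drop rules whose relevance score falls below the threshold."""
    return {n: f for n, f in rules.items() if relevance.get(n, 0.0) >= threshold}

def run(program, rules, table):
    """Apply the rules in order; return None if a rule does not fit the table."""
    try:
        for name in program:
            table = rules[name](table)
        return table
    except (IndexError, TypeError):
        return None

def synthesize(rules, inp, out, start_round=1, max_round=4):
    """Enumerate programs by size (the 'round'), beginning at start_round.

    Returns the first program that maps inp to out, plus the number of
    candidates that were enumerated and verified.
    """
    checked = 0
    for size in range(start_round, max_round + 1):
        for program in product(rules, repeat=size):
            checked += 1
            if run(program, rules, inp) == out:
                return program, checked
    return None, checked

INP = [("Cai", 41), ("Ann", 35), ("Bo", 28)]
OUT = [("Ann",), ("Cai",)]

# Assumed scores; in the thesis these would come from a trained network.
RELEVANCE = {"filter_age_gt_30": 0.9, "sort_by_name": 0.8,
             "project_name": 0.95, "distinct": 0.1}
PRUNED = prune_rules(RULES, RELEVANCE, threshold=0.5)

baseline = synthesize(RULES, INP, OUT)
pruned = synthesize(PRUNED, INP, OUT)
adjusted = synthesize(PRUNED, INP, OUT, start_round=3)

for label, (program, checked) in [("baseline", baseline),
                                  ("pruned rules", pruned),
                                  ("pruned + start at round 3", adjusted)]:
    print(f"{label}: {' -> '.join(program)} after {checked} candidates")
```

All three runs find the same three-step program, but each adjustment checks fewer candidates than the one before. The sketch also makes the trade-off visible: if the starting round were set above the true size of the target program, or a needed rule were pruned, the smallest correct program would never be enumerated.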
Keywords/Search Tags:program synthesis, SQL synthesis, query reverse engineering, deep learning, user intent recognition, domain-specific language