A Technology Of Generating SQL Through Chinese Natural Language Queries Based On Deep Learning

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: J C Cao
GTID: 2428330623969139
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, natural language processing has been one of the fastest-developing research directions in artificial intelligence. Querying a database in natural language through human-computer interaction spares users the cost of learning specialized knowledge and makes data retrieval more efficient. The task of generating SQL statements from natural language queries, referred to as the NL2SQL task, therefore has important research value, especially for Chinese. The main challenges in NL2SQL are how to understand natural language accurately at the grammatical and semantic levels, and how to bridge the gap in expression and structure among the natural language query, the structure and content of the database tables, and the SQL statement. Existing NL2SQL models are mainly oriented to English text and cannot handle two problems that arise in Chinese text data: column name reuse, and inconsistency between the descriptions in natural language queries and the data representations stored in the database. This paper takes the Chinese NL2SQL task as its research object. For the single-table query scenario and its multi-table extension, it constructs multiple deep learning models that convert natural language queries into SQL statements, with the aim of improving the accuracy of the generated SQL. The main work and contributions of this paper are:

(1) A complete process and method for the Chinese NL2SQL task is proposed, which combines a pre-trained model with deep learning classification models. The method makes full use of the feature representation capabilities of the latest pre-trained model when processing text, and builds a corresponding deep learning classification model for each sub-task by fine-tuning.

(2) In the single-table query scenario, a general classification model and a condition value acquisition model are proposed to generate SQL statements. The general classification model alleviates the column name reuse problem when predicting column names in SQL statements. The condition value acquisition model distinguishes between query text and actual column values when predicting condition values in SQL, which alleviates the inconsistency between descriptions in natural language queries and data representations stored in the database. Comparative experiments show that the method improves the accuracy of SQL statement generation in several respects, including column name prediction and condition value prediction.

(3) In the multi-table query scenario, the Chinese NL2SQL task is decomposed into two sub-tasks: SQL clause generation and JOIN path generation. For SQL clause generation, two solutions are proposed, both of which borrow ideas from the single-table query scenario and extend the single-table model to a wider range of scenarios. For JOIN path generation, the problem is modeled as a Steiner tree generation problem and solved with a global optimization algorithm. Experiments show that the method extends effectively from the single-table query scenario to the multi-table query scenario and can generate multi-table SQL statements.
Keywords/Search Tags: natural language processing, Chinese NL2SQL, pre-trained model, single-table, multi-table
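
The abstract states that JOIN path generation is modeled as a Steiner tree problem but does not give the algorithm. The following is a minimal sketch of that idea only, not the thesis's method: tables are graph nodes, foreign-key links are unit-weight edges, and the tables mentioned in the SQL clauses are the terminals to connect. The schema, the table names, and the functions `steiner_join_path` and `join_clause` are all hypothetical, and the exhaustive search used here is a stand-in that is only practical for small schemas.

```python
from collections import deque
from itertools import combinations

# Hypothetical schema graph: each key is a pair of tables linked by a
# foreign key, each value is the corresponding JOIN condition.
SCHEMA = {
    ("student", "enrollment"): "student.id = enrollment.student_id",
    ("enrollment", "course"): "enrollment.course_id = course.id",
    ("course", "teacher"): "course.teacher_id = teacher.id",
    ("student", "dorm"): "student.dorm_id = dorm.id",
}


def build_adjacency(schema):
    """Turn the edge dictionary into an undirected adjacency map."""
    adj = {}
    for a, b in schema:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def spanning_tree(nodes, adj):
    """BFS over the subgraph induced by `nodes`.

    Returns the list of tree edges, or None if the subgraph is disconnected.
    """
    nodes = set(nodes)
    start = sorted(nodes)[0]
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v in nodes and v not in seen:
                seen.add(v)
                edges.append((u, v))
                queue.append(v)
    return edges if seen == nodes else None


def steiner_join_path(terminals, schema):
    """Exact unit-weight Steiner tree by exhaustive search (small graphs only).

    Tries adding 0, 1, 2, ... intermediate tables until the required tables
    become connected, so the first hit uses the fewest JOINs.
    """
    adj = build_adjacency(schema)
    terminals = set(terminals)
    others = sorted(set(adj) - terminals)
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            edges = spanning_tree(terminals | set(extra), adj)
            if edges is not None:
                return edges
    return None  # the required tables cannot be connected


def join_clause(edges, schema):
    """Render tree edges (in BFS order) as a FROM ... JOIN ... ON ... string."""
    if not edges:
        return ""
    parts = [edges[0][0]]
    for u, v in edges:
        condition = schema.get((u, v)) or schema.get((v, u))
        parts.append(f"JOIN {v} ON {condition}")
    return " ".join(parts)
```

With this toy schema, `steiner_join_path({"student", "course"}, SCHEMA)` pulls in `enrollment` as the single connecting table, and `join_clause` turns the resulting edges into the JOIN portion of a FROM clause. A real schema would likely need weighted edges and a scalable solver in place of the exhaustive search.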