
Research On Generating Complex SQL Statements From Chinese Natural Language Based On Deep Learning

Posted on: 2022-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Lin
GTID: 2518306572997379
Subject: Computer technology
Abstract/Summary:
Because information technology is developing rapidly, massive amounts of data are stored in databases in structured form. Retrieving these data requires SQL, the standard database query language. However, SQL is a structured query language with strict grammar rules: it requires users to have professional knowledge of databases and SQL, which sets a high barrier to use and makes it unfriendly to non-professional users. In recent years, to improve the efficiency of database information retrieval and lower this barrier so that databases can better serve the public, research on having computers translate natural language questions directly into SQL statements has attracted growing attention. This research task is known as the Text-to-SQL task. As a comprehensive task, Text-to-SQL must solve many key problems, such as natural language encoding, database schema encoding, relation discovery, relation encoding, and SQL statement decoding. At present, most existing research assumes that both the natural language questions and the target database schemas are in English, and it greatly simplifies the structure of the SQL statements. As a result, these approaches cannot handle complex SQL statements, have many limitations, and struggle to meet the needs of engineering practice.

Targeting the characteristics of complex Chinese Text-to-SQL tasks, and after analyzing the problems in current research, this thesis proposes RACN-SQL, a model that generates complex SQL from Chinese natural language. Through semantic encoding, database schema relation representation, schema linking, relation encoding, and SQL decoding, the model converts natural language questions into SQL statements and finally outputs the SQL query results. Compared with other research, the main contributions of RACN-SQL are as follows: (1) it overcomes the cross-language semantic encoding barrier between Chinese natural language questions and English database schemas; (2) it discovers and represents the correlations between the natural language question and the database schema reasonably and effectively; (3) taking into account the characteristics of both unstructured and structured data, it combines semantic information and relational information for joint encoding.

In the experiments, a complex Chinese Text-to-SQL dataset and evaluation methods that meet practical needs are selected. Comparative experiments against other classic and high-performing models verify the effectiveness of the proposed RACN-SQL model. The remaining deficiencies of the model and the directions future research should focus on are also analyzed.
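The abstract does not describe how RACN-SQL actually performs schema linking or relation encoding. As a rough illustration only, the Python sketch below shows a common string-matching form of schema linking (in the style of RAT-SQL-like relation-aware models), adapted to a Chinese question and an English schema through a small bilingual lexicon. `LEXICON`, `RELATIONS`, `schema_link`, `relation_matrix`, the relation labels, and the example schema are all hypothetical names introduced here, not taken from the thesis; the lexicon merely stands in for the cross-language semantic encoding the thesis addresses.

```python
# Minimal schema-linking sketch for a Chinese question over an English schema.
# Assumption: a hypothetical bilingual lexicon stands in for the cross-lingual
# semantic encoding the thesis describes; this is NOT the RACN-SQL method.

# Hypothetical Chinese-to-English lexicon (illustrative entries only).
LEXICON = {
    "学生": "student",
    "姓名": "name",
    "年龄": "age",
    "课程": "course",
}

# Hypothetical relation labels; index 0 means "no link".
RELATIONS = [
    "none",
    "question-table-exact",
    "question-column-exact",
    "question-column-partial",
]


def schema_link(question_tokens, schema):
    """Return {(token_index, schema_item): relation_label} for linked pairs.

    question_tokens: Chinese tokens, assumed to be already word-segmented.
    schema: dict mapping English table name -> list of English column names.
    """
    links = {}
    for i, tok in enumerate(question_tokens):
        eng = LEXICON.get(tok)
        if eng is None:
            continue  # token has no lexicon entry, so it links to nothing
        for table, columns in schema.items():
            if eng == table:
                links[(i, table)] = "question-table-exact"
            for col in columns:
                item = f"{table}.{col}"
                if eng == col:
                    links[(i, item)] = "question-column-exact"
                elif eng in col.split("_"):
                    # word inside a snake_case column name, e.g. "name" in "course_name"
                    links[(i, item)] = "question-column-partial"
    return links


def relation_matrix(question_tokens, schema, links):
    """Build a (question token x schema item) matrix of relation IDs.

    A relation-aware encoder could embed these IDs and use them to bias its
    attention scores; that downstream step is not shown here.
    """
    items = list(schema) + [f"{t}.{c}" for t, cols in schema.items() for c in cols]
    matrix = [
        [RELATIONS.index(links.get((i, item), "none")) for item in items]
        for i in range(len(question_tokens))
    ]
    return items, matrix


if __name__ == "__main__":
    schema = {
        "student": ["student_id", "name", "age"],
        "course": ["course_id", "course_name"],
    }
    # "查询学生的姓名和年龄" = "query the students' names and ages"
    tokens = ["查询", "学生", "的", "姓名", "和", "年龄"]
    links = schema_link(tokens, schema)
    for (i, item), rel in sorted(links.items()):
        print(tokens[i], "->", item, ":", rel)
    items, matrix = relation_matrix(tokens, schema, links)
    print(items)
    for tok, row in zip(tokens, matrix):
        print(tok, row)
```

With the example schema, 学生 links to the `student` table (exact) and to `student.student_id` (partial), 姓名 links to `student.name` (exact) and to `course.course_name` (partial), and 年龄 links to `student.age` (exact). The spurious partial match on `course.course_name` shows how noisy pure string matching can be, which is one reason models combine such relational signals with semantic encoding rather than relying on string matching alone.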
Keywords/Search Tags: Text-to-SQL, Information Retrieval, Semantic Encoding, Schema Linking, Relational Encoding