
Research On Automatic Semantic Parsing Algorithm From Text To SQL

Posted on: 2022-12-26    Degree: Master    Type: Thesis
Country: China    Candidate: Q Y Wang    Full Text: PDF
GTID: 2518306779970129    Subject: Automation Technology
Abstract/Summary:
Relational databases store large amounts of data and form the foundation and core of information systems. Users can retrieve data from a database with SQL statements, but this usually requires a certain level of SQL proficiency. To reduce users' learning costs and improve their efficiency and experience, natural language query technology for relational databases has emerged. It allows users to interact with the database directly in natural language to obtain the data they need, and its core is parsing natural language into SQL statements (NL2SQL). However, current NL2SQL models still face two challenges: 1) a lack of semantics in questions; and 2) insufficient use of the information in the dataset. This thesis conducts the following research to address these challenges:

(1) To address the lack of semantics in questions, this thesis proposes an NL2SQL approach based on knowledge enhancement, which uses knowledge graphs to supplement the background knowledge of named entities in natural language questions. Specifically, the model first uses entity linking to link the named entities in the question to an external knowledge graph. It then strengthens the NL2SQL model's understanding of the question by introducing four types of knowledge about those entities from the external knowledge graph (abstract, type, category, and infobox), which in turn improves parsing performance. The thesis proposes one symbolization-based and two vectorization-based (textual embedding and graph embedding) knowledge enhancement schemes, and systematically demonstrates the effects of introducing different kinds of knowledge as well as the advantages and disadvantages of the different enhancement methods.

(2) To address the insufficient use of information in the dataset, this thesis proposes an NL2SQL approach based on two-stage curriculum learning (preview + attend), which uses the information in the data to guide model training. The goal of the preview stage is to train the encoder of the NL2SQL model. A new intention (table) recognition task is designed that additionally considers the correspondence between questions and tables in the dataset, so that the encoder learns a consistent encoding of questions and tables. The goal of the attend stage is to train the full NL2SQL model. To help the model find better local optima, the differences in difficulty among the examples in the dataset are additionally considered, and a model-independent easy-to-hard curriculum framework is designed.

(3) Experiments are conducted on the WikiSQL dataset. For the NL2SQL approach based on knowledge enhancement, the thesis compares the effects of the four types of knowledge from the knowledge graph on the one symbolization-based and two vectorization-based knowledge enhancement methods. The experimental results show that both the symbolization-based and the vectorization-based enhancement schemes improve model performance with these four types of knowledge, and that textual-embedding enhancement using type and infobox knowledge is the most effective. For the NL2SQL approach based on two-stage curriculum learning, the thesis compares the effects of training NL2SQL models with the two stages separately and jointly. The experimental results show that both stages are essential: the preview stage provides the NL2SQL model with an encoder that has learned consistent representations of questions and tables, and the attend stage provides a better training framework for the NL2SQL model.
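The easy-to-hard idea behind the attend stage can be illustrated with a minimal, model-independent sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the helper names `curriculum_stages` and `count_conditions`, the toy examples, the use of the number of WHERE conditions as a difficulty score, and the cumulative schedule in which each stage adds harder examples to the ones already seen.

```python
import math
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    n_stages: int,
) -> Iterator[List[T]]:
    """Yield cumulative training subsets ordered from easy to hard.

    Hypothetical helper: stage k contains the easiest k/n_stages
    fraction of the data, so the last stage is the full dataset.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    # Sort once by the (assumed) difficulty score, easiest first.
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = math.ceil(len(ordered) * stage / n_stages)
        yield ordered[:cutoff]


def count_conditions(example: dict) -> int:
    """Assumed difficulty proxy: number of WHERE conditions in the target SQL."""
    return len(example["conds"])


# Toy examples with invented fields; not drawn from WikiSQL.
toy_data = [
    {"question": "q_a", "conds": ["c1", "c2"]},
    {"question": "q_b", "conds": []},
    {"question": "q_c", "conds": ["c1"]},
    {"question": "q_d", "conds": ["c1", "c2", "c3"]},
]

stages = list(curriculum_stages(toy_data, count_conditions, n_stages=2))
for index, subset in enumerate(stages, start=1):
    # A real training loop would fit the model on `subset` here.
    print(index, [ex["question"] for ex in subset])
```

Because the schedule only needs a difficulty function and a list of examples, it can wrap any NL2SQL model's training loop, which is one way to read "model-independent" in the description above.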
Keywords/Search Tags: Semantic parsing, Knowledge enhancement, Entity linking, Curriculum learning