Design and Implementation of a Question Answering System Based on Text-to-SQL

Posted on: 2022-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Ning
Full Text: PDF
GTID: 2518306746451934
Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence technology, the task of transforming natural language into structured query language (Text-to-SQL) has attracted extensive attention in the academic community. It gives non-professional users an interface for querying databases, greatly reduces the cost of learning to query data, and improves both query efficiency and people's daily lives, so it has high research value.

At present, converting Chinese natural language into structured query language faces many challenges. First, the prediction accuracy of SQL statements on Chinese datasets is low, and there is still a lot of room for optimization. Second, Chinese expressions contain many synonyms, so accurately mapping natural language questions to database column names is the main challenge of the task.

This study takes the CSpider Chinese dataset as its research object. The dataset divides SQL statements into three levels of prediction difficulty: easy, medium, and hard. According to the different forms in which Chinese is expressed, as characters and as words, the study is divided into two lines of Text-to-SQL research, one per form. The main work is as follows:

(1) A semantic enhancement model of the word sequence, integrating part-of-speech and dependency syntax features, is constructed.
In the word-based research, this study applies a new word segmentation method to the original data, and the resulting segmentation is closer to how users understand meaning in everyday life. In addition, to enhance the model's understanding of semantics, part-of-speech feature vectors and dependency syntax feature vectors are added to the original word vectors. The experimental results show that adding part-of-speech features significantly improves the Keyword and And/Or subtasks: by 15% and 7.8% for medium-difficulty SQL statements, and by 12.2% and 13.8% for hard SQL statements.

(2) A word sequence model based on BERT is constructed.
In this line of research, the BERT model is used as the encoder, and the vectors obtained after training also carry the semantic features of each word. In addition, to strengthen the semantic relationship between the natural language question and the column names of the database tables, the question and the database column names are spliced into sentence pairs and fed to the multilingual BERT model to generate vectors. The results show that, once the database column names are part of the BERT input sequence, the accuracy, recall, and F1 score of each subtask increase by about 5%, and those of the Where subtask by about 10%. Finally, the overall accuracy of SQL statements at easy, medium, and hard difficulty increases by 3.6%, 4.4%, and 1.2% respectively.

(3) A prototype question answering system based on Text-to-SQL is built.
Using the CSpider dataset as the system's database, a system based on Text-to-SQL technology is built that provides users with data conversion interfaces in fields such as airline information, university information, and book information. It also includes functions that assist users in asking questions and return related information, to improve the user experience.
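The sentence-pair input described in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `simple_tokenize` and `build_sentence_pair` are hypothetical helpers, the example question and column names are invented, and pairing the question with each column separately is one possible reading of the abstract. The real multilingual BERT uses a WordPiece tokenizer; the stand-in below only splits Chinese text into characters and other text into lowercase alphanumeric tokens. The `[CLS] A [SEP] B [SEP]` layout with segment ids 0 and 1 is the standard BERT sentence-pair format.

```python
from typing import Dict, List

CLS, SEP = "[CLS]", "[SEP]"


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer (NOT WordPiece): one token per CJK
    character, lowercase alphanumeric runs otherwise; punctuation is dropped."""
    tokens: List[str] = []
    buffer = ""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":  # CJK unified ideograph
            if buffer:
                tokens.append(buffer.lower())
                buffer = ""
            tokens.append(ch)
        elif ch.isalnum():
            buffer += ch
        else:  # whitespace, underscore, punctuation end the current token
            if buffer:
                tokens.append(buffer.lower())
                buffer = ""
    if buffer:
        tokens.append(buffer.lower())
    return tokens


def build_sentence_pair(question: str, column: str) -> Dict[str, List]:
    """Splice a question and one column name into a BERT-style sentence pair."""
    q_tokens = simple_tokenize(question)
    c_tokens = simple_tokenize(column)
    tokens = [CLS] + q_tokens + [SEP] + c_tokens + [SEP]
    # Segment 0 covers [CLS], the question, and the first [SEP];
    # segment 1 covers the column name and the final [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)
    return {"tokens": tokens, "segment_ids": segment_ids}


if __name__ == "__main__":
    # Invented example: one question paired with each candidate column.
    question = "有多少个航班？"
    for column in ["flight_id", "airline"]:
        pair = build_sentence_pair(question, column)
        print(pair["tokens"], pair["segment_ids"])
```

In a real pipeline the token list would be converted to vocabulary ids and passed to the encoder together with the segment ids, so that attention can relate question tokens to column-name tokens.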
Keywords/Search Tags:Text-to-SQL, CSpider, Semantic Enhancement Model, Word Sequence Model, BERT