
Technical Research On Chinese NL2SQL Task Based On BERT

Posted on: 2022-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Ou
Full Text: PDF
GTID: 2518306338490744
Subject: Control Engineering
Abstract/Summary:
In recent years, natural language processing has developed rapidly, and freely querying target data in a database with natural language has become an emerging research hotspot. Natural Language to SQL (NL2SQL) is a technology that converts users' natural language statements into SQL statements executable by computers. It is of great significance for improving the interaction between users and databases; research on Chinese NL2SQL in particular also reduces the learning cost for non-specialists operating databases. How to reasonably model the natural language together with the corresponding table structure and content is a major challenge in the current NL2SQL task. Most existing research addresses NL2SQL tasks on English datasets, and research on Chinese is comparatively scarce. In research on NL2SQL under single-table queries, no existing solution exploits the condition column names of the WHERE clause or the content of the tables corresponding to the natural language statements; in research on NL2SQL under multi-table queries, a shallow extraction method is used for database schema information, which causes partial information loss, and the decoder that generates SQL statements has no knowledge of the structure of the abstract syntax tree when decoding action sequences. This paper studies both single-table and multi-table query scenarios of the Chinese NL2SQL task; the main work and contributions are as follows.

(1) NL2SQL research in the single-table query scenario based on BERT. This paper proposes splitting the task of generating SQL queries from natural language questions into a backbone model and a condition-value extraction model. The backbone model
solves the problem of column-name reuse when predicting the column names in the WHERE clause; the condition-value extraction model incorporates the content of the corresponding table to improve extraction performance. Two different approaches are proposed to accomplish condition-value extraction: one resembles machine reading comprehension [5], and the other is a sequence-labeling approach in which each position of the natural language question is labeled 0 or 1, where 0 means the position does not belong to a condition value and 1 means it does. Comparative experiments verify that the method solves the column-name reuse problem and that using the content information of the table improves task accuracy.

(2) NL2SQL research in the multi-table query scenario based on BERT. This paper proposes GNN-RAT, a sequence-to-sequence model with an encoder-decoder architecture. To address the database schema information extraction problem, it encodes the schema structure with a graph neural network, alleviating the shallow-extraction problem; in the decoder, a relation-aware transformer module replaces the previous Bi-LSTM module, so that the decoder can obtain the structural information of the abstract syntax tree. Experiments verify that the method has positive effects both on database schema information extraction and on SQL generation.

(3) A method and complete procedure for the Chinese NL2SQL task combining the pre-trained BERT model with deep learning classification models. This paper uses the BERT model, with its strong semantic extraction capability, for natural language processing, and fine-tunes it for the downstream tasks based on the features of the pre-trained model. This paper confirms through
comparative experiments that the pre-trained BERT model has a positive effect on the accuracy of the generated SQL statements.
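The 0/1 sequence-labeling scheme for condition-value extraction described in contribution (1) can be illustrated with a minimal sketch. The function name, character-level granularity, and exact-match span finding below are assumptions for illustration; the thesis's actual model predicts these labels with a BERT-based classifier rather than by string matching.

```python
def tag_condition_values(question: str, values: list[str]) -> list[int]:
    """Produce the 0/1 label sequence for a question (illustrative only):
    each character position gets 1 if it lies inside a condition-value
    span, 0 otherwise. A trained model would predict these labels."""
    tags = [0] * len(question)
    for value in values:
        start = question.find(value)  # locate the value span in the question
        if start != -1:
            for i in range(start, start + len(value)):
                tags[i] = 1
    return tags


# Example: the condition value "Beijing" occupies the last 7 positions.
labels = tag_condition_values("population of Beijing", ["Beijing"])
# → [0]*14 + [1]*7
```

At inference time, contiguous runs of 1s are read back out of the question as the predicted condition values for the WHERE clause.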
Keywords/Search Tags:natural language processing, Chinese NL2SQL, pre-trained model, single table, multi-table