
Research And Improvement Of Chinese NL2SQL Model Based On Single Table

Posted on: 2022-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2518306749471744
Subject: Automation Technology
Abstract/Summary:
In recent years, semantic parsing, one of the key technologies of natural language processing, has attracted increasing attention. NL2SQL, a semantic parsing task, converts natural language descriptions into executable SQL queries by means of a model. Because Chinese text differs from English text, earlier NL2SQL models built on English datasets cannot be applied directly to Chinese. At the same time, existing NL2SQL models generally predict condition values with a sequence-generation model, which yields low accuracy, and research on NL2SQL tasks often neglects the importance of data quality and model generalization. The innovations of this thesis are as follows:

(1) SQLNet is improved to obtain Ch-SQLNet by adding two tasks, sel-num and where-num, while the overall structure of SQLNet remains unchanged.

(2) Ch-SQLNet and Pre-NL2SQL are divided into 8 sub-tasks according to the structure of SQL statements, and the accuracy of the 8 sub-tasks is compared and analyzed experimentally on the Chinese dataset TableQA. The predictions of the 8 sub-tasks are then assembled into complete SQL statements, and the accuracy of query matching is analyzed. Model performance is evaluated on two indicators: execution accuracy and query-match accuracy.

(3) Special data preprocessing and R-Drop regularization are used to improve the accuracy of Ch-SQLNet and Pre-NL2SQL on both evaluation indicators.

The experimental results show that: 1. Ch-SQLNet is more accurate than SQLNet on the 8 sub-tasks and exceeds SQLNet by 19.1% and 17.2% on the two evaluation metrics. 2. Pre-NL2SQL is more accurate than Ch-SQLNet on the 8 sub-tasks and exceeds Ch-SQLNet by 3.6% and 1.7% on the two evaluation metrics. 3. After special data preprocessing and R-Drop regularization, Ch-SQLNet and Pre-NL2SQL improve by 0.1%-0.6% on the two evaluation indicators.
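The abstract does not enumerate the 8 sub-tasks, but single-table NL2SQL models in the SQLNet family typically split prediction into select-column number, select columns, aggregators, where-condition number, where columns, operators, condition values, and the conjunction. The sketch below shows how such per-sub-task predictions could be assembled into an executable SQL statement; all names and the operator/aggregator vocabularies are hypothetical, and this illustrates the composition step rather than the thesis's implementation.

```python
# Minimal sketch of composing a single-table SQL query from sub-task
# predictions. The 8-way split and all names here are assumptions based
# on the common WikiSQL/TableQA-style decomposition, not the thesis's code.

AGG_OPS = ["", "AVG", "MAX", "MIN", "COUNT", "SUM"]
COND_OPS = [">", "<", "=", "!="]
CONJS = ["", "AND", "OR"]

def assemble_sql(table, pred):
    """Compose an executable SQL string from per-sub-task predictions."""
    # SELECT clause: sel-num columns, each with its predicted aggregator.
    sel_parts = []
    for col_idx, agg_idx in zip(pred["sel_cols"][: pred["sel_num"]],
                                pred["aggs"]):
        col = f'"{table["header"][col_idx]}"'
        sel_parts.append(f"{AGG_OPS[agg_idx]}({col})" if agg_idx else col)
    sql = f'SELECT {", ".join(sel_parts)} FROM "{table["name"]}"'

    # WHERE clause: where-num conditions joined by the predicted conjunction.
    conds = [
        f'"{table["header"][c]}" {COND_OPS[op]} \'{val}\''
        for c, op, val in pred["conds"][: pred["where_num"]]
    ]
    if conds:
        sql += " WHERE " + f" {CONJS[pred['conj']]} ".join(conds)
    return sql

# Hypothetical example: "average price of phones made by BrandA or BrandB"
table = {"name": "phones", "header": ["brand", "price"]}
pred = {"sel_num": 1, "sel_cols": [1], "aggs": [1],
        "where_num": 2, "conds": [(0, 2, "BrandA"), (0, 2, "BrandB")],
        "conj": 2}
print(assemble_sql(table, pred))
# SELECT AVG("price") FROM "phones" WHERE "brand" = 'BrandA' OR "brand" = 'BrandB'
```

Because each clause comes from its own classifier rather than a free-form decoder, query-match accuracy can be scored per sub-task, which is what makes the 8-way comparison in innovation (2) possible.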
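R-Drop, named in innovation (3), regularizes a model by running each batch through the network twice with dropout enabled and penalizing the divergence between the two predictive distributions. Below is a minimal PyTorch sketch under stated assumptions: the abstract does not give the model or the loss weighting, so `model`, `alpha`, and the single classification head are placeholders standing in for the thesis's sub-task heads.

```python
# Minimal sketch of R-Drop regularization. Assumes `model` is in training
# mode so dropout is active; `alpha` is a hypothetical weighting constant.

import torch
import torch.nn.functional as F

def rdrop_loss(model, x, labels, alpha=1.0):
    """Cross-entropy on two dropout-perturbed passes plus symmetric KL."""
    # Two forward passes: different dropout masks give two sub-models.
    logits1 = model(x)
    logits2 = model(x)

    # Ordinary task loss, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels) +
                F.cross_entropy(logits2, labels))

    # Symmetric KL divergence pulls the two predictive distributions together.
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean") +
                F.kl_div(p2, p1, log_target=True, reduction="batchmean"))

    return ce + alpha * kl
```

The extra KL term discourages the model from relying on any single dropout configuration, which is consistent with the small but uniform 0.1%-0.6% gains the thesis reports on both evaluation indicators.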
Keywords/Search Tags: Natural Language Processing, Chinese SQLNet, Pretraining Model, Model Optimization