Research On Structured Knowledge Based Deep Semantic Parsing

Posted on: 2021-01-03  Degree: Doctor  Type: Dissertation
Country: China  Candidate: B Y Xu  Full Text: PDF
GTID: 1368330602993438  Subject: Computer applications engineering
Abstract/Summary:
Human-computer interaction is one of the key development areas of artificial intelligence. Among the many intelligent human-computer interaction technologies, semantic parsing is the technology that automatically converts natural language into machine-understandable commands. It plays an important role in human-computer interaction scenarios such as machine translation systems, question answering systems, and automatic code generation. Semantic parsing is therefore a hot spot in the field of human-computer interaction and is attracting the attention of many important research institutions and scholars at home and abroad.

Building natural language interfaces to databases (NLIDB) on top of semantic parsing is one of the most challenging tasks, and it raises the following three challenges: 1) Transforming natural language into database query statements involves a complex grammar learning problem, and the validity of the generated query statement cannot be guaranteed. 2) The transformation model is trained with supervised learning, so the character-level objective function is mismatched with the evaluation metric, which is based on the accuracy of the database query. 3) The generated database query statement is unreadable to ordinary users, who therefore cannot verify the accuracy of the result. In response to these three challenges, and based on the idea of fusing structured knowledge, this paper explores deep semantic parsing technology and achieves the following research results and innovations:

1. This paper proposes an encoder-decoder framework for translating natural language into database queries. In the encoding phase, in order to link the natural language input to the database schema, labels related to the database schema are defined in Backus-Naur Form (BNF) and used to tag a structured representation of the natural language. In the decoding phase, changes in the grammar state of the output database query are tracked through a SQL grammar state automaton and integrated into the neural network structure. Further, the dictionary space of output words is limited by the current grammar state: output words that do not conform to the grammar rules are filtered out. In general, the core idea of the model is to deeply integrate the known grammar structure of SQL with the structural information of the database schema, which greatly reduces the learning cost of the model. Empirical evaluation on real-world databases and queries shows that the approach outperforms state-of-the-art solutions by a significant margin.

2. This paper proposes training the conversion model for natural language to database queries with reinforcement learning, in order to solve the problem that the model's objective function and the task's evaluation metric do not match. Through policy gradient reinforcement learning, the similarity of words and the accuracy of the query results are used as rewards and fed back into model training. On the one hand, since translating natural language into database query statements is a long-sequence generation problem, it is difficult to sample long sequences correctly during reinforcement learning training. Moreover, the grammatical structure of SQL is complex, which leads to syntax errors in the sampled sequences. On the other hand, the accuracy of the query results gives positive feedback only when the query results of the generated sequence and those of the target sequence intersect, which makes the rewards in reinforcement learning training sparse. This paper proposes a new sampling method that fuses structured knowledge to reduce grammatical errors, while also reducing the size of the sampling space and improving the sampling efficiency of reinforcement learning. Further, this paper demonstrates the sampling efficiency of the method through theoretical analysis, supported by multiple sets of analytical experiments.

3. This paper studies the inverse process: natural language generation from database queries. With existing research techniques and data, there is still a gap between the accuracy of the model and the requirements of productization. To overcome this predicament, this paper considers introducing user feedback to improve the accuracy of the model. However, users who do not have a computer science background or who have not mastered SQL cannot judge the accuracy of a predicted database query. Therefore, how to generate a natural language description from the predicted database query is a technical problem that needs to be solved. Natural language generation for structured queries or other programming code is suited to application scenarios such as human-computer interaction and automatic code generation. Existing methods are not accurate enough in extracting source information from structured input, so critical information is missing from the generated natural language descriptions. This paper proposes a copy mechanism that integrates structured knowledge. The grammatical type limits the size of the dictionary space under the copy mechanism, which effectively improves the accuracy of copying source input information.

4. This paper combines the key techniques of deep semantic parsing from the preceding research and proposes a new natural language database query system (NADAQ). By combining the transformation model for natural language to database queries with the efficient reinforcement learning sampling method, NADAQ exposes the core functions of the underlying database through natural language. Through the natural language generation model based on the copy mechanism that fuses structured knowledge, NADAQ can translate the generated database queries back into natural language descriptions, so that users can confirm them and further improve the system's accuracy. In addition, the system adds a rejection component to filter meaningless or unrelated input questions, and a recommendation component that provides users with multiple candidate queries. The NADAQ system provides a solution for human-computer interaction database applications and promotes the development of the industry.
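
The grammar-state filtering described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the dissertation: the toy vocabulary, the state names, the `TRANSITIONS` table, and the fixed scores that stand in for a neural decoder's output. The sketch only shows the general idea that, at each decoding step, tokens not permitted by the current grammar state are filtered out before the next token is chosen.

```python
import math

# Hypothetical toy grammar automaton for "SELECT <column> FROM <table>".
# Each state maps the tokens it allows to the state that follows them.
TRANSITIONS = {
    "start": {"SELECT": "want_column"},
    "want_column": {"name": "after_column", "age": "after_column"},
    "after_column": {"FROM": "want_table"},
    "want_table": {"users": "after_table"},
    "after_table": {"<EOS>": "done"},
}


def mask_scores(state, scores):
    """Set the score of every token the grammar state forbids to -inf."""
    allowed = TRANSITIONS[state]
    return {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}


def constrained_greedy_decode(score_fn, max_steps=10):
    """Greedy decoding in which the grammar state restricts the vocabulary."""
    state, output = "start", []
    while state != "done" and len(output) < max_steps:
        masked = mask_scores(state, score_fn(output))
        token = max(masked, key=masked.get)  # best token among the allowed ones
        output.append(token)
        state = TRANSITIONS[state][token]    # advance the grammar automaton
    return output


def toy_score_fn(prefix):
    """Stand-in for a decoder: fixed scores that favour an invalid token."""
    return {"SELECT": 0.1, "FROM": 0.2, "WHERE": 5.0, "name": 1.0,
            "age": 2.0, "users": 0.3, "<EOS>": 0.0}


query_tokens = constrained_greedy_decode(toy_score_fn)
print(" ".join(query_tokens))
```

Although the stand-in scorer always rates `WHERE` highest, the mask keeps it from being emitted in states where the toy grammar does not allow it, so the decoded sequence stays syntactically valid. A real system would derive the automaton from the SQL grammar and the database schema rather than from a hand-written table.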
Keywords/Search Tags: Semantic Parsing, Natural Language Processing, Database Queries