Research And Implementation Of Information Extraction And Knowledge Graph Question Answering Based On Deep Learning

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: C R Chen
Full Text: PDF
GTID: 2518306605989469
Subject: Master of Engineering
Abstract/Summary:
In recent years, with the development of the mobile Internet, large numbers of users can easily publish user-generated content online, and much of this content exists as unstructured text. When users retrieve such text, it takes time to find the structured information they want among complicated retrieval results. Automated information extraction has therefore become the key to helping users obtain structured information more quickly and accurately. Information extraction studies how to extract structured information, such as triples, from a piece of unstructured text. Triples store structured information in the form <Subject, Predicate, Object> and together form a knowledge graph that other modules can conveniently call.

Traditional information extraction methods, such as rule templates and machine learning, require complex feature engineering, and their modeling and extraction performance is not ideal. Deep learning methods establish the mapping between input and output with simple feature engineering and achieve better extraction performance. Information extraction methods based on deep learning fall into two categories. The first is the pipeline method, which splits the task into two independent subtasks; the weak relevance between these subtasks hurts the final performance. The second is the joint method, in which the two subtasks are associated through a shared encoding layer and the subtask models are trained together. It performs better than the pipeline method, but the information shared between the two subtasks is relatively limited. Both the pipeline and joint methods also suffer from slow decoding.

To address the limited information shared in the joint extraction method, and to study the impact of different subtasks on model performance, this thesis designs two hierarchical binary labeling information extraction models based on the attention mechanism: the P-SO model and the SP-O model. On top of the shared encoding, the attention mechanism fuses the information flowing between the two subtasks, making them more closely related. The feature engineering of both models is simple and uses no natural language processing tools, such as word segmentation or part-of-speech tagging, which avoids introducing new errors and also gives faster inference and decoding in engineering applications. Experiments are also designed to study how different character embeddings and different recurrent neural networks affect model performance. With simple feature engineering that uses only character embeddings and position embeddings, the SP-O model reaches an F1 score of 0.801.

To address the slow speed of inferring and decoding samples one at a time, this thesis designs a batch inference decoding method. On a GTX 1050 Ti graphics card, the inference and decoding speed on the data set reaches 359 items per second, an average increase of 817% over one-by-one inference and decoding.

After the information extraction task is completed, this thesis draws on the application scenarios of triple knowledge graphs to design a question answering system based on the knowledge graph. The system implements information extraction, triple management, knowledge graph question answering, triple display, and other functions. The question answering process uses the SP sub-model of the SP-O model to handle users' questions, which avoids complex text similarity computation and obtains the answer directly. Finally, experiments are designed to verify the correctness of the question answering process, the stability of the question answering system, and the timeliness of its responses.
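
The question answering flow described in the abstract can be illustrated with a minimal Python sketch: triples are indexed by (subject, predicate), and a question is answered by extracting a subject and predicate from it and looking up the object directly, with no text similarity step. Everything named here (`TripleStore`, `extract_subject_predicate`, `answer_question`, and the sample triple) is hypothetical. In particular, the substring matcher only stands in for the thesis's SP sub-model, which is a trained neural model that this sketch does not reproduce.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory knowledge graph indexed by (subject, predicate)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def add(self, triple: Triple) -> None:
        subject, predicate, obj = triple
        objects = self._objects[(subject, predicate)]
        if obj not in objects:  # skip duplicate triples
            objects.append(obj)

    def keys(self) -> List[Tuple[str, str]]:
        return list(self._objects.keys())

    def lookup(self, subject: str, predicate: str) -> List[str]:
        # .get avoids creating empty entries for unknown keys
        return list(self._objects.get((subject, predicate), []))


def extract_subject_predicate(
    question: str, store: TripleStore
) -> Optional[Tuple[str, str]]:
    """HYPOTHETICAL placeholder for the SP sub-model.

    It returns the first known (subject, predicate) pair whose two
    strings both occur in the question; the real model is learned.
    """
    for subject, predicate in store.keys():
        if subject in question and predicate in question:
            return subject, predicate
    return None


def answer_question(question: str, store: TripleStore) -> List[str]:
    """Answer by direct (subject, predicate) lookup in the triple store."""
    pair = extract_subject_predicate(question, store)
    if pair is None:
        return []
    return store.lookup(*pair)


store = TripleStore()
store.add(("Marie Curie", "born in", "Warsaw"))
print(answer_question("Which city was Marie Curie born in?", store))  # ['Warsaw']
```

Indexing by (subject, predicate) makes answering a dictionary lookup once the two elements are extracted, which is one plausible reading of how the process obtains the answer directly.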
Keywords/Search Tags: information extraction, knowledge graph, question answering system, pointer labeling, deep learning
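
As a rough illustration of the pointer labeling named in the keywords, the sketch below decodes entity spans from two binary tag sequences, one marking span starts and one marking span ends, at character level. The pairing rule (each start is matched with the nearest end at or after it) is a common convention for binary pointer tagging and is an assumption here, since the abstract does not state the thesis's decoding rule. The function name `decode_spans` and the sample sentence are likewise hypothetical.

```python
from typing import List, Tuple


def decode_spans(start_tags: List[int], end_tags: List[int]) -> List[Tuple[int, int]]:
    """Pair each start tag with the nearest end tag at or after it.

    Returns inclusive (start, end) index pairs. A start with no
    following end tag produces no span.
    """
    if len(start_tags) != len(end_tags):
        raise ValueError("start_tags and end_tags must have the same length")
    spans = []
    for i, is_start in enumerate(start_tags):
        if is_start != 1:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == 1:
                spans.append((i, j))
                break
    return spans


# Character-level example; the tags are set by hand here, whereas a
# trained model would predict them.
text = "Paris is in France"
start_tags = [0] * len(text)
end_tags = [0] * len(text)
for entity in ("Paris", "France"):
    begin = text.index(entity)
    start_tags[begin] = 1
    end_tags[begin + len(entity) - 1] = 1

entities = [text[s:e + 1] for s, e in decode_spans(start_tags, end_tags)]
print(entities)  # ['Paris', 'France']
```

Because the tags are plain per-position binary labels, decoding needs no word segmentation step, in line with the abstract's point about avoiding external natural language processing tools.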