
Clause Based Open Domain Information Extraction

Posted on: 2022-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 2518306551470244
Subject: Computer Science and Technology
Abstract/Summary:
Information extraction aims to automatically extract information from unstructured text and convert it into structured triples (entity-relation-entity). Depending on whether the relation categories in the triples are restricted, information extraction can be divided into two types: closed domain and open domain. Open domain information extraction does not restrict the relation categories; it extracts all possible triples from the text and supports downstream natural language processing tasks such as question answering, information retrieval, and knowledge base construction. At present, most open domain information extraction work extracts triples directly from sentences, but learning extraction templates and formulating extraction rules for complex sentences is very challenging. Some researchers have pointed out that a clause has a simple structure and usually contains only one triple, which motivated clause-based extraction methods: a complex sentence is converted into simple clauses through clause identification, which greatly reduces the difficulty of triple extraction. However, existing work treats clause identification as an edge classification task on the dependency tree, which suffers from error propagation. In addition, current methods all use hand-crafted templates to extract triples from clauses, but these templates have narrow coverage and adapt poorly to complex language. To address these two issues, this thesis carries out the following work.

First, to address error propagation in existing clause identification methods, this thesis treats clause identification as a subtree classification task on the dependency tree and proposes a dynamic recursive neural network that learns subtree representations containing global syntactic information, for more effective clause identification. Experiments show that the proposed clause identification method outperforms existing methods, and that the clauses it identifies effectively improve the performance of the subsequent relation extraction task.

Second, to address the narrow coverage and limited adaptability of current methods for relation extraction from clauses, this thesis proposes using deep learning to extract triples from clauses, identifying triples more effectively by automatically learning semantic information. In addition, because components are often omitted in the identified clauses, the extracted triples are incomplete, so the identified clauses need to be filled. This thesis proposes a clause filling model based on deep learning that automatically captures deep semantic information to identify omissions in clauses and fill them. The proposed relation extraction model is then used to extract triples from the filled clauses. Experimental results show that the proposed method outperforms existing open domain information extraction methods, which demonstrates the effectiveness of using deep learning to fill clauses and extract triples from them.
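
To make the idea of clause identification as subtree classification more concrete, the following is a minimal sketch and not the thesis's model. It assumes a toy dependency tree for the sentence "She left because he called", made-up two-dimensional word vectors, and made-up classifier weights. The names `Node`, `subtrees`, `span`, `encode`, and `clause_score` are hypothetical. The bottom-up `encode` function is only a simple stand-in for a recursive neural network: each subtree representation is built from the node's own vector and the representations of its children, so it depends on the whole subtree rather than on a single edge.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    index: int                  # position of the word in the sentence
    word: str
    vec: List[float]            # toy word embedding; values are made up
    children: List["Node"] = field(default_factory=list)


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the root of every subtree (each node roots exactly one)."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def span(node: Node) -> List[str]:
    """Words covered by the subtree rooted at `node`, in sentence order."""
    covered = sorted(subtrees(node), key=lambda n: n.index)
    return [n.word for n in covered]


def encode(node: Node) -> List[float]:
    """Bottom-up composition: tanh(own vector + mean of child encodings)."""
    dim = len(node.vec)
    child_encs = [encode(child) for child in node.children]
    if child_encs:
        mean = [sum(e[i] for e in child_encs) / len(child_encs) for i in range(dim)]
    else:
        mean = [0.0] * dim
    return [math.tanh(node.vec[i] + mean[i]) for i in range(dim)]


def clause_score(node: Node, weights: List[float]) -> float:
    """Sigmoid score that the subtree rooted at `node` forms a clause."""
    z = sum(w * x for w, x in zip(weights, encode(node)))
    return 1.0 / (1.0 + math.exp(-z))


# Assumed toy parse: "left" is the root, "She" is its subject, and
# "called" heads the adverbial clause "because he called".
she = Node(0, "She", [0.1, 0.2])
because = Node(2, "because", [0.0, 0.4])
he = Node(3, "he", [0.1, 0.2])
called = Node(4, "called", [0.8, 0.3], [because, he])
root = Node(1, "left", [0.9, 0.1], [she, called])

WEIGHTS = [1.5, -0.5]           # made-up values; a real model would learn these

for candidate in subtrees(root):
    print(" ".join(span(candidate)), round(clause_score(candidate, WEIGHTS), 3))
```

Each candidate printed by the loop is a full subtree span, such as "because he called", paired with an untrained score. In a trained system the scores would decide which subtrees are kept as clauses before triple extraction.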
Keywords/Search Tags: open domain information extraction, recursive neural network, clause identification, clause filling, relation extraction