Clause Based Open Domain Information Extraction

Posted on: 2022-06-22

Degree: Master

Type: Thesis

Country: China

Candidate: Y Gao

GTID: 2518306551470244

Subject: Computer Science and Technology

Abstract/Summary:

Information extraction aims to automatically extract information from unstructured text and convert it into structured triples (entity-relation-entity). Depending on whether the relation categories in the triples are restricted, the task divides into two types: closed domain and open domain. Open-domain information extraction places no limit on the relation categories. It extracts all possible triples from the text and provides strong support for downstream natural language processing tasks such as question answering, information retrieval, and knowledge base construction.

At present, most open-domain information extraction work extracts triples from whole sentences, but learning extraction templates and formulating extraction rules on complex sentences is very challenging. Some researchers have pointed out that a clause has a simple structure and usually contains only one triple, which motivated clause-based extraction: clause identification converts a complex sentence into simple clauses, greatly reducing the difficulty of triple extraction. However, existing work treats clause identification as an edge classification task on the dependency tree, which suffers from error propagation. In addition, current methods all use hand-crafted templates to extract triples from clauses; these templates have narrow coverage and adapt poorly to complex language environments. To address these two issues, this thesis makes the following contributions.

First, to address error propagation in existing clause identification methods, this thesis treats clause identification as a subtree classification task on the dependency tree and proposes a dynamic recursive neural network that learns subtree representations containing global syntactic information, for more effective clause identification. Experiments show that the proposed clause identification method outperforms existing methods, and that the clauses it identifies effectively improve the performance of the subsequent relation extraction task.

Second, to address the narrow coverage and insufficient adaptability of current relation extraction methods on clauses, this thesis proposes using deep learning to extract triples from clauses, identifying triples more effectively by automatically learning semantic information. In addition, because components are omitted from the identified clauses, the triples extracted from them are incomplete, so the identified clauses need to be filled. This thesis proposes a deep-learning-based clause filling model that automatically captures deep semantic information to detect omissions in clauses and fill them. The proposed relation extraction model is then used to extract triples from the filled clauses. Experimental results show that the proposed method outperforms existing open-domain information extraction methods, demonstrating the effectiveness of using deep learning for clause filling and triple extraction.
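
To make the subtree view of clause identification more concrete, the following is a minimal sketch, not the model proposed in the thesis. It only enumerates the subtrees of a dependency tree, each of which would be one candidate for a clause classifier; the neural classifier itself is omitted. The example sentence, its head indices, and all names (`Node`, `build_tree`, `subtree_tokens`, `candidate_subtrees`) are hypothetical and written by hand for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    index: int                      # position of the token in the sentence
    word: str
    children: List["Node"] = field(default_factory=list)


def build_tree(words: List[str], heads: List[int]) -> Optional[Node]:
    """Build a dependency tree; heads[i] is the head index of token i, -1 for the root."""
    nodes = [Node(i, w) for i, w in enumerate(words)]
    root = None
    for i, h in enumerate(heads):
        if h == -1:
            root = nodes[i]
        else:
            nodes[h].children.append(nodes[i])
    return root


def subtree_tokens(node: Node) -> List[Tuple[int, str]]:
    """Collect (index, word) pairs of the subtree rooted at node, in sentence order."""
    collected = [(node.index, node.word)]
    for child in node.children:
        collected.extend(subtree_tokens(child))
    return sorted(collected)


def candidate_subtrees(root: Node) -> List[Tuple[str, List[str]]]:
    """Return (subtree root word, subtree words) for every node in the tree."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        candidates.append((node.word, [w for _, w in subtree_tokens(node)]))
        stack.extend(node.children)
    return candidates


# Hand-written parse (an assumption, not parser output):
# "said" is the root, "bought" heads the embedded clause.
words = ["Tom", "said", "Mary", "bought", "a", "car"]
heads = [1, -1, 3, 1, 5, 3]

root = build_tree(words, heads)
candidates = candidate_subtrees(root)
by_root = dict(candidates)
print(by_root["bought"])  # ['Mary', 'bought', 'a', 'car']
```

In this toy parse, the subtree rooted at "bought" is the simple clause "Mary bought a car", which carries a single triple. A subtree classifier would score each whole candidate like this one, rather than deciding edge by edge.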

Keywords/Search Tags:

open domain information extraction, recursive neural network, clause identification, clause filling, relation extraction

Related items

1	Distant Supervision Based Relation Extraction Combining Clause Identification And Semi-supervised Ensemble Learning
2	Research On Clause-Level Context-aware Open Information Extraction
3	Research And Implementation Of Conformance Testing Clause Extraction And ICS Questionnaire Generation Algorithm
4	On Exact Algorithms For SAT And Related Problems
5	Relation Extraction From Complex Text In Open Domain
6	Based On The Clause, The Right To Re-solve The Sat Problem
7	Research And Application On Open Relation Indicated By Predicates Extraction
8	Research On Entity Relation Extraction In Network Encyclopedia
9	Extraction Of Entity Hyponymy And Synonymy Relations From Open Domain Texts
10	Research On Related Technologies Of Domain Information Extraction