N-ary Chinese Open Entity Relation Extraction

Posted on: 2018-12-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2348330536465908  Subject: Software engineering
Abstract/Summary:
Information extraction refers to extracting factual information about specified types of entities, relations, and events from natural language text and converting it into structured output. With the development of artificial intelligence, information extraction has become a popular research field. Entity relation extraction is an important information extraction task whose main goal is to extract entity relation types and entity relation values. It is of great significance to knowledge graph construction and to deeper natural language processing problems such as domain ontologies, question answering systems, and semantic understanding. Research on entity relation extraction covers traditional and open entity relation extraction. Traditional entity relation extraction mainly targets text from a limited domain, with limited categories of entities and relations. However, given the exponential growth and cross-domain nature of Internet information, open information extraction has become an important research area. Its goal is to extract entities, relations, events, and other multi-level semantic units across domains from massive, heterogeneous, non-standard Web pages that contain a lot of noise and redundancy, and to produce formatted output. This makes it possible to process large amounts of Web data across domains. Open entity relation extraction for English text has developed in two stages: methods that extract entities first, followed by methods that extract relations first. Research on Chinese text is comparatively scarce and mainly targets binary relations.

This thesis therefore proposes an open entity relation extraction method for n-ary relations based on dependency analysis, and implements a corresponding system. Specifically, we propose an unsupervised Chinese open information extraction method for large-scale Web text based on dependency relations. First, the text is pre-processed with word segmentation, part-of-speech tagging, and dependency parsing. Second, rules are used to identify basic noun phrases and the relations between them. Finally, a trained classifier filters the candidate relation groups to obtain the final relation groups, which are stored in a database for use by other natural language processing tasks. We use the LTP platform for text pre-processing and define a set of part-of-speech combinations that form basic noun phrases. We also define a series of dependency-based rules for extracting entity relations. In the filtering stage, we train a classifier on features such as the number of entities, the parts of speech, and the distance between words; the classifier decides whether a candidate relation is correct. In an extraction experiment on 500 sentences from Baidu Encyclopedia, the method achieves 81.25% accuracy. Finally, to present the results to researchers, a system is built that extracts entity relation groups from text entered by the user.
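
The rule-based extraction step and the classifier features described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not list the thesis's actual rules or feature encoding, so the two rules below (collect the subject and object of a verb, plus the object of any preposition that modifies it) and the `features` function are hypothetical. The parse is written by hand using LTP-style dependency labels (`SBV`, `VOB`, `ADV`, `POB`, `RAD`, `HED`) rather than produced by calling LTP, and basic noun phrase chunking is omitted.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    word: str
    pos: str     # part-of-speech tag
    head: int    # idx of the syntactic head, 0 for the root
    deprel: str  # dependency label on the arc to the head


def extract_nary(tokens):
    """Return candidate (predicate, arguments) groups, one per verb.

    Hypothetical rules: a verb's SBV and VOB children are arguments,
    and so is the POB child of any preposition attached to it by ADV.
    """
    candidates = []
    for verb in tokens:
        if verb.pos != "v":
            continue
        args = []
        for child in (t for t in tokens if t.head == verb.idx):
            if child.deprel in ("SBV", "VOB"):
                args.append(child)
            elif child.deprel == "ADV" and child.pos == "p":
                args.extend(t for t in tokens
                            if t.head == child.idx and t.deprel == "POB")
        if len(args) >= 2:  # keep binary and higher-arity groups
            args.sort(key=lambda t: t.idx)
            candidates.append((verb, args))
    return candidates


def features(verb, args):
    """Illustrative features in the spirit of those named in the abstract."""
    return {
        "n_entities": len(args),
        "arg_pos": tuple(a.pos for a in args),
        "max_distance": max(abs(a.idx - verb.idx) for a in args),
    }


# Hand-written parse of "李明 在 北京 创办 了 公司"
# ("Li Ming founded a company in Beijing").
SENTENCE = [
    Token(1, "李明", "nh", 4, "SBV"),
    Token(2, "在", "p", 4, "ADV"),
    Token(3, "北京", "ns", 2, "POB"),
    Token(4, "创办", "v", 0, "HED"),
    Token(5, "了", "u", 4, "RAD"),
    Token(6, "公司", "n", 4, "VOB"),
]

candidates = extract_nary(SENTENCE)
for verb, args in candidates:
    print(verb.word, [a.word for a in args], features(verb, args))
```

In a full pipeline, feature dictionaries like these would be encoded numerically and passed to a trained classifier (the keywords mention logistic regression and support vector machines) that accepts or rejects each candidate group.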
Keywords/Search Tags: open information extraction, entity-relation extraction, machine learning, logistic regression classifier, support vector machine