
Research On Chinese Open Entity Relation Extraction

Posted on: 2016-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Wang
Full Text: PDF
GTID: 2298330467493108
Subject: Computer technology
Abstract/Summary:
The rapid development of the World Wide Web has brought about an information explosion, and quickly extracting valuable information has become a primary task. Open information extraction technologies arose in response. Open Information Extraction (Open IE) systems extract relational tuples (Entity1, Relation, Entity2) from text, without requiring a pre-specified vocabulary, by identifying relation phrases and their associated arguments in arbitrary sentences. Much work has been done on English Open IE, and many prototype systems exist, such as TextRunner, WOE, ReVerb and OLLIE. Chinese Open IE is now developing gradually and is attracting more and more researchers.

In this paper we present SCOERE (Semi-supervised Chinese Open Entity Relation Extraction), a method for extracting relation tuples from Web text. The approach combines the advantages of unsupervised and supervised methods: it needs very little human work and iteratively extracts tuples until no new relation keywords are generated, effectively combining the high accuracy of supervised methods with the high recall of unsupervised methods. SCOERE uses a CRF (Conditional Random Field) as the supervised method and a Bootstrap framework as the unsupervised method. In experiments on a corpus drawn from online news pages, the method achieved an F value of 73.2%; in particular, recall improved by 107% over the algorithm without the Bootstrap framework, demonstrating the effectiveness and portability of the method.

This work was supported by the National Natural Science Foundation of China projects "hLDA based Chinese multi-document summarization" (project approval number: 61202247) and "On the management of uncertainties in Web 2.0 user generated content" (project approval number: 71231002).
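The iterative loop described in the abstract (extract tuples, collect new relation keywords, repeat until no new keywords appear) can be illustrated with a minimal sketch. This is not the thesis implementation: `toy_extract` is a hypothetical keyword matcher standing in for the trained CRF, the rule for proposing new keywords (a token found between an already extracted entity pair) is an assumption borrowed from generic bootstrapping, and the English, whitespace-tokenized sentences are toy data. Chinese text would first need word segmentation.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (Entity1, Relation, Entity2)


def toy_extract(sentence: str, keywords: Set[str]) -> Set[Triple]:
    """Hypothetical stand-in for the CRF extractor (assumption, not SCOERE).

    Returns one triple built from the tokens on either side of the first
    known relation keyword in the sentence, or an empty set if none is found.
    """
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok in keywords and 0 < i < len(tokens) - 1:
            return {(tokens[i - 1], tok, tokens[i + 1])}
    return set()


def bootstrap(sentences: Iterable[str],
              seed_keywords: Iterable[str]) -> Tuple[Set[Triple], Set[str]]:
    """Iterate extraction until no new relation keyword is generated."""
    corpus: List[str] = list(sentences)
    keywords: Set[str] = set(seed_keywords)
    triples: Set[Triple] = set()
    while True:
        # Step 1: extract triples with the current keyword set.
        for s in corpus:
            triples |= toy_extract(s, keywords)
        # Step 2: propose new keywords, here any unknown token that sits
        # directly between an entity pair that was already extracted.
        pairs = {(e1, e2) for e1, _, e2 in triples}
        new_keywords: Set[str] = set()
        for s in corpus:
            tokens = s.split()
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in pairs and tokens[i] not in keywords:
                    new_keywords.add(tokens[i])
        # Step 3: stop once an iteration yields no new keyword.
        if not new_keywords:
            return triples, keywords
        keywords |= new_keywords


corpus = [
    "Alice founded Acme",
    "Alice established Acme",
    "Bob established Initech",
]
triples, keywords = bootstrap(corpus, {"founded"})
```

In this toy run the seed keyword "founded" yields one triple, the shared entity pair then surfaces "established" as a new keyword, and the second pass picks up the sentence about Bob that the seed alone would miss. That is the kind of recall gain the abstract attributes to the Bootstrap framework, shown here only on made-up data.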
Keywords/Search Tags: open information extraction, conditional random field, semi-supervised, Chinese entity relation, bootstrap