Research And Implementation Of Chinese Open Entity Relation Extraction

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2308330485486489
Subject: Computer Science and Technology
Abstract/Summary:
As the volume of information on the Internet grows, traditional search methods struggle to satisfy users who want to grasp information and knowledge resources quickly and comprehensively. As a vital part of information extraction, entity relation extraction aims to automatically extract the relations and attributes of entities from semi-structured and unstructured natural-language text. It helps users understand the growing body of online information more efficiently and supports more intelligent information retrieval services.
Traditional information extraction is usually limited to predefined relation types, so it ports poorly to new domains and has difficulty handling large-scale web data. To extract relation tuples from web data, this thesis proposes an open relation extraction method, DPM. First, the method automatically builds a training corpus consisting of a large number of high-quality relation tuples and their corresponding sentences, and learns from this corpus numerous relation patterns encoded with dependency parsing roles and parts of speech. Second, to improve the quality of pre-processing, it handles trans-classed words using statistical regularities observed in dependency parsing. Third, it applies the patterns learned in the first step to extract candidate relation tuples. Finally, it scores the candidate tuples with logistic regression and keeps the high-quality ones. DPM is evaluated on the open datasets Wiki-500, Sina-500, Tecent-500 and Simple-500. The P-R curve of DPM lies almost entirely to the upper right of the curves of related work.
The experiments show the effectiveness of the DPM algorithm: at a given level of precision DPM performs well on recall, and at a given level of recall it performs well on precision. In addition, we built a relation extraction system that not only extracts hyponymy relations and attribute relations from Baidu Encyclopedia but also uses the DPM method to extract relation tuples from web data in order to populate a relation knowledge base.
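The two core steps of the pipeline described above, matching learned patterns against a dependency parse and then scoring the candidates with logistic regression, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The `Token` structure, the pattern layout, the dependency labels (`SBV`, `VOB`, `HED`), the part-of-speech tags (`nh`, `ni`, `v`), the two features, and all weights are assumptions chosen for illustration, and the sentence is hand-annotated rather than produced by a parser.

```python
import math
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    pos: str   # part-of-speech tag (tag set is an assumption)
    dep: str   # dependency role relative to the head (label set is an assumption)
    head: int  # index of the head token, -1 for the root


# Hypothetical pattern layout:
# (subject role, subject POS, predicate POS, object role, object POS)
Pattern = Tuple[str, str, str, str, str]


def extract_candidates(tokens: List[Token],
                       patterns: List[Pattern]) -> List[Tuple[str, str, str]]:
    """Return (arg1, relation, arg2) for each head whose dependents match a pattern."""
    candidates = []
    for i, head in enumerate(tokens):
        children = [t for t in tokens if t.head == i]
        for subj_dep, subj_pos, pred_pos, obj_dep, obj_pos in patterns:
            if head.pos != pred_pos:
                continue
            subjects = [c for c in children if c.dep == subj_dep and c.pos == subj_pos]
            objects = [c for c in children if c.dep == obj_dep and c.pos == obj_pos]
            for s in subjects:
                for o in objects:
                    candidates.append((s.text, head.text, o.text))
    return candidates


def score_tuple(features: List[float], weights: List[float], bias: float) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hand-annotated toy sentence (English glosses stand in for Chinese words).
sentence = [
    Token("Ma_Yun", "nh", "SBV", 1),   # person name, subject of token 1
    Token("founded", "v", "HED", -1),  # verb, root of the sentence
    Token("Alibaba", "ni", "VOB", 1),  # organization name, object of token 1
]
patterns = [("SBV", "nh", "v", "VOB", "ni")]

candidates = extract_candidates(sentence, patterns)

# Illustrative features and weights only: [pattern support, normalized argument distance].
confidence = score_tuple([1.0, 0.5], [2.0, -1.0], -0.5)
kept = candidates if confidence >= 0.5 else []
```

Thresholding the logistic score at different values is what traces out a precision-recall curve like the one used in the evaluation; the 0.5 cut-off here is an arbitrary choice for the sketch.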
Keywords/Search Tags:information extraction, relation extraction, entity extraction, knowledge base