
Research On Unrestricted Type Entity Relation Extraction

Posted on: 2019-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wang
GTID: 2428330545477170
Subject: Computer software and theory
Abstract/Summary:
Information on the Internet spans many fields, is rich in content, and is huge in quantity, but it is unstructured. How to extract entity information and entity relationship information from large amounts of unstructured Internet text is a research hotspot in the field of information extraction, with both scientific significance and practical application value. Traditional relation extraction relies on small-scale tagged corpora and on a manually defined relationship type system for a particular field, so it cannot automatically adapt to the growing set of relationship types in the open domain of the Internet. Relation extraction based on predefined relationship types is therefore not suitable for the open domain.

This thesis studies unrestricted-type entity relation extraction and explores a semi-automated framework for extracting named entities and entity relationships from unannotated corpora in the open domain of the Internet. The framework does not limit the relationship types and requires little manual annotation. First, a model that combines PMI with left and right information entropy identifies named entities without a tagged corpus, and it finds a large number of entities that traditional named entity recognition tools cannot identify. Next, general nouns and general verbs (called feature words) are extracted from the corpus to discover relationship types autonomously. The experimental results show that, when the key threshold is 0.383, clustering based on word-vector cosine similarity performs comparably to clustering based on a synonym forest. A relation seed set extraction algorithm based on SimHash is then proposed, which extracts relation seed sets from a related corpus obtained through a search engine. In the experiment, its average accuracy was 90.47% across nine types of relationships between people.

The thesis then generalizes relation description patterns from the contexts in which the relation seeds occur and uses these patterns to extract relation instances from the corpus. The extracted instances are merged back into the relation seed set, which starts an iterative cycle of description pattern mining and relation instance extraction. After three iterations, the average accuracy reached 95.98% across the nine types of relationships between people, which meets the standard for practical use. Finally, the thesis designs and implements a visualization system for relation instances, which presents the network formed by the instances as an intuitive, clear, and interactive force-directed graph. The whole process requires little manual intervention, has a low running cost, and ports well to other domains, which gives it high practical value.
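The entity identification step above combines PMI with left and right information entropy. The abstract does not say how the thesis mixes the two signals, so the following is only a minimal sketch of the general idea, under stated assumptions: candidates are limited to adjacent token pairs, a pair is kept when its PMI and the smaller of its two boundary entropies both pass a threshold, and the names score_bigrams and candidate_entities and the default thresholds are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def _entropy(neighbors):
    """Shannon entropy (bits) of a Counter of neighboring tokens."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(c / total * math.log2(total / c) for c in neighbors.values())


def score_bigrams(tokens):
    """Map each adjacent pair to (pmi, left_entropy, right_entropy)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    left = defaultdict(Counter)   # tokens seen just before each pair
    right = defaultdict(Counter)  # tokens seen just after each pair
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / (n - 1)
        p_x = unigrams[pair[0]] / n
        p_y = unigrams[pair[1]] / n
        # High PMI: the two tokens co-occur more often than chance predicts.
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[pair] = (pmi, _entropy(left[pair]), _entropy(right[pair]))
    return scores


def candidate_entities(tokens, min_pmi=1.0, min_entropy=0.5):
    """Keep pairs that are cohesive (PMI) and have varied context on both sides.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    scores = score_bigrams(tokens)
    return sorted(
        pair
        for pair, (pmi, left_h, right_h) in scores.items()
        if pmi >= min_pmi and min(left_h, right_h) >= min_entropy
    )


if __name__ == "__main__":
    sample = "a b x a b y c a b z".split()
    print(candidate_entities(sample))
```

High PMI indicates that the tokens inside a candidate stick together, while high entropy on both sides indicates that the candidate appears in varied contexts and so behaves like a free-standing unit. For Chinese text the same scoring could be run over characters or over n-grams longer than two, which this sketch does not cover.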
Keywords/Search Tags:Relational Extraction, Unrestricted Type, Named Entity Recognition, Relationship Type Autonomous Discovery, Relational Description Pattern Mining, Relational Instance Extraction