
Key Technology Research On Knowledge Entity Recognition And Its Relation Extraction For Specific Domains Text

Posted on: 2019-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Y He
GTID: 2348330542963932
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the emergence of a great deal of knowledge, knowledge graphs have come to play an increasingly important role, and knowledge bases are the key to building them. However, existing knowledge bases usually do not generalize to specific domains and cannot meet the needs of constructing domain-specific knowledge graphs. Within a specific domain, the number of entities and of relations among them is quite large, so building a knowledge graph by manual collection alone is time-consuming and laborious. Moreover, domain-specific knowledge usually comes from unstructured or semi-structured text, which makes it even harder to acquire. How to build a domain knowledge graph automatically is therefore a meaningful problem. Acquiring domain knowledge entities and the relations among them is a prerequisite for building knowledge bases and knowledge graphs. This thesis mainly studies knowledge entity recognition and entity relation extraction. The main research work is as follows:

1) This thesis adopts a conditional random field (CRF) model to extract knowledge entities in a specific domain. In building the CRF recognition model, the thesis introduces lexical analysis features and syntactic analysis features. To improve extraction efficiency, it also proposes a semantic dependency parsing feature, and the experimental results show the feasibility of the approach.

2) This thesis proposes a method for extracting hyponymy relations based on hybrid lexical features and hybrid syntactic features. Through analysis of the corpus, the thesis selects one kind of sentence as the object of study and uses a hypernym-hyponym entity separation mechanism. It then formulates a corresponding lexical rule base and syntactic rule base to extract the concepts of the hypernym and hyponym entities.

3) This thesis proposes an approach based on bootstrapping and word embeddings to extract the relations among domain entities automatically. A verb frame with a subject-predicate-object relation is extracted as a seed template based on dependency parsing. A bootstrapping algorithm is then used to extract and extend the triples, and word vector similarity is used to extract the concepts, which completes the extraction of entity relations.

The experiments show that these methods are effective. The accuracy of knowledge entity extraction based on the conditional random field can reach 90%. The hyponymy relation extraction method can extract hyponymy entities from tourism-domain text to a certain extent. Entity concept similarity is computed from word vectors trained on the corpus with word2vec. Finally, the thesis describes the existing problems and plans for further research.
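The bootstrapping step with word vector similarity described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the similarity threshold, and the hand-made three-dimensional vectors are all assumptions made for illustration. In practice the candidate triples would come from a dependency parser and the vectors from a word2vec model trained on the domain corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def bootstrap(triples, seed_predicates, vectors, threshold=0.75, max_rounds=5):
    """Grow a set of accepted predicates from seeds (hypothetical helper).

    triples: candidate (subject, predicate, object) tuples.
    vectors: predicate -> embedding; toy values here, word2vec in practice.
    A predicate is accepted when it is similar enough to any predicate
    accepted in an earlier round.
    """
    accepted = set(seed_predicates)
    for _ in range(max_rounds):
        new = set()
        for _, pred, _ in triples:
            if pred in accepted or pred not in vectors:
                continue
            if any(cosine(vectors[pred], vectors[s]) >= threshold
                   for s in accepted if s in vectors):
                new.add(pred)
        if not new:  # nothing new was accepted, so stop early
            break
        accepted |= new
    extracted = [t for t in triples if t[1] in accepted]
    return accepted, extracted

# Toy data (assumed, not from the thesis).
vectors = {
    "located_in": (1.0, 0.0, 0.0),
    "situated_in": (0.8, 0.6, 0.0),
    "lies_in": (0.3, 0.95, 0.0),
    "sells": (0.0, 0.0, 1.0),
}
triples = [
    ("West Lake", "located_in", "Hangzhou"),
    ("Forbidden City", "situated_in", "Beijing"),
    ("Mount Tai", "lies_in", "Shandong"),
    ("Gift shop", "sells", "souvenirs"),
]
predicates, extracted = bootstrap(triples, {"located_in"}, vectors)
```

With these toy vectors, "situated_in" is accepted in the first round and "lies_in" only in the second, because it is close to "situated_in" but not to the seed. This shows how bootstrapping extends the seed set, and also why the threshold matters: a low threshold lets the accepted set drift away from the original relation.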
Keywords/Search Tags:Knowledge Graph, Entity Recognition, Entity Relation Extraction, Conditional Random Field, Bootstrapping