Research On Entity Relation Extraction In Network Encyclopedia

Posted on: 2018-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C S Ru
GTID: 1368330623450475
Subject: Software engineering
Abstract/Summary:
In the Web 2.0 era, the multi-person collaboration model advocated by network encyclopedias has drawn on the collective wisdom of Internet users and driven the rapid growth of those encyclopedias. In recent years, researchers have recognized that network encyclopedias contain a wealth of human knowledge that can be applied to question answering systems, search engines and other intelligent information services. How to use open relation extraction to obtain machine-readable, structured knowledge from network encyclopedias has therefore become a hot research topic. However, open relation extraction faces several problems: poor generalization in relation discovery, inaccurate description of relation semantics, wrong labels in distant supervision, and the difficulty of designing features for relation classification. To address these problems, this dissertation makes the following contributions.

Open relation discovery. Existing open relation discovery systems can only extract relations whose syntactic representation exactly matches explicit patterns. In a network encyclopedia, however, relations are of many kinds and take many forms, and a limited set of patterns cannot cover them. To extract more general grammatical features from the original data and improve the generalization ability of relation discovery, this dissertation adopts a multi-layer convolutional neural network (CNN) that takes the dependency sequence on the expanded dependency path as input and learns abstract features to express relations. Experimental results on a Wikipedia dataset show that grammatical features learned from syntactic dependency sequences with a CNN are effective for open relation discovery, whether the relations appear in a known or a new syntactic pattern.

Open relation annotation. A network encyclopedia contains many types of relations, so it is unrealistic to know the relation types in advance. This dissertation uses clustering to identify relation instances of the same type when the relation types are undetermined. To overcome the interference from irrelevant word sequences that affects existing methods, a relation clustering method based on core dependency phrases is proposed. Heuristic rules first select the core dependency phrases, which capture the semantics of the relation between entities more accurately. The relation instances are then clustered according to the semantic similarity of their core dependency phrases, and each cluster is labeled according to the semantic distance between the core dependency phrases it contains. Experimental results show that the method describes the relation between entities more accurately, better clusters instances with similar relations, and generates reasonable labels for the relation clusters.

Reducing wrong labels in distant supervision. Because an encyclopedia contains so many types of relations, manually annotating the data needed to study encyclopedic relation classification is unrealistic. Existing methods address the annotation problem by applying distant supervision to relation classification in encyclopedias. Distant supervision, however, usually produces many wrong labels, which seriously degrade relation classification performance, so wrong labels must be reduced first. Knowledge bases use relation phrases to describe the various types of relations, while the relations between entities in sentences are described by dependency phrases. Based on this observation, this dissertation proposes using a semantic Jaccard measure of the similarity between the relation phrase and the dependency phrase to reduce wrong labels. Experimental results show that using semantic similarity to reduce wrong labels is effective and greatly improves relation classification.

Relation classification based on convolutional neural networks. Existing neural network models are often disturbed by irrelevant word sequences and require a context window to be set. To solve both problems, the core dependency phrase is proposed as the input to a convolutional neural network for relation classification. During wrong-label reduction, the semantic Jaccard step selects core dependency phrases to represent the candidate relations in sentences. These phrases capture the features needed for relation classification, eliminate interference from irrelevant word sequences, and avoid the problem of choosing a context window size. Experimental results show that using the core dependency phrase as the input to a convolutional neural network effectively eliminates the interference of irrelevant information and improves relation classification performance.

To sum up, in order to improve the generalization ability of relation discovery, describe relation semantics accurately, reduce wrong labels in the training data and solve the problem of designing relation classification features, this dissertation proposes a series of new methods: relation discovery based on a convolutional neural network, relation annotation based on the core dependency phrase, wrong-label reduction in distant supervision based on semantic similarity, and relation classification based on a convolutional neural network. Together they improve the performance of relation extraction for network encyclopedias.
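
The abstract does not define the semantic Jaccard measure, so the following is only a minimal sketch of one plausible reading: a Jaccard score in which two words count as shared when their embedding similarity passes a threshold, used to decide whether a distantly supervised label is kept. The vectors in `TOY_VECTORS`, the thresholds, the greedy matching, and the names `word_similarity`, `semantic_jaccard` and `keep_label` are all assumptions made for illustration, not the dissertation's actual formulation.

```python
import math

# Toy 2-d word vectors, invented for illustration only (assumption).
TOY_VECTORS = {
    "birth": (1.0, 0.0),
    "born": (0.9, 0.1),
    "work": (0.0, 1.0),
    "employed": (0.1, 0.9),
}


def word_similarity(w1, w2):
    """Cosine similarity of two words; identical words score 1.0,
    and words without a vector only match themselves."""
    if w1 == w2:
        return 1.0
    v1, v2 = TOY_VECTORS.get(w1), TOY_VECTORS.get(w2)
    if v1 is None or v2 is None:
        return 0.0
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


def semantic_jaccard(phrase_a, phrase_b, threshold=0.8):
    """Soft Jaccard: |soft intersection| / |soft union| over distinct words.

    A word in phrase_a is matched (greedily, one-to-one) to its most
    similar unmatched word in phrase_b if the similarity >= threshold.
    """
    a, b = set(phrase_a), set(phrase_b)
    if not a and not b:
        return 0.0
    unmatched_b = set(b)
    matched = 0
    for wa in sorted(a):  # sorted only to make the run deterministic
        best = max(sorted(unmatched_b),
                   key=lambda wb: word_similarity(wa, wb),
                   default=None)
        if best is not None and word_similarity(wa, best) >= threshold:
            unmatched_b.remove(best)
            matched += 1
    return matched / (len(a) + len(b) - matched)


def keep_label(relation_phrase, dependency_phrase, min_score=0.2):
    """Keep a distant-supervision label only if the knowledge-base relation
    phrase and the sentence's dependency phrase are similar enough."""
    return semantic_jaccard(relation_phrase, dependency_phrase) >= min_score


if __name__ == "__main__":
    relation = ["birth", "place"]                  # knowledge-base relation phrase
    print(keep_label(relation, ["born", "in"]))    # sentence supports the label
    print(keep_label(relation, ["employed", "by"]))  # likely a wrong label
```

In this sketch a sentence whose dependency phrase is "born in" keeps the label for a "birth place" relation, while "employed by" is filtered out. A real system would use trained word embeddings and tuned thresholds.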
Keywords/Search Tags: the network encyclopedia, open relation extraction, relation discovery, relation annotation, relation classification, convolutional neural network, word embedding, semantic similarity