
Research And Implementation Of Entity Relation Extraction In Massive Chinese Internet Text

Posted on: 2019-04-21  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Wu  Full Text: PDF
GTID: 2348330545958410  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and the mobile Internet, the amount of information available online is growing explosively. Research on entity relation extraction has entered a new stage, and open entity relation extraction has emerged. Open entity relation extraction identifies relations between entities by finding relation words in the text itself. It does not require a predefined system of relation types, so the set of relation types is open, which makes it better suited to massive, open-domain Internet text. Building on previous research, this thesis proposes two new open entity relation extraction methods. A clustering-based method is then used to abstract the relation keywords in the extracted results, which constructs relation types automatically and describes the relations between entities more precisely. Because the computing resources of a single machine are limited, a parallel open entity relation extraction method based on the Hadoop distributed computing framework is also proposed. Using the computing power of a cluster improves the ability to process massive Internet text. The main work of this thesis is as follows:

1. This thesis proposes C-COERE, a new method for Chinese open entity relation extraction. After relation keywords are labeled with conditional random fields (CRFs), C-COERE applies a new relation tuple extraction algorithm based on the syntactic parse tree. Introducing syntactic information improves extraction quality. In addition, a confidence model is built from the context of relation tuples in the raw corpus and is used to filter the tuples in post-processing, which further improves the accuracy of the extraction results. C-COERE also improves the recall of the extracted results by exploiting the duality of patterns and tuples. Because a single machine cannot handle massive data, this thesis combines C-COERE with the Hadoop framework and proposes PC-COERE, a parallel version of the C-COERE algorithm.

2. This thesis proposes a new Chinese open entity relation extraction method based on deep neural networks. The open entity relation extraction task is modeled as binary classification. Mapping the words in the text to word vectors and extracting features with neural networks avoids the errors introduced by preprocessing tools and improves extraction performance. The method builds lexical, position, and category features by vector mapping, tuple-level pattern features with a recurrent neural network (LSTM), and sentence-level semantic features with a convolutional neural network. These features are combined, and the candidate relation tuples are classified by logistic regression. Experiments on a real dataset show that the method performs well.

3. Relation types are constructed automatically using word2vec and clustering algorithms. An experiment is designed to compare several clustering methods, and the best results are obtained by combining word2vec with hierarchical clustering.

4. Finally, a visual prototype system for Chinese open entity relation extraction is built to display the relations between entities more intuitively. Users enter a query entity in the front end, and the prototype system returns the entities related to the query entity and displays them to the user as an entity relation diagram.
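As an illustration of the relation-type construction in item 3, the following is a minimal sketch of agglomerative (hierarchical) clustering over relation keyword vectors. It is not the thesis's implementation: the toy three-dimensional vectors stand in for word2vec embeddings, and the English keywords, the average-linkage criterion, the cosine distance, the threshold value, and the function names are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_keywords(vectors, threshold):
    """Merge the closest pair of clusters until none is within threshold."""
    # Start with one singleton cluster per keyword.
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # Remaining clusters are too far apart to merge.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy embeddings; real input would be word2vec vectors
# of the relation keywords found in the extracted tuples.
TOY_VECTORS = {
    "founded": [0.9, 0.1, 0.0],
    "established": [0.85, 0.15, 0.05],
    "married": [0.1, 0.9, 0.1],
    "wed": [0.05, 0.95, 0.0],
    "acquired": [0.1, 0.1, 0.9],
}

relation_types = cluster_keywords(TOY_VECTORS, threshold=0.2)
print(relation_types)
```

Each resulting cluster can be read as one automatically constructed relation type. The distance threshold controls how coarse the types are: a larger value merges more keywords into fewer types.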
Keywords/Search Tags:open entity relation extraction, conditional random fields, word2vec, convolution neural network, recurrent neural network