
Research On Relationship Extraction Of Chinese Company Entities Based On Web Data

Posted on: 2019-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: L Meng  Full Text: PDF
GTID: 2348330542991109  Subject: Software engineering
Abstract/Summary:
Company entity relationship extraction is a branch of entity relationship extraction and an important part of the information extraction task. Extracting company relationships from open data is valuable for understanding and analyzing industries, making management decisions, and choosing business partners. Traditional entity relationship extraction methods suffer from coarse classification and insufficient detail. Moreover, in the Chinese context, extraction of company entity relationships faces additional problems such as complicated grammar and flexible word composition, so applying traditional methods directly is less effective. This thesis therefore studies Chinese company entity recognition and Chinese company relationship extraction. The work consists of two parts.

First, the thesis studies Chinese company entity recognition. Identifying Chinese company names and their abbreviations is an important and challenging entity recognition task in Natural Language Processing (NLP). Traditional company name recognition methods have three problems: they have difficulty identifying company names that are not yet registered, they perform poorly on abbreviated names, and their training corpora are difficult to construct. To address these problems, the thesis presents SF-UNION, an algorithm that fuses rule-based matching, dictionary matching, and statistical learning. Using company names as the standard corpus, the combination of these components improves the recognition of Chinese company names and their abbreviations. In open tests, the method achieves good recall, accuracy, and F-value on both company name recognition and abbreviation recognition.

Second, the thesis studies Chinese company entity relationship extraction, which is an important part of the entity relationship extraction task. Because of the problems described above, direct application of traditional methods is less effective in practice. To solve them, the thesis proposes DP_ATT_LSTM, an algorithm that combines an attention mechanism (ATT) with Long Short-Term Memory (LSTM) networks, guided by dependency parsing (DP). The input sentence is first parsed, and the sequence of predicate verbs identified through the dependency arcs is selected according to the characteristics of the entity relations and fed into one LSTM network. The sentence itself is fed into the input layer of a second LSTM, and the corresponding predicate verb information is incorporated as prior knowledge to adaptively compute the attention weights used to generate the sentence representation. The resulting feature vectors are then passed to a classifier that classifies the entity relations. Experiments show that the proposed algorithm achieves better accuracy, recall, and F1 score.
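The attention step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state how attention scores are computed, so the dot-product scoring below is an assumption, and the names `verb_guided_attention`, `hidden_states`, and `verb_vector` are hypothetical. Here `hidden_states` stands in for the per-token outputs of the sentence LSTM and `verb_vector` for the summary produced by the predicate-verb LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def verb_guided_attention(hidden_states, verb_vector):
    """Weight sentence hidden states by their similarity to the verb vector.

    ASSUMPTION: dot-product scoring; the abstract does not specify the
    scoring function used by DP_ATT_LSTM.
    """
    # One score per token: dot product between token state and verb summary.
    scores = [
        sum(h_d * v_d for h_d, v_d in zip(h, verb_vector))
        for h in hidden_states
    ]
    weights = softmax(scores)
    # Sentence representation: attention-weighted sum of the token states.
    dim = len(hidden_states[0])
    sentence = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, sentence


# Toy values standing in for LSTM outputs (three tokens, two dimensions).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
verb_vector = [0.0, 2.0]
weights, sentence = verb_guided_attention(hidden_states, verb_vector)
```

In this toy input the second and third tokens align with the verb vector and therefore receive equal, larger weights than the first token. The resulting `sentence` vector is what would be handed to the relation classifier.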
Keywords/Search Tags:Relationship extraction, Chinese company entity recognition, Chinese company relationship extraction, LSTM, dependency parsing