The enterprise knowledge graph proposed in this paper is a network, built from web data, that describes the relationships between enterprises; it can also be called an enterprise social graph. Enterprise entity relationship extraction is a key step in constructing a knowledge graph and is therefore an important part of building an enterprise social graph. Discovering inter-firm relationships from open data helps companies and consumers understand and analyze an industry and make relevant decisions. However, because enterprise entity relationships have strong domain characteristics, traditional entity relationship extraction models do not perform well when extracting relationships between enterprise entities from web text.

This paper proposes an improved entity relationship extraction algorithm based on distant supervision. An analysis of traditional distant-supervision-based entity relationship extraction algorithms shows two main problems: first, the process of aligning text with the knowledge base easily generates noisy data; second, traditional classifiers do not classify effectively. This paper addresses each problem in turn.

To address the noise introduced by knowledge base alignment, this paper first analyzes why the traditional distant supervision process is prone to generating noisy data and then proposes a denoising algorithm. The algorithm predefines the enterprise entity relationships and relationship candidate words, and uses a fused word semantic similarity algorithm based on HowNet and Cilin to generate a candidate word set for each relationship. It then checks whether the words on the dependency path between the entity pair appear in the candidate word set of the corresponding relationship: if they do, the text is a positive example; if they do not, it is a negative example. In this way, the training corpus generated by knowledge base alignment is denoised.

To address the poor performance of traditional classifiers, this paper proposes training the classifier by first performing feature selection and then applying gradient boosting decision trees (hereinafter referred to as GBDT). Candidate features are initialized based on the relevant literature and then screened with a random-forest-based approach to obtain high-quality features. The selected features are then used with GBDT to train a strong classifier. In each iteration, GBDT computes the negative gradient of the loss function to reduce the loss left by the previous step, and it finally forms a strong classifier that completes the enterprise entity relationship classification task.

Comparative experiments are used to evaluate these ideas: the weak classifiers SVM and logistic regression are each compared with the GBDT classification model. The experimental results show that the GBDT-based strong classifier achieves higher accuracy, recall, and F-value, with accuracy reaching 85.50%, which demonstrates the effectiveness of the proposed algorithm.
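The denoising rule described above can be illustrated with a minimal sketch. The relation names, the candidate words, the function names, and the choice to treat a sentence as positive when at least one dependency-path word is in the candidate set are all assumptions made for illustration; the abstract does not specify them. The candidate sets are written out by hand here rather than generated by a semantic similarity algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical candidate word sets, one per predefined relation.
# In the paper these are generated by a semantic similarity algorithm;
# here they are hard-coded purely for illustration.
CANDIDATE_WORDS: Dict[str, Set[str]] = {
    "acquisition": {"acquire", "buy", "purchase", "merge"},
    "cooperation": {"cooperate", "partner", "collaborate"},
}

# One aligned example: (relation from the knowledge base,
# words on the dependency path between the two entities, sentence).
Example = Tuple[str, List[str], str]


def is_positive(relation: str, path_words: List[str],
                candidates: Dict[str, Set[str]] = CANDIDATE_WORDS) -> bool:
    """Return True if any dependency-path word is a candidate word
    for the given relation (assumed matching rule)."""
    relation_words = candidates.get(relation, set())
    return any(word.lower() in relation_words for word in path_words)


def denoise(corpus: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Split an aligned corpus into positive and negative examples."""
    positives: List[Example] = []
    negatives: List[Example] = []
    for example in corpus:
        relation, path_words, _sentence = example
        if is_positive(relation, path_words):
            positives.append(example)
        else:
            negatives.append(example)
    return positives, negatives


if __name__ == "__main__":
    corpus = [
        ("acquisition", ["agreed", "acquire"], "A agreed to acquire B."),
        ("acquisition", ["sued"], "A sued B over a patent."),
    ]
    pos, neg = denoise(corpus)
    print(len(pos), "positive,", len(neg), "negative")
```

In this sketch, the second sentence mentions both entities and would be labeled "acquisition" by knowledge base alignment alone, but it is moved to the negative set because no word on its dependency path is a candidate word for that relation.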
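The negative-gradient step of GBDT mentioned above can be written out in the standard textbook form. The symbols are assumptions introduced here, since the abstract does not fix a notation, a loss function, or a learning rate: $L$ is the loss, $F_{m-1}$ the model after iteration $m-1$, $h_m$ the regression tree fitted at iteration $m$, and $\nu$ a shrinkage factor.

```latex
% Pseudo-residuals: negative gradient of the loss at the current model
r_{im} = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}},
\qquad i = 1, \dots, N

% A regression tree h_m is fitted to \{(x_i, r_{im})\}_{i=1}^{N},
% and the model is updated additively
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
```

For squared loss, $L = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2$, the pseudo-residual reduces to the ordinary residual $y_i - F_{m-1}(x_i)$, which is why each iteration can be read as correcting the error left by the previous step.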