| With the rapid development of the Internet, the amount of information on the network is growing exponentially, and the entities mentioned in text carry a great deal of knowledge. The relation extraction task identifies the relations between entity pairs in large volumes of text, helping researchers in various fields infer and analyze those relations. Because labeled data for relation extraction is scarce, researchers often use distant supervision to label data automatically. In distantly supervised relation extraction, the associations between entities are complex, and modeling the syntactic relations in the text helps a model better understand its semantics. In addition, the noisy data produced by automatic annotation degrades model performance. How to efficiently model syntactic structure and remove noise is therefore an important research direction in distantly supervised relation extraction. This thesis analyzes the shortcomings of existing distantly supervised relation extraction methods in semantic understanding and noise elimination, and proposes to improve performance through implicit syntactic dependency parsing and a relation-aware model based on metric learning. First, to address the shortcomings of introducing syntactic structure information through traditional methods, this thesis uses a neural network to generate relative syntactic distances that model the dependencies within the text. The gating mechanism of the LSTM network is modified so that the model can implicitly induce the syntactic structure of the text without any prior information, which helps it better understand the semantics of the text. Experiments verify the effectiveness of the model. Second, this thesis analyzes the negative impact of noisy instances on model performance in distantly supervised relation extraction and proposes a relation-aware noisy-instance correction model based on metric learning. The model extracts noise information from the distantly supervised data and uses metric learning to separate it from effective semantic information, learning representations in which the two kinds of information are distributed differently in the embedding space. This reduces the contribution of noisy instances to a specific relation, increases the contribution of highly correlated instances, and mitigates the noise bias of existing distantly supervised relation extraction models. Finally, this thesis adopts a two-stage training method. In the first stage, word vectors are generated by pre-training the implicit syntactic structure dependency model. In the second stage, the pre-trained word vectors are used to initialize the word vectors of the distantly supervised relation extraction model, which is then trained. The two-stage training method improves the model from the perspectives of both semantics and noise, and a series of experiments verifies its effectiveness. |
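
The idea of reducing the contribution of noisy instances and raising that of highly correlated ones can be illustrated with a minimal sketch. It is not the model proposed in this thesis. It assumes that each sentence in a bag and each relation are already embedded in the same vector space, that Euclidean distance serves as the metric, and that a softmax over negative distances gives the instance weights. The names `instance_weights`, `bag_representation`, `relation_proto`, and `temperature` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def instance_weights(instances, relation_proto, temperature=1.0):
    """Weight each instance by a softmax over negative distances.

    Instances close to the (assumed) relation prototype receive large
    weights; distant, likely noisy instances receive small ones.
    """
    scores = [-euclidean(x, relation_proto) / temperature for x in instances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(instances, relation_proto, temperature=1.0):
    """Weighted average of the instance vectors in one bag."""
    weights = instance_weights(instances, relation_proto, temperature)
    dim = len(relation_proto)
    return [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]


# Toy bag: two instances near the relation prototype and one far from it.
bag = [[0.9, 0.1], [1.0, 0.0], [-1.0, 2.0]]
proto = [1.0, 0.0]
weights = instance_weights(bag, proto)
bag_vec = bag_representation(bag, proto)
```

In this toy bag the third instance lies far from the prototype, so it receives the smallest weight and contributes least to the bag vector. A trained model would learn both the embeddings and the metric rather than fixing them by hand as is done here.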