Relational triples are the most commonly used form of knowledge representation and an important source for constructing knowledge graphs; the relation extraction task is responsible for extracting them from unstructured text. Traditional relation extraction methods rely heavily on large-scale labeled training data, and labeling that data requires substantial manpower and material resources. Distant supervision automatically generates large-scale labeled training data by aligning unstructured text with a knowledge base, which reduces the cost of data set annotation. Although distant supervision solves the labeling problem, its strong assumptions inevitably introduce noisy data, and the resulting training data follows a long-tailed distribution. To reduce the influence of noisy data and the long-tailed distribution, and to improve the accuracy of distantly supervised relation extraction, this paper proposes a new distantly supervised relation extraction method. For sentence representation, a pre-trained model is used to obtain word vectors, and each word vector is concatenated with a position vector to form a richer semantic vector. For feature extraction, a bidirectional gated recurrent unit model and a graph convolutional neural network model are used to learn feature information from the training data. To reduce noise, word-level and sentence-level attention mechanisms let the model focus on effective training samples and pay little or no attention to invalid ones. In addition, to enrich the data and alleviate the long-tailed distribution of the training data, entity background information, entity type information, and relation alias information are used to assist the relation extraction task, and the Focal Loss loss function makes the model focus on relations with few samples, so that the model is trained more adequately. Finally, the open-source Riedel and GIDS data sets are used to verify the effectiveness of the model. Experimental results show that the designed model significantly improves accuracy and recall over previous distantly supervised relation extraction baseline models, and its AUC on the Riedel data set reaches 0.41. To better demonstrate the distantly supervised relation extraction process, a demonstration system was built using the Flask web framework.
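
The effect of Focal Loss on samples with little data can be illustrated with a minimal sketch. It assumes the standard multi-class form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the function names and the values of `gamma` and `alpha` are illustrative assumptions, not the configuration used in this paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample, used here as the baseline."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (gamma and alpha are assumed values).

    The factor (1 - p_t) ** gamma shrinks the loss of samples the model
    already classifies confidently, so hard samples, which are typically
    those of rare relations, contribute relatively more to training.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [5.0, 0.0, 0.0]  # model is confident in the correct class 0
    hard = [0.0, 5.0, 0.0]  # model is confident in a wrong class
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(name, "cross-entropy:", ce, "focal:", fl)
```

With `gamma = 0` and `alpha = 1` the function reduces to ordinary cross-entropy; larger `gamma` down-weights well-classified samples more strongly.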