| As information systems have matured, government agencies have accumulated large amounts of data in their daily work. On the one hand, these valuable government data resources can help improve the efficiency of agencies and promote social and economic development. On the other hand, because government data is specialized and complex, manually extracting the large volume of information it contains is difficult. In recent years, the rapid development of natural language processing has laid a solid foundation for automatic information extraction and the efficient use of data. However, government agencies have stricter requirements for data accuracy, and existing algorithms do not make full use of the textual features of this domain, so they cannot adequately meet the requirements for constructing government knowledge graphs. To address the shortcomings of existing schemes, this thesis proposes new algorithms for two subtasks: government named entity recognition and government entity relation extraction. In named entity recognition, both the boundaries and the categories of entities must be detected. Because the words that make up government entities show the distinctive characteristics of official documents, this thesis proposes an entity recognition method based on a pre-trained classification mechanism. A data annotation tool is used to divide the training set into an entity part and a non-entity part. Each part is segmented into words separately, and word frequencies are counted. A certain proportion of the high-frequency words from each part is taken to construct a positive dictionary and a negative dictionary, which are used to pre-train the classifier. The classifier then generates a confidence parameter for every word in a sentence, characterizing how likely that word is to form part of a government entity. Because the elements of the input data are correlated with one another, this thesis uses the
bidirectional long short-term memory (BiLSTM) network to extract contextual information. At the same time, because different features contribute differently to the output, an attention mechanism is introduced to assign appropriate weights to them. Finally, a conditional random field (CRF) model is used to learn the constraints between tags. This method makes full use of the characteristics of the words that constitute government entities and improves the prediction accuracy of government named entity recognition. In entity relation extraction, the type of relation between the entities in each pair must be detected. Government texts contain many entity pairs in parallel structures; these pairs are concentrated in the text and share the same relation, but their spans differ. Making full use of local features is therefore an important way to improve government entity relation extraction. However, existing methods lose the position information of local features when they extract text features through convolution and pooling layers. This thesis therefore proposes a feature sequence segmentation convolutional neural network (FSSCNN) that performs adaptive pooling, which addresses this shortcoming of existing solutions. To extract the local features that strongly influence the prediction at different spans, this thesis adopts a saliency feature extraction convolutional neural network (SFECNN) to enhance local features. In addition, the Transformer structure is used to capture long-distance dependencies. After the local features are fused with the global features, the relation category is predicted with the Softmax function. This method retains the position information in both local and global features, which helps improve the performance of relation extraction. |
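
The dictionary-construction step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `top_ratio` parameter, and the decision to drop words that appear in both dictionaries are assumptions made for illustration, and the inputs are assumed to be already segmented into word lists.

```python
from collections import Counter


def build_dictionary(token_lists, top_ratio):
    """Return the most frequent words from a list of segmented texts.

    token_lists: list of word lists (assumed to be segmented already).
    top_ratio:   hypothetical parameter, the share of distinct words to keep.
    """
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    keep = max(1, int(len(counts) * top_ratio))
    return {word for word, _ in counts.most_common(keep)}


def build_dictionaries(entity_tokens, non_entity_tokens, top_ratio=0.5):
    """Build the positive (entity) and negative (non-entity) dictionaries."""
    positive = build_dictionary(entity_tokens, top_ratio)
    negative = build_dictionary(non_entity_tokens, top_ratio)
    # Assumption: a word frequent in both parts is ambiguous, so it is
    # removed from both dictionaries. The abstract does not specify this.
    overlap = positive & negative
    return positive - overlap, negative - overlap


def labelled_examples(positive, negative):
    """Turn the two dictionaries into (word, label) pairs for pre-training."""
    return [(w, 1) for w in sorted(positive)] + [(w, 0) for w in sorted(negative)]
```

The labelled pairs could then be fed to any binary word classifier, whose output probability would play the role of the confidence parameter; the abstract does not state which classifier is used.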