The level of development of higher education largely reflects a country's ability to cultivate high-level talent in the new era. In recent years, the Communist Party of China and the state have attached great importance to the development of higher education in China, and both higher education policy-making and higher education research have been productive. Developing information extraction algorithms suited to the higher education domain, which extract the entities users care about and the relations between them from large volumes of policy documents and research results, can provide data support for researchers in the field and help higher education practitioners and researchers grasp policy direction, implement the Party's educational policy, and carry out higher education reform. Taking higher education policy documents as its object, this paper proposes a set of information extraction algorithms and, based on them, builds a knowledge graph of higher education policy documents. The main contents of this paper are as follows.

(1) No publicly available named entity annotation corpus or relation annotation corpus exists for the higher education domain. Under the guidance of domain experts, this paper therefore uses a semi-automatic, group-based iterative annotation strategy to construct a named entity corpus and a relation corpus for higher education policy documents, and verifies the usability and reliability of the constructed corpora through experiments.

(2) Because entities in the named entity corpus of higher education policy documents tend to be very long, this paper proposes a cascaded named entity recognition algorithm based on word vectors and applies it to entity recognition in the higher education domain. Following the idea behind the Boosting family of machine learning algorithms, the algorithm decomposes long-entity recognition, a task that is difficult and yields low accuracy, into multiple subtasks and solves them jointly with a hierarchical model. The low-level model performs coarse-grained named entity recognition on the text, and its output is fed into the high-level model, which corrects and supplements it. This cooperation between the low-level and high-level models effectively improves recognition accuracy for long entities. Experimental results show that the proposed cascaded algorithm outperforms the other entity recognition models compared, reaching an F1 score of 80.20%.

(3) For the entities extracted from higher education policy documents, this paper proposes a relation extraction method that combines rules, a pre-trained model, and an attention mechanism. The method combines the BiLSTM feature extraction network with the pre-trained BERT model to strengthen the extraction of textual semantics, and uses an attention mechanism so that the model focuses on the entities and relations in each sentence. Experimental results show that this method outperforms other mainstream relation extraction models, reaching an F1 score of 76.88%, which demonstrates its feasibility for relation extraction from higher education policy documents.

(4) This paper uses the Cypher query language to store the extracted entities and relations in the Neo4j graph database, constructing and visualizing a knowledge graph of higher education policy documents that provides data support for higher education researchers.
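The cascaded recognition scheme in contribution (2), where a low-level model proposes coarse entities and a high-level model corrects and extends them, can be illustrated with a toy sketch. The abstract gives no implementation details, so everything here is an assumption: the span representation, the hypothetical `coarse_tagger` (a dictionary lookup standing in for the trained low-level model), and the hypothetical `refine` step (a rule that extends each coarse span over an assumed set of continuation tokens, standing in for the trained high-level model). It only shows the data flow between the two levels, not the word-vector models themselves.

```python
from typing import Dict, List, Set, Tuple

# (start, end_exclusive, label) over a token list
Span = Tuple[int, int, str]


def coarse_tagger(tokens: List[str], lexicon: Dict[str, str]) -> List[Span]:
    """Hypothetical low-level model: tags single tokens that appear in a lexicon."""
    return [(i, i + 1, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]


def refine(tokens: List[str], spans: List[Span], continuation: Set[str]) -> List[Span]:
    """Hypothetical high-level model: extends each coarse span rightwards while
    the next token is in `continuation`, then merges overlapping same-label spans."""
    extended = []
    for start, end, label in spans:
        while end < len(tokens) and tokens[end] in continuation:
            end += 1
        extended.append((start, end, label))
    merged: List[Span] = []
    for start, end, label in sorted(extended):
        if merged and start < merged[-1][1] and label == merged[-1][2]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Converts spans back into BIO tags for inspection or evaluation."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["implement", "the", "Double", "First-Class", "construction", "plan", "now"]
coarse = coarse_tagger(tokens, {"Double": "POLICY"})  # [(2, 3, 'POLICY')]
final = refine(tokens, coarse, {"First-Class", "construction", "plan"})  # [(2, 6, 'POLICY')]
print(to_bio(len(tokens), final))
```

The low level only has to find a reliable anchor for each long entity; recovering the full extent is delegated to the second stage, which mirrors the Boosting-style idea of a later learner correcting an earlier one.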
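The F1 scores reported for entity recognition (80.20%) and relation extraction (76.88%) are the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). The abstract does not state the matching criterion, so the sketch below assumes strict entity-level evaluation, where a prediction counts only if its span and label both match a gold entity exactly; the function name `prf1` is illustrative.

```python
from typing import Iterable, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, label)


def prf1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions that match a gold entity exactly
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 3, "POLICY"), (5, 7, "ORG")]
pred = [(0, 3, "POLICY"), (5, 6, "ORG")]  # second span is one token short
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under strict matching, a long entity whose boundary is off by a single token counts as both a false positive and a false negative, which is one reason long entities are hard to score well on. The same function works for relations if each item is a `(head, relation, tail)` triple.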
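Contribution (3) places an attention mechanism on top of BERT and BiLSTM features so that the model can focus on the parts of a sentence that matter for the relation. The abstract does not say which attention variant is used; the sketch below assumes a simple dot-product attention with a learned context vector u, a common choice over BiLSTM outputs, and uses plain Python lists in place of real BERT/BiLSTM tensors. The names `attention_pool` and `context` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention: alpha_i = softmax(h_i . u), r = sum_i alpha_i * h_i.

    `hidden` stands in for per-token BiLSTM outputs (forward and backward states
    concatenated); `context` stands in for a learned query vector u."""
    scores = [sum(h_k * u_k for h_k, u_k in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(dim)]
    return pooled, weights


hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(hidden, context=[4.0, 0.0])
print([round(w, 3) for w in weights])  # the first token receives most of the weight
```

The pooled vector r would then feed a relation classifier; in a trained model, u is learned so that tokens signalling the relation (for example, the two entities and the words between them) receive higher weights.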
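Contribution (4) stores extracted triples in Neo4j with Cypher. The abstract does not show the statements used, so the following is a minimal sketch under assumptions: each entity becomes a node keyed by a `name` property, each relation becomes a typed edge, and `MERGE` is used so that re-importing the same triple does not create duplicates. The labels `Policy` and `Organization`, the relation type `ISSUED_BY`, and the helper `merge_triple` are hypothetical. Cypher does not accept labels or relationship types as parameters, so the sketch validates them and interpolates them, while entity names are passed as parameters.

```python
import re
from typing import Any, Dict, Tuple

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def merge_triple(head: str, head_label: str, rel: str,
                 tail: str, tail_label: str) -> Tuple[str, Dict[str, Any]]:
    """Builds a parameterized Cypher MERGE for one (head, relation, tail) triple."""
    for ident in (head_label, rel, tail_label):
        if not _IDENT.fullmatch(ident):  # labels/types cannot be Cypher parameters
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = merge_triple("Policy A", "Policy", "ISSUED_BY", "Agency B", "Organization")
print(query)
print(params)  # {'head': 'Policy A', 'tail': 'Agency B'}
```

With the official Neo4j Python driver, each pair could be executed inside a session as `session.run(query, params)`; batching many triples into one transaction would be the natural next step for a full policy corpus.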