| Long-term scientific research and production practice in the power industry have accumulated a large volume of unstructured power text. This text contains domain knowledge such as power-related research institutions, equipment and facilities, theoretical methods, and key technologies. Entity recognition on power text enables automated extraction of this knowledge, which is of great importance for the development of the power industry. Existing entity recognition models are mainly trained with random sampling, ignoring the influence of sample presentation order on model training. Recent research shows that differences in content and annotation quality lead to differences among samples, so organizing the training process according to sample characteristics can help improve training efficiency and recognition performance. Therefore, for the task of recognizing domain entities in power text, this thesis introduces curriculum learning to improve the model's training process from different perspectives and thereby optimize the model. The main research contents and contributions of this thesis are as follows: (1) To address the slow training of entity recognition models based on BERT embeddings, two sample difficulty evaluation criteria based on entity features are designed. A curriculum learning framework based on Natural Breaks is proposed, and an entity recognition method is constructed by combining the framework with the criteria. The method transforms the training process from training on the entire dataset at once into training on a series of "curricula", guiding the entity recognition model to learn samples from easy to difficult. Experimental results show that the method improves both the recognition performance and the training efficiency of the model. (2) To address label noise in the power dataset, a label noise assessment method based on an improved Cross Review is proposed and combined with the Natural Breaks-based curriculum learning framework described above to construct an entity recognition method. The method guides the model to start learning from low-noise samples and gradually transition to high-noise samples, which reduces the interference of noise in the early stage of training. Experimental results show that the method effectively reduces the influence of label noise in the power dataset on the model and improves entity recognition performance. (3) Drawing on a research project on key technologies for power knowledge graphs, this thesis briefly analyzes the practical application of power text entity recognition. Using the curriculum learning-based entity recognition method, entities were extracted from a total of 2,145 power technology project documents from State Grid Hunan Company covering the last 20 years, and a power knowledge graph containing about 32,000 entities was built, demonstrating good application value. This thesis contains 25 figures, 16 tables, and 84 references. |