| At present, China’s high-speed railway (HSR) is developing rapidly, and HSR intelligent operation and maintenance data are growing extremely fast and contain valuable knowledge. These data can effectively help professional and technical personnel repair and troubleshoot HSR equipment and improve work efficiency, which is of great significance for ensuring the safe and efficient operation of HSR. However, most HSR intelligent operation and maintenance information is stored as unstructured text, so the knowledge it contains cannot be reused efficiently. Automatic extraction of HSR intelligent operation and maintenance knowledge has therefore become a hot research topic. Named entity recognition and relation extraction are the main tasks of knowledge extraction. Current named entity recognition models perform poorly on nested entities in texts from the HSR intelligent operation and maintenance domain. Current relation extraction models cannot fully utilize entity information, which limits their ability to represent entity relations and degrades model performance. In addition, current named entity recognition and relation extraction models are not domain-specific. To solve these problems, this paper proposes a span-based named entity recognition model and a relation extraction model that introduces additional entity type markers, in order to automate knowledge extraction from texts in the HSR intelligent operation and maintenance domain, and preliminarily constructs a knowledge graph for this domain. The specific research contents are as follows: (1) A dataset for the HSR intelligent operation and maintenance domain is preliminarily constructed by manual annotation. The dataset consists of 2000 annotated samples, which are divided into training, validation, and test sets in a ratio of 6:2:2. (2) For the named entity recognition task, this paper proposes a span-based named
entity recognition model, BERT-Span-Attention-NER, which introduces entity spans into the HSR intelligent operation and maintenance domain and designs a corresponding entity span annotation scheme, transforming the named entity recognition task into a span classification task. To improve the model's ability to use the available information, a span-level self-attention mechanism is proposed so that spans can attend to one another and capture their interactions more fully. The named entity recognition model uses BERT as the encoder to improve its suitability for the HSR intelligent operation and maintenance domain. Experiments on public datasets and domain datasets demonstrate the model's advanced performance and domain applicability. (3) For the relation extraction task, this paper proposes a relation extraction model, AETM-RE (Additional Entity Type Marker Relation Extraction), which uses additional entity type markers to introduce entity type information, so that the model can fully utilize entity information, enrich the feature representation of entity relations, improve performance, and become more domain-specific. A character dictionary feature is also introduced and combined with the token representation to improve relation extraction performance. Experiments on public datasets and domain datasets demonstrate the effectiveness and domain applicability of the model. (4) A knowledge graph construction tool for the HSR intelligent operation and maintenance domain is designed and implemented. The proposed knowledge extraction models are integrated into this tool to automatically extract knowledge from domain texts and preliminarily build a knowledge graph for the HSR intelligent operation and maintenance domain. The domain knowledge graph has clear advantages over a general knowledge graph in terms of professionalism, detail, and domain specificity, which lays a foundation for the application of
intelligent question answering in the HSR intelligent operation and maintenance domain. |
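
The 6:2:2 split described in item (1) can be illustrated with a minimal sketch. The function name `split_dataset`, the fixed random seed, and the choice to shuffle before splitting are assumptions made for illustration; the abstract does not say how the 2000 samples were assigned to each set.

```python
import random


def split_dataset(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle samples and split them into train/validation/test by integer ratios."""
    total = sum(ratios)
    shuffled = list(samples)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Integer IDs stand in for the 2000 annotated samples.
train, val, test = split_dataset(range(2000))
print(len(train), len(val), len(test))  # 1200 400 400
```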
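
To show why recasting named entity recognition as span classification (item (2)) helps with nested entities, the sketch below enumerates candidate spans and assigns each one a label. It is only an illustration of the general span-based idea: the example sentence, the label names (`EQUIPMENT`, `COMPONENT`, `FAULT`), the `max_len` limit, and both function names are hypothetical, and the BERT encoder and span-level self-attention of BERT-Span-Attention-NER are not modeled here.

```python
def enumerate_spans(tokens, max_len=4):
    """Return every (start, end) span, end exclusive, of at most max_len tokens."""
    n = len(tokens)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, min(i + max_len, n) + 1)
    ]


def label_spans(tokens, gold, max_len=4):
    """Map each candidate span to its gold entity type, or 'O' for non-entities."""
    return {span: gold.get(span, "O") for span in enumerate_spans(tokens, max_len)}


tokens = ["traction", "motor", "bearing", "overheats"]
# Hypothetical annotation: (0, 2) is nested inside (0, 3).
gold = {(0, 2): "EQUIPMENT", (0, 3): "COMPONENT", (3, 4): "FAULT"}

labels = label_spans(tokens, gold)
for (start, end), label in labels.items():
    if label != "O":
        print(" ".join(tokens[start:end]), "->", label)
```

Because each span is classified independently, the overlapping spans `(0, 2)` and `(0, 3)` can both receive entity labels, which a one-tag-per-token sequence labeling scheme cannot express directly.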
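
The entity type markers mentioned in item (3) can be sketched as a preprocessing step that wraps the subject and object entities in typed marker tokens before encoding. The marker format (`<S:TYPE>`, `</S:TYPE>`, `<O:TYPE>`, `</O:TYPE>`), the function name `add_type_markers`, and the example sentence are assumptions; the abstract does not specify the marker scheme used by AETM-RE, and the character dictionary feature is not covered.

```python
def add_type_markers(tokens, subj, obj):
    """Wrap two non-overlapping entity spans in typed marker tokens.

    subj and obj are (start, end, entity_type) triples with end exclusive.
    """
    opens, closes = {}, {}
    for role, (start, end, etype) in (("S", subj), ("O", obj)):
        opens[start] = f"<{role}:{etype}>"
        closes[end] = f"</{role}:{etype}>"

    marked = []
    for i, tok in enumerate(tokens):
        if i in closes:  # close a span that ended just before this token
            marked.append(closes[i])
        if i in opens:   # open a span that starts at this token
            marked.append(opens[i])
        marked.append(tok)
    if len(tokens) in closes:  # a span that runs to the end of the sentence
        marked.append(closes[len(tokens)])
    return marked


tokens = ["traction", "motor", "shows", "overheating"]
marked = add_type_markers(tokens, subj=(0, 2, "EQUIPMENT"), obj=(3, 4, "FAULT"))
print(" ".join(marked))
# <S:EQUIPMENT> traction motor </S:EQUIPMENT> shows <O:FAULT> overheating </O:FAULT>
```

In a real pipeline the marker strings would typically be registered as special tokens in the encoder's vocabulary so they are not split into subwords; that step is omitted here.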