With the continuous improvement of living standards in recent years, people's awareness of health management has grown, and the concept of treating diseases with traditional Chinese medicine (TCM) has become increasingly popular. However, the concepts of the TCM health field are complex and diverse, which makes it difficult for people to apply the relevant theoretical knowledge to daily health risk prevention. It is therefore particularly important to integrate the massive body of TCM health knowledge fully and effectively to meet people's daily health risk prevention needs. To this end, after sorting out the TCM health knowledge system, this paper uses knowledge graph technology to represent and store entities such as diseases, symptoms, syndromes, meridians, and acupoints, together with their relationships, in the field of TCM health, and carries out the following research work. Firstly, under the guidance of the TCM health knowledge system and with reference to existing research on knowledge graphs in the field of TCM health, the future application scenarios of the knowledge graph are comprehensively considered in order to define the entities in the field together with their attributes and associations, thereby completing the construction of the schema layer of the TCM health knowledge graph. Secondly, because no open-source annotated corpus exists in the field of TCM health, the multi-source, heterogeneous TCM health data are preprocessed, the text annotation tool BRAT is selected, and an annotated corpus for entity recognition is constructed by combining domain-dictionary pre-annotation with multi-annotator, multi-batch annotation. The features of disease and symptom terms are analyzed and summarized from the perspectives of grammatical expression and part-of-speech combination, which lays a foundation for building an entity recognition model adapted to the field of TCM health. Thirdly, the entity recognition performance of the classical BiLSTM-CRF model is tested on the 
above annotated corpus. Based on an analysis of the reasons for the model's poor recognition performance, and taking into account the characteristics of disease and symptom terms in the field of TCM health, this paper improves the structure of the BiLSTM-CRF model in three respects: changing how the pre-trained vectors are generated, extracting the features of domain disease and symptom terms, and speeding up model training. To this end it introduces the BERT pre-trained model, part-of-speech (POS) features, and a BiGRU network, proposes the BERT-POS-BiGRU-CRF entity recognition model, and verifies its effectiveness through experiments. Finally, the BERT-POS-BiGRU-CRF model is used to complete domain entity recognition, rule templates are established to extract the specified relationships between entities, entity alignment is carried out by calculating the similarity of aliases and attribute relationships, and knowledge storage is completed with the Neo4j graph database, thereby realizing the construction of a knowledge graph in the field of TCM health.
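
The entity alignment step mentioned above (similarity of aliases and attribute relationships) can be illustrated with a minimal sketch. The abstract does not specify the similarity measures, weights, or threshold, so everything below is an assumption made for illustration: string similarity over names and aliases uses Python's `difflib.SequenceMatcher`, attribute similarity uses Jaccard overlap of attribute pairs, and the function names, the weights `w_alias` and `w_attr`, and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(names_a, names_b):
    """Best string similarity between any name/alias pair of two entities."""
    return max(
        SequenceMatcher(None, a, b).ratio()
        for a in names_a
        for b in names_b
    )


def jaccard(items_a, items_b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    set_a, set_b = set(items_a), set(items_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def alignment_score(e1, e2, w_alias=0.6, w_attr=0.4):
    """Weighted sum of alias similarity and attribute similarity.

    The weights are illustrative assumptions, not values from the paper.
    """
    names1 = {e1["name"], *e1.get("aliases", [])}
    names2 = {e2["name"], *e2.get("aliases", [])}
    attrs1 = e1.get("attrs", {}).items()
    attrs2 = e2.get("attrs", {}).items()
    return (w_alias * name_similarity(names1, names2)
            + w_attr * jaccard(attrs1, attrs2))


def same_entity(e1, e2, threshold=0.8):
    """Treat two records as the same entity if the score reaches the threshold."""
    return alignment_score(e1, e2) >= threshold


# Hypothetical records, as they might arrive from two different data sources.
insomnia = {"name": "insomnia", "aliases": ["sleeplessness"],
            "attrs": {"type": "symptom"}}
sleeplessness = {"name": "sleeplessness", "attrs": {"type": "symptom"}}
headache = {"name": "headache", "attrs": {"type": "symptom"}}

print(same_entity(insomnia, sleeplessness))
print(same_entity(insomnia, headache))
```

In this sketch a shared alias drives the name similarity to its maximum, so records that differ only in their preferred name can still be merged, while shared attributes alone are not enough to merge entities with unrelated names.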