As society develops, the medical and health industry is receiving increasing attention. However, the industry faces major challenges, such as rising medical expenditure and a shortage of medical personnel. Advances in artificial intelligence offer ways to address these challenges. A Chinese medical knowledge graph plays an important role in medical systems such as automatic question answering, and in promoting the development of the medical and health industry. In this paper, we propose a general technical scheme for building a Chinese medical knowledge graph from multiple resources, and focus on extracting medical knowledge from unstructured data with natural language processing techniques for the construction and update of the knowledge graph.

For document-level medical entity recognition, document-level samples are too long to process directly, so a multi-level sentence division mechanism is proposed to transform them into sentence-level samples. A sequence labeling model based on knowledge fusion is then designed. On the one hand, several methods are explored for extracting domain knowledge from medical domain dictionaries, the pre-trained language model BERT is used as a source of general knowledge, and both kinds of knowledge are integrated into the model by vector concatenation. On the other hand, a CNN is used to extract the local context information of Chinese characters. The experimental results show that the CNN improves the context modeling ability of the model, and that integrating knowledge into the model effectively improves the performance of Chinese medical entity recognition.

For the medical entity alignment task, this paper proposes a deep learning model that integrates traditional features. The model consists of two parts: a traditional feature extractor and a deep matching network. On the one hand, traditional feature extractors are designed based on text similarity, the bag-of-words model and TF-IDF; on the other hand, three deep matching networks are explored: a Siamese network based on BiLSTM, a matching-aggregation network based on the attention mechanism, and a matching model based on BERT. The experimental results show that the simpler the deep matching network, the larger the improvement (in percentage points) brought by the traditional features. The best result is obtained by combining the traditional features with the BERT-based matching model.

For document-level medical entity relation extraction, entity relations in document-level Chinese medical text often span sentences. To handle this cross-sentence problem, this paper proposes a sentence extraction method based on maximum interval filtering of entities, which transforms document-level samples into sentence-level samples while limiting the loss of cross-sentence entity pairs. To make the sentence-level samples suitable for training deep learning models, a series of sentence-level sample processing methods is proposed. This paper also explores a new sentence-level relation extraction model. On the one hand, a CNN is used to extract local context in the word representation layer to enhance the word representation; on the other hand, two BiLSTM layers with a residual connection are used in the encoding layer. Our method achieves the best performance on a labeled Chinese diabetes dataset.

In addition, based on the above methods, this paper develops a knowledge graph system based on multiple resources, which demonstrates the construction, update and application of the Chinese medical knowledge graph.
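
The multi-level sentence division mentioned for entity recognition can be illustrated with a minimal sketch. The abstract does not specify the levels, so everything here is an assumption made for illustration: the punctuation hierarchy in `LEVELS`, the helper names `split_after` and `divide`, and the hard-cut fallback. The idea shown is only that an over-long sample is split at the strongest boundary first, and weaker boundaries are used only for pieces that are still too long.

```python
# Hypothetical punctuation levels, strongest boundary first.
# These are assumptions for illustration, not taken from the paper.
LEVELS = ["。！？!?", "；;", "，,、"]


def split_after(text, delims):
    """Split text after every character in delims, keeping the delimiter."""
    pieces, start = [], 0
    for i, ch in enumerate(text):
        if ch in delims:
            pieces.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        pieces.append(text[start:])
    return pieces


def divide(text, max_len, level=0):
    """Recursively split text until every piece has at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text else []
    if level == len(LEVELS):
        # No punctuation level left: fall back to fixed-size cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    out = []
    for piece in split_after(text, LEVELS[level]):
        out.extend(divide(piece, max_len, level + 1))
    return out


doc = "患者有糖尿病史十年，目前使用胰岛素治疗；血糖控制一般。建议复查。"
for part in divide(doc, max_len=12):
    print(len(part), part)
```

A fuller version would probably merge adjacent short pieces back together up to `max_len` and keep character offsets so that entity labels can be mapped back to the document; both are omitted here to keep the sketch short.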
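
For the entity alignment part, the traditional features (text similarity, bag-of-words, TF-IDF) can be sketched for a pair of entity names as follows. This is not the paper's feature extractor: character-level tokens, Jaccard overlap as the text similarity, the smoothed IDF formula, and the function names `char_jaccard`, `cosine`, `idf_table` and `traditional_features` are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def char_jaccard(a, b):
    """Text similarity as Jaccard overlap of character sets (an assumption)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def idf_table(corpus):
    """Smoothed IDF per character; the smoothing form is an assumption."""
    n = len(corpus)
    df = Counter(ch for name in corpus for ch in set(name))
    return {ch: math.log((1 + n) / (1 + c)) + 1.0 for ch, c in df.items()}


def traditional_features(a, b, idf):
    """Return [Jaccard, bag-of-words cosine, TF-IDF cosine] for two names."""
    bow_a, bow_b = Counter(a), Counter(b)
    tfidf_a = {ch: tf * idf.get(ch, 1.0) for ch, tf in bow_a.items()}
    tfidf_b = {ch: tf * idf.get(ch, 1.0) for ch, tf in bow_b.items()}
    return [char_jaccard(a, b), cosine(bow_a, bow_b), cosine(tfidf_a, tfidf_b)]


names = ["2型糖尿病", "二型糖尿病", "高血压"]
idf = idf_table(names)
print(traditional_features("2型糖尿病", "二型糖尿病", idf))
```

In a model of the kind described, a feature vector like this would typically be concatenated with the output of the deep matching network before the final classifier; how the fusion is actually done is not stated in the abstract.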