Research And Application Of Key Technologies For Knowledge Graph In Traditional Chinese Medicine

Posted on: 2020-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: J G Luo
Full Text: PDF
GTID: 2404330590997518
Subject: Chinese medicine informatics
Abstract/Summary:
After thousands of years of inheritance and development, Chinese medicine has formed its own unique theoretical system with guiding significance for clinical practice. Chinese medicine researchers and enthusiasts look for evidence and guidance in the literature accumulated by generations of practitioners. Over this time the field has produced a large amount of text data that is rich in semantic information and complex in its relationships. In response to the national development strategy for Chinese medicine, and in line with the “Internet + Chinese Medicine” industrial development model for promoting the modernization of Chinese medicine, this paper focuses on traditional Chinese medicine (TCM) knowledge graph construction and intelligent question answering, and proposes a TCM intelligent question answering model based on a knowledge graph. The main work of this paper is as follows:

(1) To reduce the adverse effect of Chinese word segmentation on entity recognition, this paper proposes a Chinese named entity recognition model (BLSTM-CRF) that combines a Bidirectional Long Short-Term Memory network (BLSTM) with a Conditional Random Field (CRF) and takes word vectors as input. In this part, a TCM entity extraction corpus is compiled from TCM books, including TCM Syndrome Differentiation Diagnosis and TCM 150 Syndrome Differentiation and Treatment. The word vectors are fed into the bidirectional LSTM, which extracts sentence features; a CRF layer then performs tag inference to handle the dependencies between output tags. Comparison experiments against a variety of algorithms were carried out on the TCM entity corpus. The results show that the word-vector-based BLSTM-CRF model outperforms the other algorithms, and the experiments also identify the LSTM network parameters best suited to TCM entity recognition.

(2) Using Softmax as the LSTM classifier limits the generalization ability of the entity relationship recognition model. To address this, the paper proposes a bidirectional LSTM model combined with the Gradient Boosting Decision Tree (GBDT) algorithm. While the bidirectional LSTM performs feature extraction, an attention mechanism is used to focus on the keywords in the input, which reduces the model's susceptibility to interference from irrelevant words. After feature extraction, GBDT is trained to predict the relationship class. Because the base learners of GBDT have low variance and high bias, the ensemble model is more stable. Experiments on the TCM corpus and two public-domain corpora show that the improved model achieves a clear improvement in accuracy, recall, and F-score, making it a relation extraction model suited to the TCM domain.

(3) To better represent the relationships between TCM entities, this paper forms the schema layer of the knowledge graph by sorting out the extracted entities and relationships. While the knowledge graph is being constructed, the TF-IDF algorithm is used to calculate the contribution weights for the syndrome-symptom, syndrome-tongue, and syndrome-cycle-like relationships, which supports subsequent TCM syndrome differentiation. The six types of entities, the five types of relationships, and the calculated weights are then imported into a graph database to complete the knowledge graph construction. The resulting knowledge graph contains 17,618 nodes and 83,335 relationships.

(4) To make TCM knowledge quick to access and to promote the culture of Chinese medicine, this paper constructs a TCM intelligent question answering model based on the knowledge graph. This part first performs entity recognition and word segmentation on the question and then abstracts the question. It then proposes question point recognition based on the GBDT algorithm and constructs the syndrome differentiation model and the treatment model for the Chinese medicine domain. The question answering model handles both simple and complex questions. Based on the key technologies proposed above, the Python programming language and corresponding development tools were used to design and develop a TCM intelligent question answering system based on the knowledge graph.
Keywords/Search Tags: Named entity recognition, relationship extraction, knowledge graph, intelligent question and answer, semantic analysis, Chinese medicine information
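
The abstract states that TF-IDF is used to calculate contribution weights for relationships such as syndrome-symptom, but it does not give the exact formulation. The following is a minimal sketch under stated assumptions: each syndrome is treated as a "document" and its recorded symptoms as "terms", with plain term frequency and an unsmoothed logarithmic inverse document frequency. The function name `tfidf_weights` and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tfidf_weights(syndrome_to_symptoms):
    """Return {(syndrome, symptom): weight} using a plain TF-IDF formulation.

    Assumption: each syndrome plays the role of a document and each
    symptom mention plays the role of a term. The thesis may use a
    different variant (for example, a smoothed IDF).
    """
    n_syndromes = len(syndrome_to_symptoms)

    # Document frequency: number of syndromes in which each symptom appears.
    df = Counter()
    for symptoms in syndrome_to_symptoms.values():
        df.update(set(symptoms))

    weights = {}
    for syndrome, symptoms in syndrome_to_symptoms.items():
        counts = Counter(symptoms)
        total = sum(counts.values())
        for symptom, count in counts.items():
            tf = count / total                            # share of mentions within this syndrome
            idf = math.log(n_syndromes / df[symptom])     # rarer symptoms score higher
            weights[(syndrome, symptom)] = tf * idf
    return weights


# Hypothetical toy data, for illustration only.
corpus = {
    "spleen qi deficiency": ["fatigue", "poor appetite", "loose stool", "fatigue"],
    "kidney yang deficiency": ["cold limbs", "fatigue", "sore lower back"],
    "liver qi stagnation": ["irritability", "chest tightness", "poor appetite"],
}
weights = tfidf_weights(corpus)
```

In this sketch a symptom that occurs in only one syndrome (here "loose stool") receives a higher weight than one shared across several syndromes (here "fatigue"), which is the property that makes such weights useful for ranking candidate syndromes. The resulting values could be stored as a property on the corresponding relationship when it is imported into the graph database.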