Research On Disease Diagnosis And Health Prediction Models Based On Medical Knowledge Graph

Posted on: 2020-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Jiang
GTID: 1360330614450801
Subject: Computer Science and Technology
Abstract/Summary:
Text-oriented knowledge acquisition and representation lies at the intersection of natural language processing and knowledge engineering, and is basic research for simulating the human abilities of cognition, logical inference, and time-series prediction. In statistical natural language processing, symbolic forms based on human language are a common knowledge representation, such as 3-tuple knowledge consisting of entities and relations, or first-order and higher-order logic knowledge with semantic information. This form of knowledge is intuitive, concise, and easy to understand. Combined with statistical learning models, symbolic knowledge can be applied effectively to some mainstream language processing tasks. Another form of knowledge adopts distributed representation. Continuous, dense, low-dimensional vectors are usually used to depict the original semantic information of knowledge and the relevance between pieces of knowledge, as in the entity embedding methods represented by TransE and the graph embedding methods represented by graph neural networks. Such representation learning methods can obtain more abstract feature information from large-scale corpora and can be applied efficiently to most statistical machine learning algorithms.

In recent years, research on knowledge graphs with the 3-tuple as the structural unit has achieved breakthroughs in tasks such as intelligent search, question answering, and information recommendation. A knowledge graph not only provides accurate, traceable, and interpretable knowledge for statistical relational learning, but also effectively supports uncertainty reasoning over noisy and multi-relational data. Considering the difficulty and high cost of acquiring knowledge in the open domain, current knowledge graph construction and related research are mostly task-oriented. Restricting knowledge types and data sources to a specific domain can often serve various decision support systems with limited resources. Focusing on the medical field, this research studies the key technologies of knowledge acquisition and representation, probabilistic reasoning, and time-series prediction for medical texts. The main research contents include the following five aspects.

The first part is the construction of a knowledge graph based on medical texts. To address the incomplete scheme and the lack of corpora in medical knowledge graph construction, we develop a scheme for medical concepts and relations that combines the characteristics of electronic medical records and clinical practice guidelines. Under the guidance of this scheme, we build a medical knowledge graph by manual annotation, in which entities are treated as nodes and relations as edges. By analyzing and mining the structural characteristics of the knowledge graph, we verify that this complex knowledge system can provide data support for research on disease reasoning and health prediction.

The second part is a disease diagnosis model based on the representation of medical knowledge. Symbolic knowledge with semantic information is difficult for machines to understand. Its scope of application is limited to statistical relational learning models based on logical reasoning, and it cannot be combined naturally with feature-based machine learning models, which have stronger learning ability. This study proposes a representation learning algorithm for medical knowledge based on a recursive neural network. The network is structured as a Huffman tree, with medical entities as inputs and logical knowledge as hidden neurons. The distributed representation of medical knowledge is trained on disease diagnosis tasks. Through layer-wise abstraction of representations, the model learns logical knowledge embeddings that carry deep semantic information and remain interpretable.

The third part is a probabilistic inference model based on the medical knowledge graph. The medical knowledge graph contains a large amount of empirical and common-sense medical knowledge, which plays a key role in clinical decision-making tasks such as disease diagnosis and examination recommendation. Constrained to binary variables, traditional probabilistic graphical models have difficulty characterizing the severity of symptoms. More importantly, these models cannot handle numerical results, which are common in physical examinations. Therefore, this study extracts the "symptom-disease" subgraph and the "examination-disease" subgraph from the medical knowledge graph. We reformulate the potential function of the Markov network using the energy function that the Boltzmann machine defines for paired particles. This formulation enables multi-valued variables to participate directly in the probability calculation. Finally, we explore the effects of discrete symptom variables and continuous examination variables on model performance.

The fourth part is parameter learning for medical knowledge networks based on maximum margin. Research on inference over knowledge graphs is crucial for related medical tasks such as intelligent diagnosis and health recommendation. Meanwhile, training a reasonable confidence for each piece of medical knowledge with a learning model is also a way to improve inference performance. For the multivariate inference model of the third part, this paper proposes a maximum-margin weight learning model for medical knowledge. Combining the joint probability distribution of the inference engine with the task characteristics of the medical field, the weight learning problem is transformed into a geometric margin optimization problem over the weight vector. A collaborative update algorithm for the dual Lagrange multipliers is designed to solve the optimization problem. Finally, the learning models based on maximum margin and on maximum likelihood are compared in terms of how effectively they improve inference performance.

The fifth part is a health prediction model based on cascading failure theory. For predictive tasks, the training process of statistical machine learning often relies on a large amount of time-series data. In the medical field, however, the continuous tracking of patient signs is constrained by many factors, such as technology and environment. It is therefore difficult to generate time-series health data that can support the learning of subsequent prediction models. This paper explores the internal mechanism of the human body as a complex system. Cascading failure theory is used to simulate the process by which disease gradually degrades the state of the body. By modeling the interaction of local signs, we realize the autonomous evolution of the human body system and thereby achieve health prediction.

In general, this paper focuses on two major sources of medical text and studies the key technologies of medical knowledge acquisition and representation, logical inference, and time-series prediction based on them. On real electronic medical records, the proposed models significantly improve performance on the different tasks. We expect these research results to be extended to more types of datasets and tasks. Furthermore, these models can advance the research and development of natural language processing technologies in the areas of disease diagnosis and health prediction.
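
As an illustration of the two knowledge forms the abstract contrasts, the sketch below stores a few 3-tuples as a small graph and scores them with the TransE dissimilarity, the L2 norm of h + r - t. It is a minimal sketch and not the dissertation's implementation: the triples, the relation names, the 2-dimensional vectors, and the function names `transe_distance` and `neighbors` are all hypothetical, and the vectors are hand-picked rather than learned.

```python
import math

# Hypothetical toy triples (head, relation, tail); not taken from the dissertation.
triples = [
    ("pneumonia", "has_symptom", "cough"),
    ("pneumonia", "indicated_by", "chest_x_ray"),
]

# Hand-picked 2-d vectors for illustration only; real embeddings are learned.
entity_vec = {
    "pneumonia": [0.0, 0.0],
    "cough": [1.0, 0.0],
    "chest_x_ray": [0.0, 1.0],
}
relation_vec = {
    "has_symptom": [1.0, 0.0],
    "indicated_by": [0.0, 1.0],
}


def transe_distance(head, relation, tail):
    """TransE dissimilarity ||h + r - t||_2; a smaller value means a more plausible triple."""
    h, r, t = entity_vec[head], relation_vec[relation], entity_vec[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def neighbors(entity):
    """Outgoing (relation, tail) edges of an entity node in the triple graph."""
    return [(r, t) for h, r, t in triples if h == entity]


if __name__ == "__main__":
    for h, r, t in triples:
        print(h, r, t, transe_distance(h, r, t))
    # A corrupted triple should score worse (larger distance) than a stored one.
    print(transe_distance("pneumonia", "has_symptom", "chest_x_ray"))
```

The symbolic triples stay readable and traceable, while the vector form lets a distance function rank candidate tails. With these toy vectors a stored triple has distance zero and the corrupted one does not.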
Keywords/Search Tags: medical text, knowledge graph, knowledge representation, probabilistic graphical model, cascading failure