
Research On Chinese Medical Ontology Reconstruction Based On Real-world Medical Big Data

Posted on: 2022-09-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L M Chen
GTID: 1484306350996829
Subject: Biomedical engineering
Abstract/Summary:
High-quality medical ontologies can break down the barriers between heterogeneous sources of medical text and allow that text to be processed under standardized control, supporting the construction of medical knowledge representation systems and other clinical applications. Compared with developed countries, especially English-speaking ones, the development of biomedical ontology in China lags far behind. It is therefore necessary to establish a technology roadmap and construction strategy for China that helps integrate clinical data with information technology.

In this work, we systematically investigate and analyze the strategies and technology roadmaps used for biomedical ontology construction around the world, and design an ontology construction strategy for Chinese based on the characteristics of China. The strategy has seven key steps, including problem definition, terminology enrichment, and relation attribute extraction. On this basis, we define the reconstruction of Chinese biomedical ontology at three levels: reconstruction of concerns, of content, and of structure. Content reconstruction is the basis of medical ontology construction, so this dissertation focuses on the content reconstruction of biomedical ontology. First, we reconstruct Chinese biomedical entities through data mining. In addition, attributes further enrich and expand ontology content and provide a fine-grained information representation strategy for applying ontologies in real scenarios. We therefore propose a fine-grained semantic information model based on ontology attributes, called PhenoSSU (Semantic Structure Unit of Phenotype).

For the mining of Chinese biomedical entities, we first construct a supervised biomedical entity recognition method based on a Bi-LSTM model. Supervised methods have disadvantages, however, such as the requirement for high-quality annotation and context information. We therefore also develop an unsupervised biomedical entity recognition method based on an n-gram language model and lexical analysis. Finally, to broaden terminology collection to include rare and unstandardized terms, we integrate MetaMap, the Baidu Translation engine, and SimAlign so that biomedical entities can be annotated using multilingual materials. Combining these entity mining strategies performs better than a standard terminology list: coverage of biomedical information in real-world data rises from 58.2% to 80.2%, a relative increase of 37.8%.

The PhenoSSU model aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To construct fine-grained phenotype knowledge graphs automatically, we developed a hybrid strategy that first recognizes phenotype concepts with the MetaMap tool and then predicts the attribute values of phenotypes with machine learning classifiers.

Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (89.5%) were found to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, other information models, such as the clinical element model and the HL7 FHIR model, could capture the full semantics underlying only 48.4% (2034/4020) and 21.8% (914/4020) of the descriptions of phenotypes listed in clinical guidelines, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.
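
The 37.8% improvement in coverage reported above for entity mining is a relative increase over the 58.2% baseline, not a gain in percentage points. The short Python check below illustrates the difference; the function and variable names are illustrative assumptions and are not taken from the dissertation.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Coverage values quoted in the abstract, in percent.
baseline_coverage = 58.2   # standard terminology list
combined_coverage = 80.2   # combined entity mining strategy

# Absolute gain is measured in percentage points.
absolute_gain = combined_coverage - baseline_coverage
# Relative gain is measured against the baseline.
relative_gain = relative_increase(baseline_coverage, combined_coverage)

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

Run as written, this gives an absolute gain of 22.0 percentage points and a relative gain of 37.8%, which matches the figure in the abstract.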
Keywords/Search Tags: Biological Medical Ontology, Chinese Medical Ontology, Natural Language Processing, Deep Learning