
Research on Entity Alignment in the Field of Genetic Diseases

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Cai
GTID: 2404330605474590
Subject: Applied statistics

Abstract/Summary:
With the rapid development of artificial intelligence technology, all walks of life now hold huge amounts of data, and how to use this data sensibly is an urgent problem. Medicine and big data are closely connected, and the medical field has a growing influence on people's lives. Genetic disease is a topic of particular public concern. Databases such as OMIM, Orphanet, and Disease Ontology contain a great deal of knowledge about genetic diseases, but the data in different databases may be inconsistent. If data from multiple websites could be integrated into a comprehensive, professional genetic disease data network, doctors, researchers, and patients would gain a more convenient way to obtain resources. The most critical step in fusing the data from these websites is entity alignment.

At present, research on entity alignment mainly targets encyclopedias and public data sets. In the medical field there is relatively little research on entity alignment, and the existing work addresses relatively broad knowledge bases; research on entity alignment in the field of genetic diseases is scarce. This thesis therefore applies entity alignment methods to databases in the field of genetic diseases, studying pairwise entity alignment across three genetic disease databases (OMIM, Orphanet, Disease Ontology). Web page information is crawled from the official websites of the three databases and stored in a specified format. The data is then preprocessed: data cleaning, lemmatization, English tokenization, and removal of stop words and special characters. Using a combination of ICD-10 code linking and manual labeling, a total of 15,296 records from the three databases were labeled.

The thesis approaches entity alignment from two directions: an entity alignment algorithm based on network semantic tags, and machine learning algorithms. All calculations and analysis are carried out in Python. The algorithm based on network semantic tags first computes disease name similarity to generate candidate entity pairs. It then computes a multi-label comprehensive similarity for each candidate pair and uses that score to decide whether the pair is aligned. The results show that the multi-label comprehensive similarity is more accurate than judging by name similarity or disease description similarity alone, but its accuracy and recall are still not high. The thesis therefore combines unbalanced data processing with machine learning, treating entity alignment as a classification problem and examining it at three levels: the single-classifier level, the data level, and the algorithm level. It compares the classification performance of the models and concludes that a two-layer classifier based on stacking performs best on the test set.
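The two-stage similarity procedure described above (name similarity to generate candidate pairs, then a multi-label comprehensive similarity to decide alignment) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: the abstract does not say which similarity measure, labels, weights, thresholds, or stop-word list were used, so the Jaccard measure, the `STOP_WORDS` set, the weight values, both thresholds, and the toy records with made-up IDs are all hypothetical.

```python
import re

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"a", "an", "and", "of", "the", "with", "in"}


def tokenize(text):
    """Lowercase, replace special characters with spaces, split, drop stop words."""
    words = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(source, target, name_threshold=0.5):
    """Stage 1: keep pairs whose disease-name similarity reaches the threshold."""
    pairs = []
    for s in source:
        for t in target:
            if jaccard(tokenize(s["name"]), tokenize(t["name"])) >= name_threshold:
                pairs.append((s, t))
    return pairs


def multi_label_similarity(s, t, weights):
    """Stage 2: weighted average of the per-label similarities."""
    total = sum(weights.values())
    score = sum(
        w * jaccard(tokenize(s.get(label, "")), tokenize(t.get(label, "")))
        for label, w in weights.items()
    )
    return score / total


def align(source, target, weights, name_threshold=0.5, match_threshold=0.6):
    """Return ID pairs whose multi-label similarity reaches match_threshold."""
    return [
        (s["id"], t["id"])
        for s, t in candidate_pairs(source, target, name_threshold)
        if multi_label_similarity(s, t, weights) >= match_threshold
    ]


# Toy records with made-up IDs and descriptions, for illustration only.
db_a = [{"id": "A1", "name": "Marfan syndrome",
         "description": "connective tissue disorder affecting the heart and eyes"}]
db_b = [{"id": "B1", "name": "Marfan syndrome",
         "description": "disorder of connective tissue affecting heart, eyes and skeleton"},
        {"id": "B2", "name": "Noonan syndrome",
         "description": "short stature and heart defects"}]

# Hypothetical label weights.
weights = {"name": 0.6, "description": 0.4}
matches = align(db_a, db_b, weights)
print(matches)
```

Generating candidates from names first keeps the more expensive multi-label comparison limited to plausible pairs; in this toy run the second record of `db_b` is filtered out at the name stage and never reaches the second stage.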
Keywords/Search Tags: entity alignment, genetic diseases, similarity, unbalanced data processing, machine learning