
Research On Text Mining Technologies About Interpreting The Relationship Between Diseases And Genetic Variations

Posted on: 2019-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Xing
GTID: 2404330611493645
Subject: Computer Science and Technology
Abstract/Summary:
Studying the association between diseases and genetic variations is important for a systematic understanding of disease pathogenesis. The biomedical literature contains a wealth of information about these associations. By applying natural language processing and data mining methods, we can identify key biomedical concepts such as genes, mutations, and diseases in the literature (Named Entity Recognition, NER) and explore the relationships between these entities (Relation Extraction). These methods must process a large volume of literature and are computationally expensive. In this thesis, we therefore use high-performance computers to analyze massive literature automatically and in parallel. The main research contents are:

1) Improvements to disease NER and mutation NER. A disease named entity recognition method based on context and a hierarchical ontology is proposed, which uses disease ontology information and cross-sentence analysis. Experimental results show that the method effectively addresses the problem of extracted concepts being too broad because of references in the text, and that it improves the accuracy of disease NER. A context-based variant named entity recognition method is also proposed, which uses context information to extract detailed attributes of a mutation (variation position, related genes, etc.). It addresses the errors that occur during normalization when mutation and disease entities do not co-occur in a sentence. Experimental results show that the method effectively improves the accuracy of mutation NER.

2) ParaBTM: a parallel text mining framework for disease and variation relations based on Tianhe-2. We implemented a parallel text mining framework for relation extraction between diseases and mutations on Tianhe-2. A large amount of literature was deployed on Tianhe-2, and three load balancing strategies were designed so that the computational power of the supercomputer can be fully utilized. Experimental results show that this approach effectively addresses the problems of deploying a huge number of documents, low processing efficiency, and uneven load, and greatly shortens the time needed for biomedical text mining.

3) VCF.Digest: an application of the LSTM-based relation extraction method. We proposed a distance- and dependency-based long short-term memory (LSTM) network model for mutation and disease relation extraction. The model is used to extract entity relations, and the results of biomedical text mining are then incorporated into an intelligent disease and mutation interpretation system called 'VCF.Digest'. The system detects genetic variations and, as far as possible, provides evidence and a corresponding confidence for the relation between each mutation and disease, assisting the diagnosis of genetic diseases and guiding precision medication for tumors. Practical cases show that the system can provide researchers, medical staff, and individual users with the relations between mutations and diseases together with the corresponding literature evidence.
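
The load balancing concern in item 2) can be illustrated with a small sketch. The abstract does not describe the three strategies used in ParaBTM, so the code below is not the thesis's method: it shows one common heuristic (greedy "largest document first" assignment to the currently least-loaded worker), and the names `balance_documents`, `worker_loads`, and the use of document size as a cost estimate are assumptions made only for this example.

```python
import heapq


def balance_documents(doc_sizes, num_workers):
    """Greedy size-based assignment (illustrative only, not ParaBTM's strategy).

    doc_sizes: dict mapping a document id to an estimated processing cost
               (here assumed to be the document size).
    Returns a list with one list of document ids per worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]

    # Place the largest documents first; ties are broken by document id.
    for doc_id, size in sorted(doc_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(doc_id)
        heapq.heappush(heap, (load + size, worker))

    return assignment


def worker_loads(doc_sizes, assignment):
    """Total estimated cost assigned to each worker."""
    return [sum(doc_sizes[d] for d in docs) for docs in assignment]


if __name__ == "__main__":
    sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}  # hypothetical sizes
    plan = balance_documents(sizes, num_workers=2)
    print(plan, worker_loads(sizes, plan))
```

A static plan like this only helps when document size is a reasonable proxy for processing time; when it is not, a dynamic scheme in which idle workers request the next document is a frequent alternative.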
Keywords/Search Tags:Biological medicine, Named Entity Recognition, Relation Extraction, Parallel Computing