
Data Mining Of Isoforms In Biomedical Literature Database

Posted on: 2017-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Wu
GTID: 2348330566956679
Subject: Computer technology
Abstract/Summary:
Alternative splicing is an important process that generates proteomic diversity and controls the results of gene transcription. It can produce isoforms with the same function, or isoforms with different but very similar functions. These isoforms have a major impact on biological growth, differentiation, and disease. As bioinformatics research has advanced, the volume of published biomedical literature has grown exponentially. This research aims to extract the functions of isoforms from the biomedical literature and to annotate the isoforms with those functions. To accomplish this, the research builds on the DeepDive mining system: it performs text mining on biomedical literature, then carries out learning and inference over factor graphs with Gibbs sampling. Specifically, the author first preprocessed biomedical literature acquired from PubMed and performed entity recognition on it, which included natural language processing and the identification of isoforms and Gene Ontology (GO) terms based on naming rules. Second, she extracted candidate entity pairs and, using the results of natural language processing, extracted the features defined in the general feature library. The author then produced training data for GO terms and isoforms through distant supervision, and set up corresponding heuristic rules to obtain a large amount of labeled data from a small amount of labeled data. Using probabilistic reasoning over the factor graph, the research learned the weight of each candidate feature. The author then inferred, for each candidate pair, the probability of a functional relationship between the isoform and the GO term. Finally, the author established a functional database for isoforms and analyzed the experimental results.
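The candidate extraction and distant supervision steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation and does not use DeepDive itself. The isoform naming pattern, the toy GO term list, the placeholder gene name GENEA, and the small set of known pairs are all assumptions made for illustration, since the abstract does not give the actual rules or data.

```python
import re
from itertools import product

# Hypothetical naming rule for isoform mentions; the thesis's real rules
# are not stated in the abstract.
ISOFORM_RE = re.compile(r"\b[A-Z][A-Z0-9]+ isoform \d+\b")

# Toy stand-in for a Gene Ontology term dictionary (assumption).
GO_TERMS = {"apoptotic process", "cell differentiation"}

# Toy stand-in for an existing annotation source used for distant
# supervision; GENEA is a placeholder, not a real gene.
KNOWN_POSITIVE = {("GENEA isoform 2", "cell differentiation")}


def extract_candidates(sentence):
    """Pair every isoform mention with every GO term found in the sentence."""
    isoforms = ISOFORM_RE.findall(sentence)
    lowered = sentence.lower()
    terms = sorted(t for t in GO_TERMS if t in lowered)
    return list(product(isoforms, terms))


def distant_label(pair):
    """Label a candidate True if the known source contains it, else None.

    None means "unlabeled": the pair is left for inference rather than
    being treated as a negative example.
    """
    return True if pair in KNOWN_POSITIVE else None


sentence = ("GENEA isoform 2 is involved in cell differentiation "
            "and the apoptotic process.")
candidates = extract_candidates(sentence)
labels = {pair: distant_label(pair) for pair in candidates}
```

In this sketch only the pair found in the known source receives a training label. The unlabeled pair is the kind of candidate whose probability would later be inferred over the factor graph.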
Keywords/Search Tags: isoforms, text mining, entity recognition, factor graphs