Application Of Multi-instance Multi-label Hierarchical Clustering Algorithm On Gene Function Annotation

Posted on: 2015-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Deng | Full Text: PDF
GTID: 2370330491952506 | Subject: Software engineering
Abstract/Summary:
Gene function annotation is an important research direction within genome annotation, which is one of the main challenges of the post-genomic era. It provides a scientific basis for understanding the logical framework of the genetic language of the human genome; the relationship between gene structure and function; the mechanisms of individual development, growth, aging, and death; the mechanisms of neural activity and brain function; the mechanisms of cell proliferation, differentiation, and apoptosis; the mechanisms of information transmission; the relationship between genes and the occurrence and development of disease (such as pathogenesis and pathological processes); and a variety of other life science problems. Faced with massive whole-genome data, however, the task is only tractable with the help of machine learning methods. This thesis uses multi-instance multi-label learning to predict gene function. The main research work is as follows.

First, the thesis analyzes gene expression patterns and predicts gene function from the characteristics of those patterns, and it examines in detail how the relationships between genes and functions recorded in gene function annotation databases are derived. Because such databases contain many-to-many correspondences between genes and functions, the thesis introduces the multi-instance multi-label learning framework and the Gene Ontology, and analyzes the framework.

Second, the pruning strategy used when degrading the multi-instance multi-label learning framework loses the correlation between genes. To address this problem, the thesis brings the idea of hierarchical clustering into the multi-instance learning framework and proposes a multi-instance hierarchical clustering algorithm. Based on the correlation characteristics of gene expression, the algorithm takes subsets of time-series gene expression data with similar functions as the instance sets of a gene function and constructs the multi-instance samples (bags) from them. The distance between multi-instance samples is computed with the Pearson correlation coefficient of the genes' time-series expression data, and a constraint that maximizes the functional correlation of the genes within a cluster is applied, so that the correlation information between genes is well preserved during clustering.

Finally, experiments on four Saccharomyces cerevisiae gene expression profiles demonstrate the effectiveness of the algorithm. The results show that it retains the correlation information between genes well during the degradation process of the multi-instance multi-label learning framework, and that it performs well.
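The abstract does not give the algorithm's formulas, so the following is only a minimal sketch of the general idea it describes: a Pearson-correlation-based distance between multi-instance samples (bags of time-series expression vectors), followed by agglomerative hierarchical clustering. The function names `pearson`, `bag_distance`, and `cluster_bags`, the choice of averaging `1 - r` over all instance pairs, and the use of average linkage are all assumptions made for illustration. The within-cluster functional-correlation constraint mentioned in the abstract is not implemented here.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat series is treated as uncorrelated
    return cov / (sx * sy)


def bag_distance(bag_a, bag_b):
    """Assumed bag-level distance: mean of (1 - r) over all instance pairs."""
    pairs = list(product(bag_a, bag_b))
    return sum(1.0 - pearson(u, v) for u, v in pairs) / len(pairs)


def cluster_bags(bags, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of bag indices."""
    clusters = [[i] for i in range(len(bags))]
    # Precompute pairwise bag distances, keyed by (smaller index, larger index).
    dist = {(i, j): bag_distance(bags[i], bags[j])
            for i in range(len(bags)) for j in range(i + 1, len(bags))}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average inter-bag distance.
        a, b = min(((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data (invented for illustration): each bag holds one or more time series.
bags = [
    [[1, 2, 3, 4], [2, 4, 6, 8.5]],  # rising expression
    [[0, 1, 2, 3.2]],                # rising expression
    [[4, 3, 2, 1], [9, 7, 4, 2]],    # falling expression
    [[5, 4, 2, 1]],                  # falling expression
]
print(cluster_bags(bags, 2))
```

With this toy data the bags with rising profiles end up in one cluster and those with falling profiles in the other. A correlation-based distance groups genes by the shape of their expression profile rather than by absolute expression level, which is one plausible reason for preferring it over Euclidean distance in this setting.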
Keywords/Search Tags: Time-series gene expression, Gene function annotation, Machine learning, Multi-instance multi-label