Application Of Multi-instance Multi-label On Gene Function Annotation

Posted on: 2015-12-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2428330488999629  Subject: Computer Science and Technology
Abstract/Summary:
In the post-genome era, the main challenge is genome annotation, and gene function annotation is a decisive part of it. A better understanding of human genes provides a necessary basis for exploring the origin of life. The Human Genome Project has produced whole-genome data, which provide abundant biological information for the study of gene function annotation. To extract useful knowledge from such large amounts of data, machine learning is better suited than manual methods. In this study, multi-instance multi-label learning is used to address this task. The main research work is as follows. Firstly, from the perspective of predicting gene function from gene expression patterns, we analyse in detail the relationships and derivations between gene functions in the gene function annotation database. Because the relationship between genes and their functions in the annotation database is a many-to-many mapping, we introduce and study the multi-instance multi-label learning framework to handle this mapping problem. Secondly, the multi-instance multi-label learning framework is usually solved with a degeneration strategy, but this strategy weakens the correlation between genes. This study combines the multi-instance learning framework with hierarchical clustering to propose a multi-instance hierarchical clustering algorithm. The algorithm is based on the correlation of gene expression: the time-series expression data of the genes that share a gene function are treated as the bag of instances of that function, which builds the multi-instance representation of gene functions. The Pearson correlation coefficient between gene time-series expression data is used as the distance between these multi-instance bags, and the size of each gene cluster is constrained by maximizing the correlation of genes with the same function. The algorithm thereby preserves the correlation between genes during clustering. Finally, to verify the effectiveness of the algorithm, experiments are carried out on four time-series gene expression datasets of Saccharomyces cerevisiae. The multi-instance hierarchical clustering algorithm degenerates the multi-instance multi-label task into a single-instance multi-label task, which is then solved with a multi-label SVM or a multi-label K-nearest neighbor algorithm. Experimental results show that the algorithm maintains a good correlation between genes during the degeneration of the multi-instance multi-label learning framework and performs well.
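The idea of treating each gene function as a bag of time-series expression profiles and clustering those bags by Pearson correlation can be illustrated with a minimal sketch. This is not the thesis's algorithm: the bag distance (1 minus the mean cross-bag correlation), the average-style merging, the fixed target cluster count, the function names, and the toy data are all assumptions made for illustration, and the cluster-size constraint mentioned in the abstract is omitted.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def bag_distance(bag_a, bag_b):
    """Assumed bag distance: 1 minus the mean Pearson correlation over cross-bag pairs."""
    pairs = list(product(bag_a, bag_b))
    return 1.0 - sum(pearson(x, y) for x, y in pairs) / len(pairs)


def cluster_bags(bags, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters of bags."""
    clusters = [[name] for name in bags]

    def profiles(cluster):
        # All expression profiles belonging to the bags in one cluster.
        return [p for name in cluster for p in bags[name]]

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = bag_distance(profiles(clusters[i]), profiles(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: function label -> expression profiles of its genes.
bags = {
    "f1": [[1, 2, 3, 4], [2, 4, 6, 8.5]],
    "f2": [[1.0, 2.1, 2.9, 4.2]],
    "f3": [[4, 3, 2, 1], [8, 6, 4.5, 2]],
}
result = cluster_bags(bags, 2)
print(result)
```

In this toy input the profiles of f1 and f2 rise together while those of f3 fall, so the two co-expressed bags are merged first. A multi-label classifier would then be trained on the resulting single-instance representation, a step not shown here.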
Keywords/Search Tags:Time-series gene expression, Gene function annotation, Machine learning, Multi-instance multi-label