
Mining Similar Semantic Subspace For Gene Explanation Based On Gene Information Network

Posted on: 2022-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhang
GTID: 2518306551470424
Subject: Software engineering
Abstract/Summary:
Gene similarity analysis not only helps in understanding the biological roles and functions of a gene but may also reveal the relationships among genes. Current research on gene similarity analysis falls into three main categories: sequence-based, annotation-based, and association-based approaches. However, these methods usually measure similarity quantitatively and rely on a single similarity metric. Many of them also compare the quantitative similarity scores directly. Moreover, they ignore the differences between similarity analyses performed under different semantic subspaces, which leads to biased results. To address these problems, we propose a novel method for gene similarity explanation, named similar semantic subspace mining based on the gene information network. The main contributions are as follows:

(1) Multi-source biomedical data are fused to construct the gene information network (GIN), which contains 7 types of biomedical entities (Gene, Gene Ontology Term, Protein, Drug, miRNA, Disease, Phenotype) and 7 types of relationships ("Gene-Protein", "Drug-Protein", "Gene Ontology Term-Gene", "Gene-Disease", "miRNA-Gene", "miRNA-Disease", "Disease-Phenotype"). The GIN describes the characteristics of genes in terms of gene function, regulation of gene expression, gene products, gene-targeted drugs, pathogenicity, and genetic phenotypes.

(2) Based on the meta path semantics of the constructed GIN, the concept of a gene similar semantic subspace is proposed. To handle different numbers of genes under analysis, a pairwise-gene similar semantic subspace mining algorithm named SCENARIO and a multiple-gene similar semantic subspace mining algorithm named SCENARIO-M are designed and implemented.

(3) In the pairwise-gene algorithm SCENARIO, to avoid relying on a single metric, we introduce a similarity metric based on the meta path semantics of the GIN. To address the problem of meta path searching, we use a breadth-first strategy to traverse the GIN, constructing a gene meta path searching tree and returning the maximum meta path length of the target gene pair, which serves as the restriction condition for meta path searching. To analyze the similarity of the target gene pair under complex semantics, a similarity metric based on the gene semantic subspace is proposed. Experiments on real-world datasets demonstrate that SCENARIO is effective, efficient, and scalable.

(4) To mine gene similar semantic subspaces in more general circumstances, the algorithm SCENARIO-M is designed to mine the similar semantic subspace among multiple genes, that is, a gene set. The gene meta path searching tree is used to return the set of gene semantic subspaces for every pair of genes in the target gene set, and these are merged into the gene semantic subspaces of the target gene set through the intersection operation. Finally, the similarity of the target gene set is calculated in each semantic subspace, and the similar semantic subspace with the highest rank is returned. Experiments on real-world datasets, combined with pathway enrichment analysis, demonstrate that SCENARIO-M is effective at mining gene similar semantic subspaces among multiple genes.
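The breadth-first meta path search and the intersection step described above can be illustrated with a minimal Python sketch. This is not the thesis's SCENARIO or SCENARIO-M implementation. The toy network, the function names (`build_adjacency`, `meta_paths`, `shared_meta_paths`), and the choice to represent a meta path as a tuple of entity types are all assumptions made for illustration. The abstract does not give the similarity metric or the ranking step, so both are omitted.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (illustrative only, not data from the thesis).
NODE_TYPE = {
    "g1": "Gene", "g2": "Gene", "g3": "Gene",
    "go1": "GOTerm", "d1": "Disease", "p1": "Protein",
}
EDGES = [
    ("g1", "go1"), ("g2", "go1"), ("g3", "go1"),
    ("g1", "d1"), ("g2", "d1"),
    ("g3", "p1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def meta_paths(adj, node_type, source, target, max_len):
    """Enumerate simple paths from source to target breadth-first.

    Only paths with at most max_len edges are explored. Each path found is
    reduced to its sequence of entity types (its meta path).
    """
    found = set()
    queue = deque([(source,)])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_len:
            continue  # length limit reached; do not extend this path
        for nxt in adj.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths simple (no repeated nodes)
            new_path = path + (nxt,)
            if nxt == target:
                found.add(tuple(node_type[n] for n in new_path))
            else:
                queue.append(new_path)
    return found


def shared_meta_paths(adj, node_type, genes, max_len):
    """Intersect the meta path sets of every gene pair in the gene set."""
    pair_sets = [
        meta_paths(adj, node_type, a, b, max_len)
        for a, b in combinations(genes, 2)
    ]
    return set.intersection(*pair_sets) if pair_sets else set()


adj = build_adjacency(EDGES)
pair_result = meta_paths(adj, NODE_TYPE, "g1", "g2", max_len=2)
set_result = shared_meta_paths(adj, NODE_TYPE, ["g1", "g2", "g3"], max_len=2)
```

In this toy network, `g1` and `g2` are connected through both a shared Gene Ontology term and a shared disease, while all three genes share only the Gene Ontology term. The intersection for the gene set is therefore smaller than the result for the pair.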
Keywords/Search Tags: gene similar semantic subspace, gene information network, gene meta path