Research on Motif Discovery Algorithms Based on Biological Co-regulatory Networks

Posted on: 2022-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: T Chen
Full Text: PDF
GTID: 2518306731487674
Subject: Computer Science and Technology
Abstract/Summary:
The emergence and development of high-throughput sequencing technology has greatly advanced genomics research. In a short period of time, genomic and proteomic data have grown explosively, opening new directions for biological research. As one of the key topics in bioinformatics, network motif discovery plays an important role in studying key regulatory mechanisms and disease pathogenesis in target networks. Network motifs are subgraphs that are over-represented in a large target network and are considered to represent the key structures and regulatory mechanisms of that network. Motif mining algorithms generally have high complexity, which arises mainly in two steps: subgraph search and subgraph isomorphism testing. Although sampling can reduce the subgraph search space and improve the efficiency of motif mining, poorly chosen sampling methods are likely to cause large errors in motif discovery. This thesis proposes several sampling methods in order to explore a more reasonable sampling model for network motifs. In addition, it designs a new subgraph isomorphism test to detect duplicate subgraphs faster. Finally, it uses parallel techniques to speed up the time-consuming steps of the algorithm. The main research work is as follows:

(1) To improve the efficiency of motif discovery, this thesis introduces equal-probability, unbiased sampling at each level of the subgraph search tree. Only some of the candidate nodes are selected at each expansion of a subgraph, which reduces the subgraph search space and avoids the bias caused by edge sampling. Using node attributes such as type and degree (the sum of a node's in-degree and out-degree), the subgraph isomorphism test is designed as a sequence of conditional checks on node attributes, which speeds up the test. In addition, multithreading is applied to the time-consuming subgraph search step. Experiments show that multithreading saves more time as the number of available CPU cores increases. To find the most cost-effective sampling rate, multiple groups of sampling experiments were conducted at different sampling rates; the results show that a sampling rate of 0.5 gives the best balance between running time and the degree of subgraph restoration.

(2) Based on the scale-free property of complex biological networks, this thesis designs two sampling algorithms. The first is based on the degree sequence of nodes and requires a sampling rate to be set in advance. When a subgraph is expanded, the neighbors of the node currently being expanded are sorted in ascending order of degree, and nodes are then selected according to the sampling rate to form the new set of nodes to be expanded. The second is based on stratified sampling. It counts the degrees of the nodes in the data set and divides nodes of different degrees into layers using a minimum-variance layer delimitation method. When a subgraph is expanded, the number of samples to draw from each layer is calculated from the preset sampling rate, and the algorithm draws nodes from each layer to form the set to be expanded. The experimental results show that both sampling algorithms are effective and achieve a balance between running time and the degree of target network restoration.
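The three ways of choosing which candidate nodes to expand can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions: the function names are hypothetical, the degree-ordered variant is assumed to keep the lowest-degree prefix of the sorted list (the abstract does not say which part is kept), and the layer boundaries are passed in by the caller rather than derived with the minimum-variance delimitation method, which is not reproduced here.

```python
import math
import random
from collections import defaultdict


def sample_uniform(candidates, rate, rng):
    """Equal-probability sample (without replacement) of the candidates."""
    pool = sorted(candidates)  # fixed order so a seeded rng is reproducible
    k = math.ceil(rate * len(pool))
    return rng.sample(pool, k)


def sample_by_degree_order(candidates, degree, rate):
    """Sort candidates by ascending degree and keep ceil(rate * n) of them.

    Assumption: the lowest-degree prefix is kept; the source only says nodes
    are selected "according to the rate" after sorting.
    """
    ordered = sorted(candidates, key=lambda v: (degree[v], v))
    k = math.ceil(rate * len(ordered))
    return ordered[:k]


def sample_stratified(candidates, degree, boundaries, rate, rng):
    """Split candidates into degree layers and sample each layer at `rate`.

    `boundaries` is a hypothetical ascending list of upper degree limits;
    a node with degree d falls in the layer numbered by how many limits
    d exceeds.
    """
    layers = defaultdict(list)
    for v in sorted(candidates):
        layer = sum(degree[v] > b for b in boundaries)
        layers[layer].append(v)

    picked = []
    for layer in sorted(layers):
        members = layers[layer]
        k = math.ceil(rate * len(members))
        picked.extend(rng.sample(members, k))
    return picked


# Toy example: four candidate neighbors with their total degrees.
degree = {"a": 1, "b": 2, "c": 5, "d": 9}
candidates = {"a", "b", "c", "d"}
rng = random.Random(0)

uniform_pick = sample_uniform(candidates, 0.5, rng)
ordered_pick = sample_by_degree_order(candidates, degree, 0.5)
stratified_pick = sample_stratified(candidates, degree, [2], 0.5, rng)
```

With a rate of 0.5, each strategy keeps two of the four candidates. The stratified variant takes one node from the low-degree layer and one from the high-degree layer, so hub nodes in a scale-free network are neither always dropped nor always kept.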
Keywords/Search Tags: Motif, Co-regulatory Network, Parallel Technology, Stratified Sampling, Random Sampling