Study On Algorithms For DNA Sequence Motif Discovery Based On Gibbs Sampling

Posted on: 2019-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Pei
GTID: 2428330545950695
Subject: Computer Science and Technology
Abstract/Summary:
Revealing the mechanisms of gene expression regulation and building gene regulatory networks are arduous tasks in bioinformatics, and so far they remain too difficult to complete. Identifying regulatory elements is essential to both. Regulatory elements include cis-regulatory elements and trans-acting factors. Transcription factors are trans-acting factors and are key to initiating gene transcription, so studying them is an important starting point for revealing how gene expression is regulated. Studies have shown that transcription factors are difficult to investigate directly; it is much simpler to explore them indirectly through transcription factor binding sites. The study of transcription factor binding sites largely comes down to the problem of DNA motif discovery, so solving that problem is key to understanding gene expression regulation.

In massive DNA promoter sequence data sets and ChIP-seq data sets, the large scale and complex structure of the data make it very hard to find biologically meaningful motifs efficiently and accurately and to locate them in genes. Combining biological experimental techniques with computational techniques offers hope of solving the motif discovery problem. The key is to design a motif discovery algorithm with both high computational efficiency and high prediction accuracy, which is why motif discovery algorithms are an enduring research topic in bioinformatics. To improve the accuracy of motif prediction while maintaining good time performance, this thesis proposes a new algorithm.

The main contribution is a novel motif discovery algorithm for big data sets, GSMC (Combining Parallel Gibbs Sampling with Maximal Cliques for hunting DNA Motif), which combines Gibbs sampling with a maximal clique clustering algorithm. Rather than being restricted to the OOPS (one occurrence per sequence) model and a fixed motif length, the algorithm uses the ZOOPS (zero or one occurrence per sequence) model and a variable-length mechanism. It merges two components. The first is a mutually exclusive, parallel Gibbs sampling algorithm, which offers global parallel search, mutual exclusion, speed, randomness, variable motif length and scalability. The second is a maximal clique clustering algorithm, which offers clear logic, simple steps, flexibility, scalability and high parallel computing efficiency. The information content function, similarity score function and clustering function for motifs are further optimized to avoid falling into local extrema and to find the global optimal solution.

The algorithm proceeds in three steps. First, the mutually exclusive parallel Gibbs sampling algorithm quickly and randomly generates initial motifs. Then, the similarity between all pairs of motifs that satisfy the information content (IC) criterion is computed with the column-wise SPIC similarity measure. Finally, maximal clique clustering, run in parallel, clusters and filters the final candidate motifs efficiently. Comparison experiments show that, unlike most current motif discovery algorithms, the proposed algorithm improves motif recognition accuracy while maintaining good time performance. It can also detect more cofactors.
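The filtering and clustering steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes each candidate motif is a list of equal-length aligned sites, a uniform background of 0.25 per base, and arbitrary thresholds. The function `consensus_similarity` is a hypothetical stand-in for the SPIC measure, whose definition is not given in the abstract, and the sketch is serial rather than parallel. Maximal cliques are enumerated with the basic Bron-Kerbosch procedure.

```python
import math

BASES = "ACGT"


def information_content(sites, background=0.25):
    """Total information content (bits) of equal-length aligned sites."""
    total = 0.0
    for column in zip(*sites):
        for base in BASES:
            freq = column.count(base) / len(column)
            if freq > 0:  # a base that never occurs contributes nothing
                total += freq * math.log2(freq / background)
    return total


def consensus(sites):
    """Most frequent base per column; ties go to the earlier base in BASES."""
    return "".join(max(BASES, key=column.count) for column in zip(*sites))


def consensus_similarity(sites_a, sites_b):
    """Hypothetical stand-in for SPIC: fraction of matching consensus positions."""
    a, b = consensus(sites_a), consensus(sites_b)
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


def cluster_motifs(motifs, min_ic, min_similarity):
    """Keep motifs that pass the IC threshold, link similar pairs, return cliques."""
    kept = [name for name, sites in motifs.items()
            if information_content(sites) >= min_ic]
    adjacency = {name: set() for name in kept}
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if consensus_similarity(motifs[a], motifs[b]) >= min_similarity:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return maximal_cliques(adjacency)
```

Calling `cluster_motifs(candidates, min_ic=4.0, min_similarity=0.75)` on a dictionary that maps motif names to site lists returns groups of mutually similar motifs. A motif with no similar partner forms a clique of size one, and motifs below the IC threshold are dropped before any similarity is computed.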
Keywords/Search Tags:Regulation, Big data sets, DNA motif, Sampling motif, Clustering motif, Cofactors