Algorithm Based On Gibbs Algorithm And Its Application In Identifying MOTIF

Posted on: 2011-04-28; Degree: Master; Type: Thesis
Country: China; Candidate: J Liu; Full Text: PDF
GTID: 2120360305955248; Subject: Bioinformatics
Abstract/Summary:
DNA is the carrier of genetic information. Information is extracted from the nucleotide sequence of a gene and used to direct protein synthesis; this flow is the same for all life on Earth and is called the central dogma of molecular biology. Genetic information exists in the DNA molecule in coded form, as a specific sequence of nucleotides, and DNA replication passes it from one generation to the next. During the growth and development of the offspring, the genetic information in DNA is transcribed into RNA, and the RNA is translated in vivo into the various proteins that carry out specific biological functions. Translation takes place on the ribosome. In this way genetic information is passed from the previous generation (the parental generation) to the next (the offspring) and expressed there, so that the offspring acquire the genetic traits of the parents. RNA can also be replicated, producing molecules identical to itself. In addition, there is a process in which DNA is synthesized under the guidance of RNA; this is called reverse transcription (generally found in certain viruses). The proteins produced through transcription and translation can in turn act on DNA to regulate the expression of other genes. Although every cell of an organism carries the same genetic information and the same genes, a given gene is expressed differently in different contexts, for example in the cells of different tissues; this is determined by the gene regulation mechanism. One of the key steps of gene expression is transcription initiation.
Transcription does not simply start by itself. RNA polymerase needs a specific region of the gene to be recognized and bound, much as a key fits a lock, and only then can normal transcription proceed. In biology this "key" is called a transcription factor. Most transcription factors bind double-stranded DNA, and transcription factors may also bind to one another and act together. On the whole, the majority of transcription factors bind in the promoter region upstream of the gene, so most research on transcription factors to date has concentrated on the promoter region, and studies have found that transcription factor binding sites occur frequently within about 1 kb upstream. Practice has shown that studying transcription factors directly is difficult, whereas studying them through their binding sites, and modeling those binding sites by computer, is comparatively easy. The main approach to studying transcription factors at present is therefore computational. Transcriptional regulation is an important part of gene expression and has become a hub for the transfer and expression of genetic information. Because gene transcription is affected by many factors, identifying regulatory elements has become the crux of research on the regulation of gene transcription and expression. In the past, MOTIF prediction in this area was carried out through biological experiments. But as science and technology develop and the amount of data grows, traditional biological experiments can no longer keep up with the rapidly growing data. The high processing speed and storage capacity of computers have drawn increasing interest from scientists.
Scientists in this field have therefore gradually begun to replace experimental work with computer simulation. This replacement rests on one assumption: MOTIFs with the same regulatory role are the same, that is, they are conserved. This conservation assumption makes it possible to predict MOTIFs by computer. Software such as AlignACE, MDScan and MEME is representative of this approach. As transcriptional regulatory elements have received sustained attention in biology, researchers in many countries have proposed a wide variety of MOTIF identification algorithms, which now number in the hundreds. Typical examples include the expectation-maximization algorithm, the greedy algorithm, the Gibbs sampling algorithm, the MEME algorithm, the Gibbs sampling algorithm based on Bayesian models, the Consensus algorithm, the AlignACE algorithm, the Helden algorithm, Markov-chain-based algorithms, the BioOptimize algorithm, prediction algorithms based on information content, the k projection method, and the hash projection method. Research on MOTIF search has made considerable progress abroad. In China, however, the field is still in its infancy and there are few reports on it. Compared with classic prediction software such as AlignACE, there is still room for development in China, which is the significance of this study. This thesis gives a brief introduction to gene transcription regulatory elements (MOTIFs), summarizes the currently popular MOTIF modeling methods, such as the string model, the matrix model and the visualization model, and analyzes the advantages and disadvantages of each. It also describes the measures used to evaluate MOTIF identification results, such as Z scores, chi-square statistics, information content, consistency scores and log-likelihood.
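The matrix model and the information content measure mentioned above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the uniform background of 0.25 per base, and the optional pseudocount are assumptions made for illustration only.

```python
import math

ALPHABET = "ACGT"

def frequency_matrix(sites, pseudocount=0.0):
    """Build a matrix model: per-column base frequencies of aligned sites.

    `sites` is a list of equal-length DNA strings (hypothetical input).
    """
    width = len(sites[0])
    matrix = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in ALPHABET})
    return matrix

def information_content(matrix, background=None):
    """Sum over columns of f * log2(f / background), in bits.

    Assumes a uniform background (0.25 per base) unless one is given.
    """
    bg = background or {b: 0.25 for b in ALPHABET}
    total = 0.0
    for column in matrix:
        for b in ALPHABET:
            if column[b] > 0:  # treat 0 * log(0) as 0
                total += column[b] * math.log2(column[b] / bg[b])
    return total
```

Under these assumptions a fully conserved column contributes 2 bits and a column with all four bases equally frequent contributes 0 bits, so a higher total indicates a more conserved candidate MOTIF.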
It then introduces several common MOTIF identification methods, such as the expectation-maximization (EM) algorithm, the hidden Markov model (HMM) algorithm, counting methods, the WORDUP algorithm for finding significant words, and the Gibbs sampling algorithm. The Gibbs sampling algorithm is analyzed and described in depth; on the basis of simulation experiments, its advantages and disadvantages are summarized and an improved Gibbs sampling algorithm is discussed. The improved algorithm proposed in this thesis is a MOTIF secondary-sampling algorithm based on vertical alignment, and its defining feature is a second round of sampling on top of the traditional Gibbs sampling algorithm. Unlike the traditional Gibbs sampling algorithm, it does not select candidate sequences in order to update the MOTIF columns and resample. Instead, it selects the candidate with the lowest product score under the frequency matrix as the one to resample, and then uses the WORDUP algorithm to analyze the significance of the candidate regulatory elements and pick out the most significant MOTIF. Experiments show that, after the second sampling, the candidate MOTIFs have higher significance and adaptability than those identified by the traditional Gibbs sampling algorithm.
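For orientation, the traditional Gibbs sampling procedure that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of a standard site sampler with one site per sequence, not the thesis's improved secondary-sampling variant; the function names, the pseudocount of 1, the uniform background, and the fixed iteration count are all assumptions. It needs at least two input sequences.

```python
import random

ALPHABET = "ACGT"

def build_profile(sites, pseudocount=1.0):
    """Column-wise base frequencies of the current sites, with pseudocounts."""
    width = len(sites[0])
    profile = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in ALPHABET})
    return profile

def window_score(window, profile, background=0.25):
    """Product over columns of profile frequency / background frequency."""
    score = 1.0
    for j, base in enumerate(window):
        score *= profile[j][base] / background
    return score

def gibbs_sampler(seqs, width, iterations=200, seed=0):
    """Return one sampled site start position per sequence."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        # Traditional scheme: visit the sequences in order.
        for i, seq in enumerate(seqs):
            # Build the profile from every sequence except the current one.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(seqs, positions))
                      if k != i]
            profile = build_profile(others)
            # Score every window of the held-out sequence, then sample a
            # new position with probability proportional to its score.
            weights = [window_score(seq[p:p + width], profile)
                       for p in range(len(seq) - width + 1)]
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

The inner loop visits the sequences in a fixed order, which is the behaviour the abstract contrasts with its own rule of resampling the lowest-scoring candidate; that rule and the WORDUP significance step are not reproduced here because the abstract does not specify them in enough detail.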
Keywords: Bioinformatics, Gibbs algorithm, regulatory elements, adaptability