
Comparison Study And Algorithm Analysis For Recognition Of Regulatory Elements

Posted on: 2008-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2178360212997306
Subject: Software engineering
Abstract/Summary:
Biocomputing arose from the development of the biological sciences and the launch of the Human Genome Project. The field emerged in the 1950s, its foundations were laid in the 1970s, and it advanced rapidly in the 1990s. The Human Genome Project is the source of biocomputing, and the problems that project sets out to solve are its driving force.

The tools of biocomputing are computers and networks. Using the theories, methods, and technologies of mathematics and information science, biocomputing studies biological macromolecules, chiefly nucleic acids and proteins, including their sequences, structures, and functions.

A biological system is composed mainly of static and dynamic components. The static component comprises all the genes in the genome, which are the basic building blocks of the system. With the completion of genome sequencing and annotation, interest has shifted to the construction of the gene regulatory network, the dynamic component of the biological system. Recognition of regulatory elements is the most important part of constructing a gene regulatory network.

There are three basic problems in the recognition of regulatory elements. The first is what kind of language describes the regulatory elements. The second is how to score a sequence. The third is how to find the highest-scoring motif in the sequences.

In this thesis, we study how to recognize regulatory elements effectively in gene regulatory regions. BioProspector, with its modifications, proves more effective than its predecessors.
In addition, three motif-recognition programs are introduced and compared with one another.

Two general strategies for DNA sequence motif finding have been explored in recent years: using enumeration to check the over-representation of all possible w-mers, and using iterative processes to update a motif probability matrix. The first strategy calculates the expected frequency of each possible motif of width w based on the background or input sequence distribution, then searches for w-mers that are much more abundant in the input than expected. This method is guaranteed to find the motifs with the greatest scores, but it does not allow flexible substitutions in the matching segments. In addition, the motifs that can be enumerated are limited in size (≤7 bases long). The second strategy employs a probability matrix for the motif, specifying the probability of each base at each motif position. An iterative procedure, implementing either expectation maximization (EM) or a Gibbs sampling algorithm, is then applied to improve the matrix until convergence.

BioProspector, a program using a Gibbs sampling strategy, examines the upstream regions of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero- to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. Moreover, BioProspector modifies the motif model used in earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program.
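As a rough illustration of the enumeration strategy described above, the following Python sketch counts every w-mer in a set of input sequences and ranks them by the ratio of observed to expected occurrences. It is not the method of any program discussed here. The function name `overrepresented_wmers`, the zero-order background estimated from the input itself, and the observed/expected ratio used as the score are all assumptions made for illustration.

```python
from collections import Counter
from math import prod


def overrepresented_wmers(seqs, w, top=5):
    """Rank w-mers by observed/expected count under a zero-order background.

    Hypothetical helper: the background base frequencies are estimated
    from the input sequences, which is only one of several possible choices.
    """
    # Background base frequencies estimated from the input sequences.
    base_counts = Counter("".join(seqs))
    total = sum(base_counts.values())
    freq = {b: c / total for b, c in base_counts.items()}

    # Observed count of every w-mer that occurs in the input.
    observed = Counter(
        s[i:i + w] for s in seqs for i in range(len(s) - w + 1)
    )
    # Number of positions at which a w-mer can start.
    n_windows = sum(max(len(s) - w + 1, 0) for s in seqs)

    scored = []
    for wmer, obs in observed.items():
        # Expected count if bases were drawn independently from the background.
        expected = n_windows * prod(freq[b] for b in wmer)
        scored.append((obs / expected, wmer, obs))
    # Highest observed/expected ratio first.
    return sorted(scored, reverse=True)[:top]
```

Each returned tuple holds the ratio, the w-mer, and its observed count. Because only exact w-mers are counted, the sketch shares the limitation noted above: it does not tolerate substitutions within a matching segment.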
Although testing and development are still in progress, BioProspector has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1 and Escherichia coli CRP.

Most software for the recognition of regulatory elements is based on EM, Gibbs sampling, or improvements of these techniques.

In this thesis, we compare three programs on two criteria: their recognition capability and their ability to construct the common regulatory elements. We draw the following conclusions.

1. MEME is a tool for discovering motifs in a group of related DNA or protein sequences. A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME represents motifs as position-dependent letter-probability matrices, which describe the probability of each possible letter at each position in the pattern.

2. AlignACE finds regulatory elements using Gibbs sampling. It was introduced for finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Because Gibbs sampling is stochastic, the same input can produce different results.

3. For a group of sequences that might contain a transcription factor binding motif, MDscan is especially useful when biologists are more confident about a subgroup of the sequences. The basic strategy of MDscan is to search for motifs in the high-confidence sequences first, because the signal-to-noise ratio is higher in these sequences.

Although the programs do not find two interacting motifs, each of them can clearly recognize a single motif. In tests on CRP, we find that MEME is the most efficient of these programs.
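The letter-probability matrix representation mentioned above can be illustrated with a short Python sketch. It builds a matrix from a handful of aligned sites and then scans a sequence for the window with the highest log-odds score. This is a minimal sketch, not the implementation used by MEME, AlignACE, or MDscan. The names `probability_matrix` and `best_site`, the pseudocount of 1, the uniform default background, and the restriction to the letters A, C, G, and T are assumptions made for illustration.

```python
from math import log2

BASES = "ACGT"


def probability_matrix(sites, pseudocount=1.0):
    """Column-wise letter probabilities from equal-length aligned sites."""
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        # The pseudocount keeps unseen letters from getting probability zero.
        total = len(column) + pseudocount * len(BASES)
        matrix.append(
            {b: (column.count(b) + pseudocount) / total for b in BASES}
        )
    return matrix


def best_site(seq, matrix, background=None):
    """Return (log-odds score, start, segment) of the best-scoring window.

    Returns None when the sequence is shorter than the motif width.
    """
    # Assumed default: a uniform background over the four bases.
    background = background or {b: 0.25 for b in BASES}
    w = len(matrix)
    best = None
    for i in range(len(seq) - w + 1):
        segment = seq[i:i + w]
        score = sum(
            log2(matrix[j][b] / background[b]) for j, b in enumerate(segment)
        )
        if best is None or score > best[0]:
            best = (score, i, segment)
    return best
```

For example, a matrix built from the hypothetical sites `["TGTGA", "TGTGA", "TGAGA", "TGTGC"]` and applied to `"ACCTGTGATTT"` selects the window `TGTGA` starting at index 3. An EM or Gibbs sampling procedure would repeat steps of this kind, re-estimating the matrix from the chosen sites until it stops changing.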
Keywords/Search Tags: Recognition