
Reorganization And Discovery Of Functional Elements In Biosequences

Posted on: 2006-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Ao
Full Text: PDF
GTID: 2178360185963687
Subject: Control Science and Engineering
Abstract/Summary:
Functional elements are DNA segments that control gene expression and gene regulation, so studying them in biosequences is of considerable importance. This thesis first studies the recognition of one specific functional element, then extends the problem to finding motifs in biosequences, and contributes several results on pattern finding in biosequences.

There are two main approaches to recognizing or discovering functional elements in biosequences. The first is supervised recognition, which uses known information to determine whether a given sequence contains a specific functional element. The second is unsupervised learning, which uses similarity measures and a search algorithm to discover potential signals in biosequences.

The Escherichia coli promoter, which initiates transcription of a gene, consists mainly of two conserved sequences, the -10 box and the -35 box, and a spacer of variable length between them. The nucleotides in both conserved sequences vary, and the spacer length varies as well; both make it difficult to identify Escherichia coli promoters computationally. This thesis proposes an algorithm that recognizes Escherichia coli promoters from multiple features. First, a word-frequency method extracts content information from a given sequence. A position weight matrix and a hidden Markov model then analyze the structural information, and the combined information is fed into a classifier. In tests on sequences from the coding and non-coding regions of Escherichia coli, this algorithm outperformed other algorithms in average error rate.

Pattern finding in biosequences is a challenging task in bioinformatics. This thesis examines the similarity among the segments from which a pattern can be constructed and, on that basis, proposes the Signal Compatible Condition (SCC). It then proposes the Signal Compatible Algorithm (SCA), an exhaustive, sample-driven algorithm that finds monad patterns using a depth-limited search. Furthermore, SCA is extended to find dyad signals through a small change in how the search graph is constructed. In tests on artificial and biological data, SCA was guaranteed to find the potential patterns in the given sequences if they existed, and with SCC as the pruning rule, SCA completed the search in less time and with less memory.
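
The abstract does not define the Signal Compatible Condition, so the following is only a minimal sketch of the general idea of a sample-driven, depth-limited search with a compatibility rule for pruning. It is not the thesis's SCA. As a stand-in for SCC it assumes a simple pairwise Hamming-distance bound: one segment is taken from each sequence, and a branch is abandoned as soon as a new segment differs from any already chosen segment in more than `max_pair_dist` positions. The names `hamming`, `find_compatible_sets`, `consensus`, and `max_pair_dist` are hypothetical and chosen for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_compatible_sets(sequences, length, max_pair_dist):
    """Depth-limited search that picks one segment per sequence.

    Assumed compatibility rule (a stand-in, not the thesis's SCC):
    every pair of chosen segments must differ in at most
    max_pair_dist positions. The search depth is bounded by the
    number of sequences.
    """
    results = []

    def extend(depth, chosen):
        if depth == len(sequences):
            results.append(list(chosen))
            return
        seq = sequences[depth]
        for start in range(len(seq) - length + 1):
            segment = seq[start:start + length]
            # Prune: skip segments incompatible with any earlier choice.
            if all(hamming(segment, prev) <= max_pair_dist
                   for _, prev in chosen):
                chosen.append((start, segment))
                extend(depth + 1, chosen)
                chosen.pop()

    extend(0, [])
    return results


def consensus(segments):
    """Most common letter in each column of equal-length segments."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*segments))


# Toy data (invented for illustration): a TATAAT-like signal is
# planted at offset 2 of each sequence, with one mismatch in the second.
toy = ["CCTATAATGG", "GGTATGATCC", "AATATAATTT"]
for hit in find_compatible_sets(toy, length=6, max_pair_dist=1):
    print(hit, consensus([seg for _, seg in hit]))
```

Because incompatible segments are rejected before the search descends further, most branches are cut near the root. This illustrates how a pruning rule can reduce time and memory in an otherwise exhaustive search.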
Keywords/Search Tags:Bioinformatics, Functional element, Escherichia coli promoter, Pattern finding, Signal compatible algorithm