The Research And Application Of Mining Biological Sequential Pattern

Posted on: 2009-08-27, Degree: Master, Type: Thesis
Country: China, Candidate: R Zhang, Full Text: PDF
GTID: 2178360272959396, Subject: Computer software and theory
Abstract/Summary:
Since the start of the Human Genome Project, gene sequences from a large number of species have been determined, and a great deal of biological information has accumulated. At the same time, many protein structures, which are peptide chains of amino acid sequences, have also been determined. This information is stored in many different databases, which together form a worldwide sea of biological data. Having a large amount of data, however, does not mean that we have the knowledge it contains. How to obtain valuable results from the raw information, discover the rules in biological sequences, and uncover the meaning of life remains a difficult problem and a focus of research. Biological sequences are among the most important kinds of biological data. By analyzing them, we can find genetic rules and relationships between species. This work is important for discovering the essence of life, finding methods to improve organisms, and developing new medicines.

By analyzing the characteristics of biological sequence data and the problems in current methods of mining biological sequential patterns, this thesis makes the following contributions:

1) Considering the special attributes of biological sequential patterns, this thesis devises a new algorithm named MS-BioSM to mine them. MS-BioSM adopts a new pruning algorithm that improves on the PrefixSpan algorithm, and it uses "multiple support degrees" to mine biological sequential patterns, so the patterns it finds carry more biological meaning. The experimental results show that MS-BioSM both increases search efficiency and decreases space complexity.

2) The transcription factor binding site (TFBS) is an important kind of gene sequential pattern with its own special characteristics. To predict TFBSs accurately, this thesis designs an algorithm named GBMM (Predicting Binding Sites Using Multiple Markov Models optimized by Genetic Algorithm). In the experiment, several Markov models are built by analyzing the transition probabilities between characters and between character fragments of genes. A genetic algorithm is then used to optimize the parameters that combine these models. The experimental results on a real data set show that GBMM performs well.

3) Drawing on data mining technology, this thesis introduces several core operations specific to biological sequences and devises a new biological sequence database management system. This system provides an efficient platform for bioinformatics research.
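The idea of combining several Markov models to score candidate binding sites can be illustrated with a minimal sketch. It is not the GBMM algorithm itself: the function names, the toy training sites, the uniform background of 0.25, and the hand-picked weights are all assumptions made for illustration. The abstract says only that a genetic algorithm optimizes the combining parameters, so the weights here stand in for values such a search might produce.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_order0(sites, pseudo=1.0):
    """Zeroth-order model: smoothed frequency of each nucleotide."""
    counts = Counter(ch for s in sites for ch in s)
    total = sum(counts[c] for c in ALPHABET) + pseudo * len(ALPHABET)
    return {c: (counts[c] + pseudo) / total for c in ALPHABET}

def train_order1(sites, pseudo=1.0):
    """First-order model: smoothed transition probability P(b | a)."""
    counts = Counter((s[i], s[i + 1]) for s in sites for i in range(len(s) - 1))
    model = {}
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
        for b in ALPHABET:
            model[(a, b)] = (counts[(a, b)] + pseudo) / row_total
    return model

def log_odds_order0(seq, p0, background=0.25):
    """Log-odds of seq under the zeroth-order model vs. a uniform background."""
    return sum(math.log(p0[c] / background) for c in seq)

def log_odds_order1(seq, p0, p1, background=0.25):
    """Log-odds of seq under the first-order model vs. a uniform background."""
    score = math.log(p0[seq[0]] / background)
    score += sum(math.log(p1[(a, b)] / background) for a, b in zip(seq, seq[1:]))
    return score

def combined_score(seq, p0, p1, weights=(0.5, 0.5)):
    """Weighted sum of the two model scores.

    The weights are fixed by hand here (an assumption); in a GBMM-like
    setup they would be the parameters tuned by the genetic algorithm.
    """
    w0, w1 = weights
    return w0 * log_odds_order0(seq, p0) + w1 * log_odds_order1(seq, p0, p1)

# Toy training set of hypothetical binding sites (illustrative only).
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAG"]
p0 = train_order0(sites)
p1 = train_order1(sites)

site_score = combined_score("TATAAA", p0, p1)
other_score = combined_score("GCGCGC", p0, p1)
print(site_score > other_score)
```

A sequence resembling the training sites receives a higher combined score than an unrelated one. A threshold on this score would turn it into a site predictor.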
Keywords/Search Tags: bioinformatics, biological sequential pattern, binding site, database, data mining