
Motif Discovery Based On The Gibbs Sampling And EM Algorithm

Posted on: 2008-04-25  Degree: Master  Type: Thesis
Country: China  Candidate: X L Chen  Full Text: PDF
GTID: 2120360218950429  Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the completion of genome sequencing for many species and the advance of microarray technology, we now possess a wealth of valuable biological data. One of the most challenging problems of this century is to decipher the information encoded in these data. An important part of that problem is discovering the conserved subsequences in biological sequences. These conserved subsequences are called motifs.

Gibbs sampling is the most successful and most widely applied method for motif discovery. In the previous literature the length of the motif is treated as a constant, but in practice the length cannot be known in advance. In this thesis the length is treated as missing data and determined by the algorithm. The experimental results show that our algorithm is feasible.

In 1994, Bailey and Elkan introduced the EM algorithm into motif discovery in biological sequences by way of a two-component mixture model. In their method, the sequences are first cut into short subsequences, and the two-component mixture model is then fitted to the new dataset using the EM algorithm. Note that many of the subsequences in the new dataset cannot be generated by the two-component mixture model. Here we fit the new dataset with a mixture model that has more components, so that every subsequence can be generated by the new model. Because our model describes the data more precisely, the parameter estimates converge to the true values more quickly and more accurately.
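
For orientation, the two-component baseline described above can be sketched in a few lines of Python. This is a minimal illustration of the general idea only (cut the sequences into overlapping windows, then alternate E- and M-steps over a motif component and a background component). It is not the thesis's multi-component model, and it is not Bailey and Elkan's implementation. The function names, the seed-word initialisation, the fixed background distribution, and the pseudocount value are all assumptions made for this sketch.

```python
import math

ALPHABET = "ACGT"

def windows(seqs, w):
    """Cut every sequence into all of its overlapping length-w subsequences."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]

def em_two_component(seqs, w, seed_word, n_iter=50, pseudo=0.1):
    """Fit a motif/background mixture to the length-w windows with EM.

    Assumptions of this sketch: the background letter distribution is
    estimated once and held fixed, and the motif matrix is initialised
    from a hypothetical seed word of length w.
    """
    xs = windows(seqs, w)
    total = sum(len(s) for s in seqs)
    # Background component: smoothed letter frequencies over all sequences.
    bg = {a: (sum(s.count(a) for s in seqs) + pseudo) / (total + 4 * pseudo)
          for a in ALPHABET}
    # Motif component: one letter distribution per motif position.
    pwm = [{a: (0.7 if a == c else 0.1) for a in ALPHABET} for c in seed_word]
    lam = 1.0 / len(xs)  # mixing weight of the motif component
    z = [0.0] * len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that each window came from the motif.
        z = []
        for x in xs:
            p_motif = lam * math.prod(pwm[j][x[j]] for j in range(w))
            p_bg = (1.0 - lam) * math.prod(bg[c] for c in x)
            z.append(p_motif / (p_motif + p_bg))
        # M-step: re-estimate the mixing weight and the motif matrix.
        lam = sum(z) / len(xs)
        for j in range(w):
            counts = {a: pseudo for a in ALPHABET}
            for x, zi in zip(xs, z):
                counts[x[j]] += zi
            col_total = sum(counts.values())
            pwm[j] = {a: counts[a] / col_total for a in ALPHABET}
    return pwm, lam, list(zip(xs, z))

def consensus(pwm):
    """Most probable letter at each motif position."""
    return "".join(max(col, key=col.get) for col in pwm)
```

Calling `em_two_component(seqs, w, seed_word)` on a list of DNA strings returns the fitted motif matrix, the mixing weight, and each window paired with its posterior motif probability. Because overlapping windows share letters, they are not independent draws from the mixture, which is one way to read the abstract's remark that many windows cannot be generated by the two-component model.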
Keywords/Search Tags: motif, Gibbs sampling, EM algorithm