
Research On Double-block Planted Motif Finding Algorithm Based On Frequent Pattern Mining

Posted on: 2010-06-24 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Yang | Full Text: PDF
GTID: 2178360275973138 | Subject: Computer Science and Technology
Abstract/Summary:
With rapid advances in computer science and biomedicine, the mechanisms of human gene expression and the translation of genetic information are being progressively uncovered, and a growing number of researchers are concentrating on the Planted Motif Finding Problem (PMFP): locating planted motifs, the short patterns in DNA sequences that are involved in controlling gene expression. This thesis discusses DNA planted motifs, data mining techniques, and related theory in detail, and summarizes the latest research findings on the PMFP. It also presents in detail the D-Apriori algorithm, which applies frequent pattern mining to the Double-block Planted Motif Finding Problem (DB-PMFP), a form of the problem that appears more frequently in higher organisms. Because DNA base sequences exhibit "approximate frequent pattern" characteristics along a single strand, two classic frequent pattern mining algorithms, Apriori and FP-Growth, are referenced and analyzed. Building on the closure property of Apriori and the compressed tree structure of FP-Growth, we use the Bpriori algorithm to find planted motifs. Furthermore, splicing single-block planted motifs with the D-Space algorithm is shown to be an efficient way to find double-block planted motifs.
Extensive validation experiments on both synthetic and real data sets show that the proposed algorithm successfully finds double-block planted motifs of unknown length while satisfying the problem's requirements. These results not only offer a performance advantage in finding double-block planted motifs in DNA sequences, but can also be applied more broadly to string-mining problems in data mining.
As a theoretical tool for describing and predicting gene expression, with applications to the prevention, detection, and treatment of human disease, these results provide more direct and effective solutions and could help accelerate progress in bioinformatics and medicine.
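The abstract names the techniques without defining them, so the following is a minimal Python sketch of the general idea. It is not the thesis's Bpriori, D-Apriori, or D-Space implementation, and it rests on assumptions of our own: a single-block motif is a pattern of fixed length that must occur with at most d mismatches (Hamming distance) in at least `quorum` input sequences, and a double-block motif is two such blocks separated by a spacer whose length is drawn from an assumed range [`gap_min`, `gap_max`]. All function names and parameters here (`mine_motifs`, `splice`, `quorum`, and so on) are hypothetical. The sketch shows the Apriori-style closure property applied to motifs: if a pattern occurs within d mismatches in a sequence, so does each of its prefixes, so a prefix whose support falls below the quorum can be pruned before it is extended.

```python
ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(pattern: str, seq: str, d: int) -> list[int]:
    """Start positions where seq matches pattern with at most d mismatches."""
    k = len(pattern)
    return [i for i in range(len(seq) - k + 1) if hamming(pattern, seq[i:i + k]) <= d]


def support(pattern: str, seqs: list[str], d: int) -> int:
    """Number of sequences containing at least one approximate occurrence."""
    return sum(1 for s in seqs if occurrences(pattern, s, d))


def mine_motifs(seqs: list[str], length: int, d: int, quorum: int) -> list[str]:
    """Level-wise (Apriori-style) search for single-block motifs.

    Pruning relies on a closure property: if P occurs with <= d mismatches
    in a sequence, every prefix of P does too, so a prefix supported by
    fewer than `quorum` sequences can never grow into a supported motif.
    """
    frontier = [""]
    for _ in range(length):
        # Extend each surviving prefix by one base; keep only supported ones.
        frontier = [
            p + base
            for p in frontier
            for base in ALPHABET
            if support(p + base, seqs, d) >= quorum
        ]
    return frontier


def splice(left: list[str], right: list[str], seqs: list[str], d: int,
           gap_min: int, gap_max: int, quorum: int) -> list[tuple[str, int, str]]:
    """Hypothetical splicing step: join single-block motifs into double-block
    motifs (m1, gap, m2) where m2 starts exactly `gap` bases after m1 ends,
    keeping combinations supported by at least `quorum` sequences."""
    results = []
    for m1 in left:
        for m2 in right:
            for gap in range(gap_min, gap_max + 1):
                hits = 0
                for s in seqs:
                    starts2 = set(occurrences(m2, s, d))
                    # Does some m1 occurrence have an m2 occurrence at the spacer offset?
                    if any(i + len(m1) + gap in starts2 for i in occurrences(m1, s, d)):
                        hits += 1
                if hits >= quorum:
                    results.append((m1, gap, m2))
    return results


if __name__ == "__main__":
    seqs = ["TTACGTTT", "GACGTGGG", "CCCACGTC"]
    print(mine_motifs(seqs, 4, 0, 3))  # ['ACGT']
    pairs = ["TACGAAGGCT", "ACGTTGGCAA", "CCACGCAGGC"]
    print(splice(["ACG"], ["GGC"], pairs, 0, 0, 3, 3))  # [('ACG', 2, 'GGC')]
```

Pruning an unsupported prefix discards all of its extensions at once, and the splicing step only combines motifs that already survived the single-block search instead of testing every pair of candidate strings. With d > 0 the first d levels cannot prune anything (any prefix of length at most d matches every window), so the search still grows quickly for larger d.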
Keywords/Search Tags: Double-block Planted Motif, Data Mining, Frequent Pattern, Bpriori algorithm, D-Apriori algorithm, D-Space algorithm