
Research On Eukaryotic Promoter Recognition Algorithm

Posted on: 2012-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: R H Xin
Full Text: PDF
GTID: 2210330335475985
Subject: Computer application technology
Abstract/Summary:
With the implementation of the Human Genome Project and the completion of the draft genome sequence, biological science and technology have developed rapidly. This progress has also produced a large volume of related data, which is growing faster than anticipated. Faced with this much data, finding the relevant information has become an important and difficult task. Promoters play an important role as key elements in the construction of gene transcriptional regulatory networks. Moreover, eukaryotic promoters have a more complex structure than prokaryotic promoters. As a result, eukaryotic promoter recognition has become an active and challenging area of current gene research.

Many promoter recognition algorithms have been proposed, but they all suffer from a high rate of false positives. To address the deficiencies of existing algorithms and further improve prediction performance, this thesis applies Z curve theory and structural properties to promoter prediction and proposes a recognition algorithm based on the structural properties and Z curve properties of eukaryotic promoters. The six structural property parameters selected from the model describe the spatial morphology of a gene sequence well, while the Z curve property describes the distribution of hydrogen bonds from a global perspective. As a result, promoter and non-promoter sequences can be distinguished in terms of both local flexibility and stability.

First, structural features and Z curve features of promoter and non-promoter sequences are extracted from the training set. Next, a structural feature classifier is built on the Mahalanobis distance, and a Z curve feature classifier is built on the Fisher criterion. Each classifier includes three sub-modules: a promoter-exon classification sub-module, a promoter-intron classification sub-module, and a promoter-3'UTR classification sub-module. Each sub-module is built on the features of its respective classifier. Finally, the classifiers pass their predictions to a comprehensive evaluation module, which makes the final prediction.

To evaluate the algorithm, six genomic sequences from GenBank were selected for testing, with accession numbers L44140, D87675, AF017257, AF146793, AC002368, and AC002397. The algorithm achieved a sensitivity of 71.92%, a specificity of 55.56%, and an accuracy of 63.47%. These experimental results indicate that the algorithm improves promoter recognition performance.
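
The Z curve referred to in the abstract can be illustrated with a minimal sketch. It uses the standard Z curve transform, in which each base moves a point one step along three axes: purine vs. pyrimidine (x), amino vs. keto (y), and weak vs. strong hydrogen bonding (z). The abstract does not specify which Z curve features the thesis extracts, so the function name `z_curve` and the handling of ambiguous bases are assumptions made for illustration only.

```python
def z_curve(seq):
    """Return the cumulative Z curve coordinates (x_n, y_n, z_n) of a DNA string.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    # Per-base steps along the three axes:
    #   x: purine (A, G) = +1, pyrimidine (C, T) = -1
    #   y: amino  (A, C) = +1, keto       (G, T) = -1
    #   z: weak H-bond (A, T) = +1, strong H-bond (G, C) = -1
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    for base in seq.upper():
        # Assumption: ambiguous symbols such as N contribute no movement.
        dx, dy, dz = steps.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for n, point in enumerate(z_curve("ATGC"), start=1):
        print(n, point)
```

The z component is the one tied to hydrogen bonding: it rises through A/T-rich stretches and falls through G/C-rich ones, which is why a curve of this kind can summarize the hydrogen bond distribution along a sequence.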
Keywords/Search Tags: eukaryotic promoter prediction, structural property, Z curve, Mahalanobis distance, Fisher