
Promoter Recognition System Research From Gene Sequence Data

Posted on: 2010-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Dong
Full Text: PDF
GTID: 2178360272979387
Subject: Computer application technology
Abstract/Summary:
With the successful completion of the Human Genome Project (HGP), a large amount of genome information has been acquired, and analyzing this information is important and necessary work. Eukaryotic promoter recognition is one of the hot topics and difficulties in genome research. The promoter is an important regulatory sequence element, and promoter recognition is a crucial part of gene structure recognition. It is also a core issue in constructing gene transcriptional regulation networks.

There are many methods of promoter recognition; however, their false positive rates are generally very high. To reduce the high false positive rate in eukaryotic promoter recognition, biological knowledge and information from bioinformatics databases were collected and studied. A eukaryotic promoter prediction method based on principal component analysis (PCA) is then presented. The method extracts content features from the gene sequence and uses PCA to compress the high-dimensional statistical features into low-dimensional ones, generating principal component features. CpG island features and principal component features are used as the recognition features.

The method uses codon and pentamer information from human promoters as features to build the content feature matrix, so the feature dimension is very high. To reduce it, the method compresses the high-dimensional statistical features with PCA, an effective multivariate analysis method. Its main principle is to project the feature matrix into a new feature space to obtain new variables. After the transformation, the principal components that capture most of the information in the original variables are retained to form the new space, which achieves the dimension reduction. To limit the information lost in feature compression, the method extracts 10 principal components with PCA.
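The feature-extraction and compression steps described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes that "codon" and "pentamer" information means relative frequencies of overlapping 3-mers and 5-mers, and it approximates PCA with plain power iteration so that only the Python standard library is needed. The function names (`kmer_frequencies`, `content_features`, `pca_fit`, `pca_transform`) and the toy sequences are hypothetical.

```python
import math
import random
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping k-mer over ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def content_features(seq, ks=(3, 5)):
    """Concatenate k-mer frequencies; (3, 5) gives 64 + 1024 = 1088 dimensions."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca_fit(rows, n_components, iters=100, seed=0):
    """Estimate the leading principal directions by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v): one multiplication by the unnormalised covariance
            scores = [_dot(x, v) for x in centered]
            w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]
            # deflation: remove the directions that were already extracted
            for c in components:
                proj = _dot(w, c)
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # no variance left: return a zero direction
                v = [0.0] * d
                break
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def pca_transform(rows, mean, components):
    """Project each row onto the principal directions (one score per component)."""
    result = []
    for r in rows:
        centered = [a - m for a, m in zip(r, mean)]
        result.append([_dot(centered, c) for c in components])
    return result


# Toy sequences (hypothetical, not taken from the thesis data set).
toy = ["ACGCGCGTTAGCGCGCGATCGCG", "TTATAAATTTGCATATTTAAATA",
       "GCGCGGCGCCGCGGATCCGCGCG", "ATATTTAAGCATTTATAATATTA"]
X = [content_features(s) for s in toy]
mean, comps = pca_fit(X, n_components=2)
reduced = pca_transform(X, mean, comps)  # 4 rows, 2 principal-component scores each
```

The toy run keeps only 2 components because it has just four sequences; the method described here retains 10. A production version would more likely rely on an established linear-algebra library than on hand-written power iteration.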
The 10 principal component features and the 2 CpG island features are combined into an integrated feature vector and fed into a back-propagation (BP) neural network classifier.

Using the trained neural network classifier, our system was tested on the promoters of the human gene sequences L44140, D87675, AF017257, AF146793, AC002368 and AC002397. The final sensitivity and specificity are 64.70% and 44.00%, respectively. To evaluate its recognition ability, the system was compared with PromoterInspector and DPF. The experimental results show that the system not only reduces false positives but also obtains higher sensitivity and specificity.
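The abstract does not state how sensitivity and specificity are defined. As a hedged illustration, the sketch below uses the definitions that are common in promoter-prediction evaluations, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); whether the thesis uses exactly these formulas is an assumption. The counts are hypothetical and are not results from the thesis.

```python
def sensitivity(tp, fn):
    """Fraction of true promoters that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, fp):
    """Fraction of predictions that are true promoters: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts for illustration only; they are not results from the thesis.
tp, fn, fp = 8, 2, 4
sn = sensitivity(tp, fn)
sp = specificity(tp, fp)
print(f"Sn = {sn:.2%}, Sp = {sp:.2%}")
```

Under these definitions a lower false positive count raises specificity directly, which is why reducing false positives and improving specificity go together in the comparison with PromoterInspector and DPF.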
Keywords/Search Tags: promoter recognition, feature extraction, principal component analysis, CpG islands, BP neural network