
On Eukaryotic Promoter Prediction

Posted on: 2005-02-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Xiong
Full Text: PDF
GTID: 1100360152965617
Subject: Biomedical engineering
Abstract/Summary:
The content of this dissertation is divided into two parts: eukaryotic promoter prediction and quantitative structure-activity relationship (QSAR) for biological macromolecules.

Transcription is the first major step in gene expression and constitutes an important point in the flow of genetic information. The promoter determines the position of the transcription start point and the frequency with which the gene is transcribed, so promoter recognition is important for genome annotation. In the first part of this dissertation, the focus is on the positional distributions of overlapping trimers. Artificial neural networks (ANN), genetic algorithms (GA), and Markov models (MM) are applied to DNA sequence motif finding by developing neural network and Markov models that recognize these motifs. The main contents and some conclusions are as follows:

1. A new content-based method is developed for the detection of promoters of eukaryotic protein-coding genes. Three position weight matrices (PWMs) are established for the promoter, exon, and intron regions. The information content of a sequence is quantified by a representational score, which is calculated by comparing the sequence's content with each weight matrix. A backpropagation (BP) neural network is used to classify the three types of DNA sequence motif by establishing the relationship between sequence parameters and motifs. Experimental results indicate that the system performs well on both the training set and the test set: the mean prediction rate is as high as 99% on the training set and 97% on the test set.

2. A new model based on genetic algorithms and a neural network is developed to distinguish eukaryotic promoter sequences from non-promoter sequences. Experimental results demonstrate that the system is effective at recognizing promoter sequences on both the training set and the test set. The mean prediction rate is as high as 99% on the training set and 98% on the test set.

3. A new method is developed to classify anonymous human DNA sequences into promoters and non-promoters using a Markov transition matrix (MTM). The system combines three Markov chains, which model three DNA functional regions (promoter, exon, and intron) respectively. The MTM for each Markov chain is established from the statistical concept of transition probabilities in a specific functional DNA region, and these probabilities are modeled as a set of MTMs. A sequence is classified by the probability that it was generated by the stochastic process described by each transition matrix: a given anonymous sequence is assigned to the region whose probability is the largest. A data set consisting of 400 human promoter sequences, 400 human exon sequences, and 400 human intron sequences was used to construct the system and test its performance. Experimental results indicate that the system classifies these DNA sequences with high accuracy. The mean classification accuracy is 84%.

Quantitative structure-activity relationship (QSAR) investigates the quantitative relationship between molecular structural parameters and biological activities or functions. Quantitative sequence-activity modeling for DNA, proteins, and peptides is an advanced topic in post-genomics (functional genomics or proteomics), and it is vitally important to the study of DNA-protein interaction, the prediction of protein function, drug design, and so on. Molecular structural characterization is one of the key steps in a successful QSAR study. The molecular electronegativity-distance vector (MEDV), based on the relative electronegativity of non-hydrogen atoms and the relative distance between non-hydrogen atoms, was developed in our lab to describe the molecular structure of drug molecules and various other molecules with different biological activities.
In the second part of this dissertation, MEDV has been extended to characterize DNA and peptides, and a novel set of structural descriptors, called the bond-association classified molecular electronegativity-distance vector (BMEDV), has been developed according to a new classific...
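
The Markov transition matrix classification described in item 3 of the first part can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The chain order (first order), the add-one pseudocount, the uniform start probability, the function names, and the toy training sequences are all assumptions made here for illustration; the abstract does not specify them. The sketch estimates one transition matrix per functional region and assigns a sequence to the region whose chain gives it the highest log-probability.

```python
import math

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def transition_matrix(sequences, pseudocount=1.0):
    """Estimate a 4x4 first-order transition matrix (assumed order) from sequences."""
    # Start every cell at the pseudocount so unseen transitions keep nonzero probability.
    counts = [[pseudocount] * 4 for _ in range(4)]
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            if prev in INDEX and curr in INDEX:  # skip ambiguous bases such as N
                counts[INDEX[prev]][INDEX[curr]] += 1
    # Normalize each row so it sums to 1.
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seq, matrix):
    """Log-probability of seq under one chain, assuming a uniform start distribution."""
    total = math.log(0.25)
    for prev, curr in zip(seq, seq[1:]):
        if prev in INDEX and curr in INDEX:
            total += math.log(matrix[INDEX[prev]][INDEX[curr]])
    return total


def classify(seq, models):
    """Return the region whose transition matrix gives seq the highest log-probability."""
    return max(models, key=lambda region: log_likelihood(seq, models[region]))


# Toy sequences invented for this example; they are NOT the 400-sequence data sets
# mentioned in the abstract.
training = {
    "promoter": ["TATAAATATATAAA", "TTATATAAATATTA"],
    "exon": ["GCGGCCGCGGCGCC", "CCGCGGGCGCCGCG"],
    "intron": ["ACGTACGTACGTAC", "GTACGTACGTACGT"],
}
models = {region: transition_matrix(seqs) for region, seqs in training.items()}

print(classify("TATATAAATA", models))
print(classify("GCGCGGCC", models))
```

Working in log space avoids numerical underflow on long sequences, and the pseudocount prevents a single unseen transition from driving a region's probability to zero.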
Keywords/Search Tags: Promoter Prediction, Position Weight Matrix, Neural Network, Genetic Algorithms, Markov Model, Quantitative Structure-Activity Relationship, Molecular Structural Characterization, Molecular Electronegativity-Distance Vector