| With the development of genome sequencing projects, the numbers of sequences and bases in GenBank, EMBL and DDBJ are increasing rapidly, which makes it important to analyze these sequences at the gene and genome levels. In this thesis, we focus on identifying the start codons of genes in prokaryotic genomes and on recognizing the protein-coding sequences of human genes. Chapter I introduces the background of bioinformatics and surveys its main fields. Chapter II explains the biological knowledge relevant to this study, including the features of prokaryotic genomes and the progress of computational algorithms for finding prokaryotic genes. Chapter III describes three mathematical methods, the Fisher discriminant algorithm, the Markov chain model and the jackknife method, and explains how they are used to recognize protein-coding genes and start codons. Chapter IV presents the main work of this study. Six biologically meaningful features are combined: the mononucleotide distribution pattern near the start codon; the coding potential; the codon entropy near the start codon; the distance from the start codon to the in-frame stop codon in the upstream sequence; the start codon type; and the distance from the leftmost start codon to the candidate start codon. The proposed method correctly predicts the translation start sites of 92.82% of 195 experimentally confirmed E. coli genes. For B. subtilis genes, the start codon prediction accuracies on three B. subtilis databases are 96.55%, 94.44% and 96.08%, respectively. Chapter V introduces other work carried out during my study. |
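The following Python code is a minimal sketch of how a Fisher linear discriminant can be combined with a jackknife test, assuming that "jackknife" means leave-one-out cross-validation (the usual sense in gene-prediction work). The function names `fisher_train`, `fisher_predict` and `jackknife_accuracy` are hypothetical, and the small ridge term and the midpoint decision threshold are illustrative assumptions rather than details taken from the thesis. Each feature vector stands for one candidate start codon. In the thesis setting such a vector would hold the six features listed above, but here only toy two-dimensional values are used.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _mean(rows: Matrix) -> List[float]:
    # Column-wise mean of a list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_train(pos: Matrix, neg: Matrix, ridge: float = 1e-6) -> Tuple[List[float], float]:
    # Fisher direction w = Sw^-1 (mu_pos - mu_neg), Sw = pooled within-class scatter.
    mu1, mu0 = _mean(pos), _mean(neg)
    d = len(mu1)
    sw = [[0.0] * d for _ in range(d)]
    for rows, mu in ((pos, mu1), (neg, mu0)):
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    for i in range(d):
        sw[i][i] += ridge  # assumed small ridge term to keep Sw invertible
    w = _solve(sw, [a - b for a, b in zip(mu1, mu0)])
    # Assumed threshold: midpoint between the projected class means.
    t = (_dot(w, mu1) + _dot(w, mu0)) / 2
    return w, t


def fisher_predict(w: List[float], t: float, x: List[float]) -> bool:
    # True means the candidate is classified as a true start codon.
    return _dot(w, x) > t


def jackknife_accuracy(pos: Matrix, neg: Matrix) -> float:
    # Leave-one-out: retrain without sample k, then classify sample k.
    samples = [(x, True) for x in pos] + [(x, False) for x in neg]
    correct = 0
    for k, (x, label) in enumerate(samples):
        train_pos = [s for i, (s, l) in enumerate(samples) if l and i != k]
        train_neg = [s for i, (s, l) in enumerate(samples) if not l and i != k]
        w, t = fisher_train(train_pos, train_neg)
        correct += fisher_predict(w, t, x) == label
    return correct / len(samples)
```

In this sketch every sample is held out once, so the reported accuracy never comes from a model that saw the sample during training. The cost is one retraining per sample, which remains cheap for data sets of a few hundred genes such as the 195 E. coli genes mentioned above.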