The Coding Sequence Recognition Of Prokaryotes Based On Hidden Markov Model

Posted on:2014-12-11

Degree:Master

Type:Thesis

Country:China

Candidate:J Ma

Full Text:PDF

GTID:2254330398462100

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Objective: Based on the Hidden Markov Model (HMM), this study identifies prokaryotic coding sequences and analyzes the factors that influence identification. It aims to study HMM theory in depth and to provide a research basis for applying HMMs to finding pathogenic sites and mining biological information.

Method: Three models were built with the Baum-Welch algorithm: a 100-iteration HMM-gene model, a 10-iteration HMM-gene model and a 100-iteration HMM-nogene model. The data were obtained from the National Center for Biotechnology Information (NCBI) and downloaded from shared resources. Before training, sequences longer than 20000 bp or shorter than 80 bp were rejected. The training set was randomly selected from 2/3 of the coding-region sequences and 2/3 of the non-coding-region sequences. The effect of the number of iterations was assessed by comparing nucleotide identification accuracy on a test set of 50 sequences randomly selected from the remaining 1/3. To recognize a sequence, the method compares with 1 the ratio of the values the sequence obtains under the coding-region model and the non-coding-region model. The recognition test set consisted of 360 sequences, each randomly selected from the remaining 1/3 of the coding-region and non-coding-region sequences. Specificity, sensitivity and accuracy were then used to evaluate how well the method identifies E. coli coding sequences. Finally, sequence length and GC content (GC%) were treated as two factors, and logistic regression was used to analyze their effect.

Results: On the same 50 sequences, nucleotide identification accuracy was much higher after 100 iterations (65.15%) than after 10 iterations (49.89%). For sequence recognition, the method achieved a specificity of 67.78%, a sensitivity of 73.33% and an accuracy of 70.56%.
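
The data preparation described in the Method paragraph (length filtering, then a random 2/3 versus 1/3 split of each sequence pool) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the function names `filter_by_length`, `split_two_thirds` and `sample_test_set` are hypothetical, both length limits are treated as inclusive, and the split is assumed to be applied separately to the coding and non-coding pools.

```python
import random

# Hypothetical helpers. Limits from the abstract: sequences longer than
# 20000 bp or shorter than 80 bp were rejected (inclusive bounds assumed).
MIN_LEN, MAX_LEN = 80, 20000


def filter_by_length(seqs, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only sequences whose length lies in [min_len, max_len]."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def split_two_thirds(seqs, seed=0):
    """Shuffle, then return (first 2/3 for training, remaining 1/3 held out)."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(seqs)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def sample_test_set(held_out, k, seed=0):
    """Draw k sequences at random, without replacement, from the held-out 1/3."""
    return random.Random(seed).sample(held_out, k)
```

Under these assumptions, the 50-sequence iteration test and the 360-sequence recognition test would each be drawn with `sample_test_set` from the held-out third; the `seed` argument exists only to make the sketch reproducible.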
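
The abstract trains HMMs with the Baum-Welch algorithm (100 or 10 iterations) and classifies a sequence by comparing a ratio of model scores with 1. One plausible reading, sketched below, is a likelihood-ratio test: a sequence is called coding when P(seq | coding HMM) / P(seq | non-coding HMM) > 1, i.e. when the log-likelihood difference is positive. The sketch uses scaled forward-backward with discrete A/C/G/T emissions; the class `DiscreteHMM`, the number of hidden states, the random initialization and the helper `is_coding` are assumptions for illustration, not the thesis's actual model structure. Because no pseudocounts are used, a nucleotide never seen during training would cause a division by zero when scoring.

```python
import math
import random

# Sketch only: state count, random initialisation and all names here are
# hypothetical; the thesis does not specify its model structure.
SYMBOLS = {"A": 0, "C": 1, "G": 2, "T": 3}  # other characters are skipped


def encode(seq):
    """Convert a DNA string into a list of integer observation symbols."""
    return [SYMBOLS[b] for b in seq.upper() if b in SYMBOLS]


def _random_rows(n_rows, n_cols, rng):
    """Random row-stochastic matrix with strictly positive entries."""
    rows = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(n_rows)]
    return [[x / sum(r) for x in r] for r in rows]


class DiscreteHMM:
    """HMM with discrete A/C/G/T emissions, trained by scaled Baum-Welch."""
    def __init__(self, n_states, n_symbols=4, seed=0):
        rng = random.Random(seed)
        self.N, self.M = n_states, n_symbols
        self.pi = _random_rows(1, n_states, rng)[0]
        self.A = _random_rows(n_states, n_states, rng)
        self.B = _random_rows(n_states, n_symbols, rng)

    def _forward(self, obs):
        """Scaled forward pass: normalised alphas and scaling factors c_t."""
        alphas, scales = [], []
        for t, o in enumerate(obs):
            if t == 0:
                a = [self.pi[i] * self.B[i][o] for i in range(self.N)]
            else:
                prev = alphas[-1]
                a = [sum(prev[j] * self.A[j][i] for j in range(self.N)) * self.B[i][o]
                     for i in range(self.N)]
            c = sum(a)
            alphas.append([x / c for x in a])
            scales.append(c)
        return alphas, scales

    def _backward(self, obs, scales):
        """Scaled backward pass reusing the forward scaling factors."""
        T = len(obs)
        betas = [None] * T
        betas[T - 1] = [1.0] * self.N
        for t in range(T - 2, -1, -1):
            o, nxt = obs[t + 1], betas[t + 1]
            betas[t] = [sum(self.A[i][j] * self.B[j][o] * nxt[j] for j in range(self.N))
                        / scales[t + 1] for i in range(self.N)]
        return betas

    def log_likelihood(self, obs):
        """log P(obs | model), the sum of the log scaling factors."""
        return sum(math.log(c) for c in self._forward(obs)[1])

    def baum_welch(self, sequences, n_iter=100):
        """Run n_iter EM updates; return the total log-likelihood before each one."""
        N, M, history = self.N, self.M, []
        for _ in range(n_iter):
            pi_n, a_den, b_den = [0.0] * N, [0.0] * N, [0.0] * N
            a_num = [[0.0] * N for _ in range(N)]
            b_num = [[0.0] * M for _ in range(N)]
            total = 0.0
            for obs in sequences:
                alphas, scales = self._forward(obs)
                betas = self._backward(obs, scales)
                total += sum(math.log(c) for c in scales)
                for t, o in enumerate(obs):
                    for i in range(N):
                        gamma = alphas[t][i] * betas[t][i]  # P(state i at t | obs)
                        if t == 0:
                            pi_n[i] += gamma
                        b_num[i][o] += gamma
                        b_den[i] += gamma
                        if t + 1 < len(obs):
                            a_den[i] += gamma
                            for j in range(N):  # expected i -> j transitions
                                a_num[i][j] += (alphas[t][i] * self.A[i][j]
                                                * self.B[j][obs[t + 1]]
                                                * betas[t + 1][j] / scales[t + 1])
            history.append(total)
            self.pi = [x / len(sequences) for x in pi_n]
            self.A = [[a_num[i][j] / a_den[i] for j in range(N)] for i in range(N)]
            self.B = [[b_num[i][k] / b_den[i] for k in range(M)] for i in range(N)]
        return history


def is_coding(seq, coding_hmm, noncoding_hmm):
    """Call a sequence coding when the likelihood ratio exceeds 1 (log-ratio > 0)."""
    obs = encode(seq)
    return coding_hmm.log_likelihood(obs) - noncoding_hmm.log_likelihood(obs) > 0
```

Each call to `baum_welch` returns the total log-likelihood before every update. Because Baum-Welch is an EM algorithm, this sequence never decreases, so comparing 10 with 100 iterations amounts to comparing how far the model has climbed toward a local maximum of the likelihood.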
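
The evaluation measures named in the abstract have standard definitions, sketched below with coding sequences as the positive class. The function names are hypothetical, and representing per-nucleotide annotations as equal-length label strings (for example `C` for coding, `N` for non-coding) is an assumption made only for this illustration.

```python
# Hypothetical helper names; standard definitions with "coding" as positive.
def sequence_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (True = coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return {
        "sensitivity": tp / (tp + fn),  # share of coding sequences recognized
        "specificity": tn / (tn + fp),  # share of non-coding sequences rejected
        "accuracy": (tp + tn) / len(pairs),
    }


def nucleotide_accuracy(true_labels, pred_labels):
    """Fraction of positions whose predicted label matches the annotation."""
    matches = sum(1 for a, b in zip(true_labels, pred_labels) if a == b)
    return matches / len(true_labels)
```

When the test set holds equal numbers of coding and non-coding sequences, accuracy equals the mean of sensitivity and specificity; the reported figures are consistent with this, since (67.78% + 73.33%) / 2 ≈ 70.56%.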
We also found that sequences longer than 1000 bp with GC% above 53% were recognized better, whereas sequences with lower GC% were not recognized well.

Conclusion: HMMs are well suited to gene recognition, and running enough training iterations is essential. Longer sequences with higher GC% are recognized more accurately. Several problems remain in this work, such as refining the training set, further improving the decision method, and taking fuller account of the characteristics of biological sequences; all of these need further study.
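
The logistic regression on sequence length and GC content could look like the sketch below. The outcome variable is an assumption, since the abstract does not name it: here `y` is 1 if a test sequence was recognized correctly and 0 otherwise. The model is fitted by plain gradient ascent on standardized features so that the example needs only the standard library; names such as `gc_percent` and `fit_logistic` are hypothetical.

```python
import math

# Hypothetical helpers; the outcome coding (1 = recognized correctly) is assumed.


def gc_percent(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def standardize(values):
    """Z-score a column so length and GC% are on comparable scales."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    X: list of feature rows, y: list of 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w
```

In practice, `X` would pair the standardized lengths with the standardized GC% values of the test sequences, and `math.exp(w[1])` and `math.exp(w[2])` would be the odds ratios per standard deviation of length and GC%. A statistics package such as statsmodels would additionally report standard errors and p-values, which this sketch omits.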

Keywords/Search Tags:

Hidden Markov Model, Coding Region Recognition, Prokaryote

Related items

1	Recognition Of Medical Image With A Convolutional Operator Based Hidden Markov Model
2	Research And Development Of Small-scale Speech Recognition System
3	Research Of Biomedical Named Entity Recognition Method Based On Hadoop And Hidden Markov Model
4	Application Of Hidden Markov Model In Segmentation Of Brain MR Images
5	Automatic Evaluation Of Speech Intelligibility For Uyghur Postoperative Patients With Cleft Palate Based On Hidden Markov Model
6	Hidden Markov Model Reveals The Abnormal Brain Dynamics In Patients With Alzheimer’s Disease
7	The Study On The Evolution Of Severe Hepatitis Based On The Hidden Markov Model(HMM)
8	Development And Application Of A Hidden Markov Model Based Method For Somatic CNVs Detection At Single Cell Level
9	Modeling Liver Cirrhosis Progression Using Hidden Markov Model
10	Automated J Wave Detection Based On Hidden Markov Model