Research on Predicting DNA Site Information Based on Sequence Information

Posted on: 2016-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Liu | Full Text: PDF
GTID: 2310330470971450 | Subject: Control theory and control engineering
Abstract/Summary:
With the implementation of the Human Genome Project (HGP), the sequence of the human genome has been decoded, and the number of sequenced genomes is increasing rapidly. However, knowledge of the attributes, modification properties, and functions of DNA sequences remains very limited. It is therefore crucial to identify these properties using methods from systems and control theory, pattern recognition, machine learning, and information processing. The purpose of this study is to use the DNA sequence to uncover and analyze the genetic information or codons hidden in it, through the prediction of DNA splice sites, DNA methylation sites, and nucleosome positions. Such predictions help to understand the essence of life, reveal how human body functions vary, open new pathways for the diagnosis and treatment of disease, and provide an important theoretical foundation for drug design research.

DNA methylation is of great interest to medical research: it is related not only to aging and cancer but also to single-gene genetic diseases. In this work, a 3-to-1 codon conversion (three nucleotides to one amino acid) is used to incorporate the long-range, or global, sequence-order information of DNA, and a discrete DNA model is constructed from the physicochemical properties of the corresponding amino acids. The unbalanced datasets are optimized with NCR (Neighborhood Cleaning Rule) and SMOTE (Synthetic Minority Over-sampling Technique), and a predictor is built with the support vector machine (SVM) algorithm. The Target-Jackknife cross-validation test shows that the prediction success rate of this DNA methylation site predictor is considerably higher than that of existing predictors.

The nucleosome plays crucial roles in regulating cellular processes, including repair, recombination, gene expression, disease development, chromosome composition, and mRNA splicing.
In this study, a new discrete model is used to formulate the DNA sequence. It represents sequence-order information by combining the trinucleotide composition of the sequence with the pseudo amino acid components of the protein translated from the DNA sample according to the genetic code, together with an analysis of the Markovian entropy and the related information redundancy. Using the SVM algorithm to predict nucleosome positions gives very promising results.

Splicing is the process of removing introns and joining exons together to form the coding region. The present study proposes a new model based on the fuzzy K-nearest neighbor (F-K-NN) algorithm, with features optimized by forward feature extraction and incorporating DNA local structural property parameters and nucleic acid composition. Cross-validation results show that the prediction success rate is higher than that of earlier methods.
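
The two sequence encodings named in the abstract, trinucleotide composition and translation of the DNA sample into a protein via the genetic code, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `translate` and `trinucleotide_composition` are hypothetical, the overlapping sliding window and the reading frame starting at position 0 are assumptions, and the pseudo amino acid components and Markovian entropy terms are not reproduced here.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna):
    """3-to-1 conversion: map each non-overlapping codon to one amino acid.

    Assumption: the reading frame starts at position 0. Codons containing
    characters other than A/C/G/T are written as 'X'; stop codons as '*'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def trinucleotide_composition(dna):
    """Return a 64-dimensional vector of normalized trinucleotide frequencies.

    Assumption: trinucleotides are counted with an overlapping window of
    step 1; windows containing ambiguous bases are skipped.
    """
    dna = dna.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(len(dna) - 2):
        tri = dna[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]
```

A feature vector for a classifier such as an SVM could then be formed by concatenating the 64 trinucleotide frequencies with descriptors computed from the translated protein string; how the thesis weights and combines these parts is not stated in the abstract.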
Keywords/Search Tags: bioinformatics, nucleosome, feature extraction, DNA splice sites, DNA methylation