Research On The Method Of Gene Identification Based On Sequence Statistical Features

Posted on: 2018-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Guo
Full Text: PDF
GTID: 2310330533469240
Subject: Computer Science and Technology
Abstract/Summary:
Given the vast amount of genome data now available worldwide, identifying coding sequences (CDSs) efficiently and accurately is of great practical significance. This makes gene identification a foundation of bioinformatics research and development, and it has long attracted researchers' attention. Traditional research methods rely on cumbersome biological experiments, so the process is slow, time-consuming, and labor-intensive. Drawing on the theory and methods of signal processing, such as the Fourier transform, filtering algorithms, intelligent computation, and statistical learning, this dissertation studies the problem using the statistical characteristics of the sequence.

The period-3 behavior is an important statistical feature that has been widely used in gene recognition. To obtain better recognition performance, researchers have contributed a great deal to the signal processing of gene sequences and the enhancement of their period-3 features, but some shortcomings remain. To address the limitations of the fixed-step-size LMS adaptive filter algorithm in gene prediction, this dissertation presents an improved algorithm that filters more effectively and enhances the period-3 behavior by combining the feedback output of the system with characteristic information about the base composition of gene sequences. The result is an LMS adaptive filter algorithm with a variable step size, and the dissertation verifies its performance through simulation experiments. The results confirm that the proposed algorithm is more accurate than the existing ones. Additionally, for short gene sequences, whose feature information is weak and which are therefore hard to distinguish, the dissertation proposes a new multi-feature weighted fusion algorithm based on their single-feature representations, focusing mainly on the identification of gene sequences shorter than 200 bp. Compared with the traditional multi-feature fusion algorithm, the proposed one is shown to be more effective and robust.

Integrating these two improvements, the dissertation implements a gene identification system for the human genome. The system does not depend on traditional machine learning methods such as CRF, HMM, and SVM, so it is simple to implement and does not need to store a large number of model parameters. It is not unduly influenced by the knowledge structure of an existing training dataset and can perform identification in real time. The dissertation comprehensively verifies the system's performance on the standard test datasets ALLSEQ and HMR195.
Keywords/Search Tags: gene identification, multi-feature fusion, Fourier analysis, adaptive filter
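
The period-3 feature that the abstract refers to can be illustrated with a minimal sketch. This is not the dissertation's algorithm. It assumes one common textbook formulation, in which the sequence is split into four binary indicator sequences (one per base) and the squared DFT magnitudes at frequency N/3 are summed. The function names `period3_power` and `sliding_period3`, the window length of 351, and the step of 3 are all hypothetical choices made for illustration.

```python
import cmath


def period3_power(seq: str) -> float:
    """Sum over A, C, G, T of |X_b[N/3]|^2, where X_b is the DFT of the
    binary indicator sequence of base b (assumed formulation, unnormalized)."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        acc = 0j
        for pos, ch in enumerate(seq):
            if ch == base:
                # DFT term at k = N/3: exp(-2*pi*i * k * pos / N) = exp(-2*pi*i * pos / 3)
                acc += cmath.exp(-2j * cmath.pi * pos / 3)
        total += abs(acc) ** 2
    return total


def sliding_period3(seq: str, window: int = 351, step: int = 3) -> list:
    """Period-3 power of each window along the sequence.
    The window and step values are illustrative assumptions."""
    last_start = max(len(seq) - window + 1, 1)
    return [period3_power(seq[i:i + window]) for i in range(0, last_start, step)]


if __name__ == "__main__":
    periodic = "ATG" * 30   # strongly 3-periodic toy sequence
    flat = "A" * 90         # no period-3 structure
    print(period3_power(periodic) > period3_power(flat))
```

In a sketch like this, windows with a high score would be candidate coding regions. Filtering steps such as the adaptive filter described in the abstract aim to make that peak stand out more clearly against the background.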