
Recognition Of Protein Coding Sequences Based On Graphical Representation

Posted on: 2012-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Yan
Full Text: PDF
GTID: 2230330395485619
Subject: Computer Science and Technology
Abstract/Summary:
Genome sequences are rich in biological knowledge and biological principles. With the progress of the Human Genome Project (HGP) and the accelerating pace of genome-sequencing projects, biologists have obtained the genome sequences of hundreds of species. Recognizing protein coding genes is the first problem in genome analysis after sequencing. This paper describes new approaches for recognizing protein coding sequences, especially short coding sequences, and analyzes the problem in terms of graphical features and classification algorithms.

Based on the base bias at the three codon positions and on the chemical properties of the bases, new graphical representations of gene sequences are introduced for recognizing short coding sequences in human genes. Nine effective features of the area matrix are extracted from the new curves, and a Support Vector Machine (SVM) is used to identify short protein coding sequences in human genes. During identification, an incremental feature selection algorithm is used to add four statistical features that carry more information and improve accuracy. Principal Component Analysis (PCA) is then applied to reduce the dimensionality. The experimental results show that the method uses fewer features (seven or four) and achieves better recognition results than other methods.

The traditional Support Vector Machine (SVM) is sensitive to outliers and noisy data and is computationally expensive. To address these weaknesses, the Least Squares Fuzzy Support Vector Machine (LS_FSVM) is applied in place of SVM to classify coding and non-coding sequences. A new method for calculating sample membership in LS_FSVM is proposed, which takes the relations among samples into account. Compared with SVM and the Least Squares Support Vector Machine (LS_SVM), this method obtains better recognition accuracy.
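
The codon-position base bias that the graphical representations build on can be illustrated with a small sketch. This is not the thesis's curve construction or its area-matrix features, which the abstract does not specify; it only computes the frequency of each base at codon positions 1, 2 and 3, assuming the sequence is already in reading frame. The function name `codon_position_frequencies` is hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def codon_position_frequencies(seq):
    """Return 12 values: frequency of A, C, G, T at codon positions 1, 2, 3.

    Assumes `seq` starts in reading frame; a trailing partial codon is ignored,
    and characters other than A/C/G/T (e.g. N) are excluded from the totals.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    features = []
    for pos in range(3):
        counts = Counter(seq[pos:usable:3])  # every third base from this offset
        total = sum(counts[b] for b in BASES)
        features.extend(counts[b] / total if total else 0.0 for b in BASES)
    return features


if __name__ == "__main__":
    # Codons: ATG GCC ATG
    for value in codon_position_frequencies("ATGGCCATG"):
        print(round(value, 3))
```

In coding regions these twelve frequencies tend to differ noticeably between the three positions, whereas in non-coding regions they tend to be closer to each other. A vector like this could serve as input to a classifier such as an SVM, though the features actually used in the thesis are different.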
Keywords/Search Tags: Graphical representation, Coding/non-coding region, Identification of short protein coding region, Gene identification, DNA, Least squares fuzzy support vector machine, Membership function, Support vector machine