Genome sequences are rich in biological knowledge and encode fundamental biological principles. With the development of the Human Genome Project (HGP) and the rapidly increasing pace of genome-sequencing projects, biologists have obtained the genome sequences of hundreds of species. Recognizing protein-coding genes is the first problem in genome analysis once a genome has been sequenced. This paper describes new approaches for recognizing protein-coding sequences, especially short coding sequences, and analyzes the problem from two angles: graphical features and classification algorithms.

Based on the base bias at the three codon positions and the chemical properties of the bases, new graphical representations of gene sequences are introduced for recognizing short coding sequences in human genes. Nine effective features are extracted from the area matrices of the new curves, and a Support Vector Machine (SVM) is used to identify short protein-coding sequences in human genes. During identification, an incremental feature selection algorithm adds four statistical features that capture additional information and improve accuracy. Principal Component Analysis (PCA) is then applied to reduce the dimensionality. The experimental results show that the method uses fewer features (seven or four) and achieves better recognition results than other methods.

The traditional SVM is sensitive to outliers (isolated points) and noisy data, and its computational cost is high. To address these weaknesses, Least Squares Fuzzy Support Vector Machines (LS_FSVM) are applied instead of SVM to classify coding and non-coding sequences. A new method for calculating sample membership in LS_FSVM is proposed that takes the relationships between samples into account. Compared with SVM and Least Squares Support Vector Machines (LS_SVM), this method obtains better recognition accuracy.
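The graphical representations described above start from the base bias at the three codon positions. The exact curves and area-matrix features are not specified in this abstract, so the following Python sketch illustrates only the underlying statistic: the frequency of each base at codon positions 1, 2 and 3, reading the sequence in frame 0. The function names `codon_position_composition` and `gc_by_position` are hypothetical and are not taken from the thesis.

```python
from collections import Counter

BASES = "ACGT"


def codon_position_composition(seq: str) -> list:
    """Return a 3x4 matrix: row p holds the frequencies of A, C, G, T
    at codon position p + 1, reading the sequence in frame 0.
    Characters other than A/C/G/T (e.g. N) are ignored."""
    seq = seq.upper()
    matrix = []
    for pos in range(3):
        column = seq[pos::3]  # every third base, starting at this codon position
        counts = Counter(b for b in column if b in BASES)
        total = sum(counts.values())
        matrix.append([counts[b] / total if total else 0.0 for b in BASES])
    return matrix


def gc_by_position(seq: str) -> list:
    """GC content at codon positions 1, 2 and 3 (a common codon-bias statistic)."""
    return [row[1] + row[2] for row in codon_position_composition(seq)]


if __name__ == "__main__":
    seq = "ATGGCCGCGTAA"
    for pos, row in enumerate(codon_position_composition(seq), start=1):
        print(pos, dict(zip(BASES, row)))
    print(gc_by_position(seq))
```

In a coding region these three rows typically differ noticeably from one another, while in non-coding DNA they tend to be similar; that contrast is what position-aware representations of a sequence try to make visible.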
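A least squares SVM replaces the quadratic program of a standard SVM with a single linear system, which is the source of its lower computational cost. A fuzzy variant lets each sample's membership s_i scale its error penalty, so suspected outliers have less influence. The pure-Python sketch below assumes the common weighted formulation, in which the diagonal term 1/γ becomes 1/(γ s_i), together with an RBF kernel and the Lin-Wang distance-to-class-center membership. That membership is a standard stand-in and not the relation-aware membership proposed in this thesis, which is not reproduced here. All function names and parameter values (`gamma`, `sigma`, `delta`) are illustrative assumptions.

```python
import math


def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))


def center_distance_membership(X, y, delta=1e-3):
    """Assumed Lin-Wang membership: s_i = 1 - d_i / (r + delta), where d_i is the
    distance of x_i to its class mean and r the largest such distance in that class."""
    s = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        center = [sum(X[i][k] for i in idx) / len(idx) for k in range(len(X[0]))]
        dist = {i: math.dist(X[i], center) for i in idx}
        r = max(dist.values())
        for i in idx:
            s[i] = 1.0 - dist[i] / (r + delta)  # always > 0 because delta > 0
    return s


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_ls_fsvm(X, y, gamma=10.0, sigma=1.0, s=None):
    """Solve [[0, y^T], [y, Omega + diag(1/(gamma s_i))]] [b; alpha] = [0; 1],
    with Omega_ij = y_i y_j K(x_i, x_j). Returns (b, alpha)."""
    n = len(X)
    if s is None:
        s = [1.0] * n  # all memberships equal to 1: plain LS-SVM
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / (gamma * s[i])  # low membership -> weaker penalty
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def predict(x, X, y, b, alpha, sigma=1.0):
    """Decision rule: sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    f = b + sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if f >= 0 else -1


if __name__ == "__main__":
    X = [[0, 0], [0.2, 0.1], [0.1, 0.3], [3, 3], [3.2, 2.9], [2.8, 3.1]]
    y = [-1, -1, -1, 1, 1, 1]
    s = center_distance_membership(X, y)
    b, alpha = train_ls_fsvm(X, y, s=s)
    print([predict(x, X, y, b, alpha) for x in ([0.1, 0.1], [3.1, 3.0])])
```

Calling `train_ls_fsvm` without `s` gives an ordinary LS_SVM on the same data, which makes it straightforward to compare the fuzzy and non-fuzzy variants side by side.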