Recognition Of Exon And Intron And Regression Prediction Based On Support Vector Machine

Posted on: 2008-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Zhang
GTID: 2178360218953853
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
Support vector machines (SVM) include support vector classification (SVC) and support vector regression (SVR). Put forward in 1995, SVM is among the most rapidly developing machine learning methods. It is based on the principle of structural risk minimization, which helps it handle practical problems such as nonlinearity, over-fitting, the curse of dimensionality, local minima and small-sample learning, and gives it high generalization ability. This thesis improves SVM and applies it to the recognition of exons and introns, to multidimensional time series prediction in longitudinal data, and to quantitative structure-activity relationships in nonlongitudinal data. The main contents and results are as follows:

1) A novel feature extraction method named multi-scale component and correlation was proposed and applied to the recognition of exons and introns based on Fisher discriminant or SVC. On the test sets, the exon sensitivity (S_n), exon specificity (S_p), intron sensitivity (S_q) and correlation coefficient (CC) were 0.9240, 0.9893, 0.9900 and 0.9160, respectively. The method has the advantages of a simple algorithm, high accuracy and wide applicability.

2) Based on SVR and the controlled autoregressive (CAR) model, we proposed a new nonlinear multidimensional time series method named SVR-CAR, which can capture the dynamic characteristics of the sample set as well as the effect of environmental factors. To evaluate SVR-CAR, we compared its predictions with those of four other commonly used methods, using two sets of longitudinal data and one-step prediction. SVR-CAR had the highest prediction accuracy among the five methods. It has the potential to be widely used for predictions involving multidimensional time series data in ecology, agricultural sciences and economics.

3) To improve prediction precision in quantitative structure-activity relationship (QSAR) modelling, a novel nonlinear combinatorial forecast method based on SVR and the k-nearest neighbor group was proposed. The prediction results for nonlongitudinal QSAR data on the toxicity of substituted anilines and phenols to Daphnia magna Straus showed that the novel combination method had the highest prediction precision among the ten methods compared and captured the nonlinear relationships between toxicity and the descriptors in fine detail. Its advantages include structural risk minimization, nonlinear screening of descriptors and submodels, nonlinear combination prediction, and automatic optimization of the kernel function and its parameters. The novel combination model can therefore be widely used in QSAR.
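
The four evaluation measures reported in item 1 can be illustrated with a short sketch. The abstract does not state the formulas, so the definitions below are an assumption based on common gene-finding conventions: exons are the positive class, S_n = TP/(TP+FN), S_p = TP/(TP+FP), S_q = TN/(TN+FP), and CC is the Matthews correlation coefficient. The function name `exon_intron_metrics` and the confusion counts are hypothetical; the counts assume a balanced test set of 1000 exons and 1000 introns and were chosen only so that the ratios match the reported values.

```python
import math


def exon_intron_metrics(tp, fn, tn, fp):
    """Return (S_n, S_p, S_q, CC) from confusion counts.

    Assumed definitions (not given in the abstract), exons = positive class:
      S_n: fraction of true exons predicted as exons
      S_p: fraction of predicted exons that are true exons
      S_q: fraction of true introns predicted as introns
      CC : Matthews correlation coefficient
    """
    s_n = tp / (tp + fn)
    s_p = tp / (tp + fp)
    s_q = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return s_n, s_p, s_q, cc


# Hypothetical counts for illustration only (1000 exons, 1000 introns).
s_n, s_p, s_q, cc = exon_intron_metrics(tp=924, fn=76, tn=990, fp=10)
print(f"S_n={s_n:.4f} S_p={s_p:.4f} S_q={s_q:.4f} CC={cc:.4f}")
# prints: S_n=0.9240 S_p=0.9893 S_q=0.9900 CC=0.9160
```

Under these assumed definitions and a balanced test set, the four reported values are mutually consistent. The actual test-set sizes are not given in the abstract.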
Keywords/Search Tags: support vector machine, recognition, multidimensional time series, quantitative structure-activity relationship, prediction
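
As a rough illustration of the one-step prediction setup described for SVR-CAR in item 2 of the abstract, the sketch below builds training samples in the usual controlled autoregressive form: each input row holds the previous p values of the target series and the previous q observations of the environmental factors, and the target is the current value. The abstract does not give the lag structure, so the function name `build_car_samples`, the lag orders `p` and `q`, and the toy numbers are all assumptions, not the thesis's actual procedure.

```python
def build_car_samples(y, u, p=2, q=1):
    """Build one-step-ahead samples from a target series and exogenous inputs.

    y: list of target values, one per time step.
    u: list of lists; u[t] holds the environmental factors at time t.
    p, q: assumed lag orders for the target and the exogenous inputs.
    Returns (inputs, targets), where inputs[k] is
    [y[t-1], ..., y[t-p], u[t-1]..., ..., u[t-q]...] and targets[k] is y[t].
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same number of time steps")
    start = max(p, q)  # first time step with a full lag history
    inputs, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - i] for i in range(1, p + 1)]  # autoregressive part
        for j in range(1, q + 1):
            row.extend(u[t - j])                   # controlled (exogenous) part
        inputs.append(row)
        targets.append(y[t])
    return inputs, targets


# Toy data, invented for illustration: 5 time steps, 2 environmental factors.
y = [10.0, 12.0, 15.0, 14.0, 18.0]
u = [[20.1, 0.3], [21.4, 0.1], [19.8, 0.0], [22.0, 0.5], [23.1, 0.2]]
X, targets = build_car_samples(y, u, p=2, q=1)
for row, target in zip(X, targets):
    print(row, "->", target)
```

Rows built this way could then be passed to any regressor with a nonlinear kernel, for example an SVR implementation such as scikit-learn's `sklearn.svm.SVR`, to obtain a nonlinear one-step predictor. How the thesis selects lags, kernels and parameters is not described in the abstract.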