Research Of Alternative Splicing Sites Prediction Based On Machine Learning Method

Posted on: 2016-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Guo
Full Text: PDF
GTID: 2180330470956139
Subject: Software engineering
Abstract/Summary:
Research on genome expression and function is one of the main goals of bioinformatics. Eukaryotic RNA splicing is a complicated process that can affect gene expression, so the splicing mechanism is of great significance in RNA research: different splicing processes lead to diverse products. Predicting splice sites therefore demands a more accurate method.

Machine learning is an important research topic in the field of intelligent computing, and it differs from data mining technology: in addition to learning knowledge, machine learning also needs to use existing knowledge to improve its performance. Machine learning methods have been applied to the study of RNA splice sites, and they are more intelligent and more accurate in prediction than traditional methods.

This thesis is mainly about combining a second-order Markov model with a support vector machine (SVM) for the problem of alternative splice site prediction. The main idea is to convert splice site prediction into the classification of true and false sites based on the sequence features near each site. The contributions of this research are as follows:

1. The data were selected from the ASD database and the AS-ALPS alternative splicing database, and sequence data sets were picked for each type of alternative splicing. The sample data sets were then formed from sequences of a certain length taken upstream and downstream of each splice site, and were preprocessed.
2. Features were extracted with a method based on a second-order Markov model, and the variables were normalized. Primary features, such as the duplex-base rule of splice sites, were then selected by analysis in order to construct the feature vectors for the classification problem.
3. An SVM-based method was used to classify the sample data; it includes improved calculations of sample-set density and membership. To minimize the negative effect of noise samples on prediction accuracy, a penalty factor was introduced for each class.

The experimental results show that, for each variant of alternative splice site prediction, the method used in this thesis achieves better prediction accuracy than traditional algorithms and simple machine learning methods.
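The feature-extraction idea in the abstract (a second-order Markov model over a fixed-length window around each candidate site) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `train_markov2` and `log_odds_features`, the pseudocount, the position-independent model, and the toy sequences are all assumptions made for illustration. Each feature is the log-odds of a base given its two preceding bases under a model trained on true sites versus one trained on false sites; such a vector could then be normalized and passed to a classifier such as an SVM.

```python
import math
from itertools import product

BASES = "ACGT"

def train_markov2(windows, pseudocount=1.0):
    """Estimate P(base | previous two bases) from a list of sequence windows.

    Assumption: one position-independent table shared by all positions.
    """
    counts = {"".join(ctx): {b: pseudocount for b in BASES}
              for ctx in product(BASES, repeat=2)}
    for seq in windows:
        for i in range(2, len(seq)):
            ctx, base = seq[i - 2:i], seq[i]
            if ctx in counts and base in counts[ctx]:
                counts[ctx][base] += 1
    model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {b: c / total for b, c in row.items()}
    return model

def log_odds_features(window, true_model, false_model):
    """Return one log-odds feature per position i >= 2 of the window."""
    feats = []
    for i in range(2, len(window)):
        ctx, base = window[i - 2:i], window[i]
        if ctx in true_model and base in true_model[ctx]:
            feats.append(math.log(true_model[ctx][base] / false_model[ctx][base]))
        else:
            feats.append(0.0)  # ambiguous base such as N: treated as uninformative
    return feats

# Toy windows, invented for illustration only (not taken from ASD or AS-ALPS).
true_windows = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
false_windows = ["CTTGTCCCA", "ATAGTTTCC", "GCCGTACGA"]

true_model = train_markov2(true_windows)
false_model = train_markov2(false_windows)

features_true = log_odds_features("CAGGTAAGT", true_model, false_model)
features_false = log_odds_features("CTTGTCCCA", true_model, false_model)
```

On these toy windows the summed log-odds is positive for the window drawn from the true set and negative for the one drawn from the false set, which is the separation a downstream classifier would exploit. Real data would call for held-out evaluation rather than scoring training windows.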
Keywords/Search Tags: Alternative Splicing of RNA, Prediction of Splice Site, Feature Analysis, Markov Models, Support Vector Machine