
MicroRNA Target Gene Prediction Using Support Vector Machine With Ensemble Learning

Posted on: 2016-01-04   Degree: Doctor   Type: Dissertation
Country: China   Candidate: Z R Chen
GTID: 1220330479450987   Subject: Instrument Science and Technology
Abstract/Summary:
MicroRNAs (miRNAs) are a family of single-stranded non-coding RNAs about 22 to 25 nucleotides long, carrying a phosphate group at the 5′ end and a hydroxyl group at the 3′ end. They perform important post-transcriptional regulatory functions through complementary base pairing with the 3′ untranslated region (3′UTR) of messenger RNA (mRNA). Experimental studies show that miRNAs are widely present in plants and animals and are involved in cell growth, development, differentiation, metabolism, and other important life processes. Recognizing miRNA targets is therefore a key step both in analyzing the molecular biological functions of miRNAs and in studying their mechanisms of action.
Because miRNA target datasets are class-imbalanced, prediction accuracy on positive samples is low and overall classification results are poor. This dissertation therefore proposes a target prediction algorithm based on Support Vector Machines (SVM) in which under-sampling is embedded into ensemble learning, which effectively improves the classification accuracy and generalization ability of the miRNA target prediction model. Three issues are studied: dataset-based feature selection, an ensemble learning model combined with under-sampling, and a miRNA target prediction model based on kernel parameter optimization.
First, the structural and regional binding characteristics of miRNA:target pairs were studied. Nine miRNA target identification rules, together with quantitative criteria for the corresponding features, were proposed. Based on these rules, 90 features were extracted from the dataset using Perl.
Second, the performance of the miRNA target prediction model built on the 90-dimensional feature set was analyzed, and SVM-FSCI, a feature selection algorithm based on the classification gap, was proposed. The algorithm defines each feature's effective rate based on SVM classification.
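The abstract does not give the exact formula for a feature's effective rate. The Python sketch below assumes one plausible reading of a "classification gap": the drop in validation accuracy of a linear SVM when a single feature is removed. Features are ranked by this gap, and the best-scoring top-k prefix of the ranking is kept. The SVM is a small Pegasos-style linear solver written in pure Python so the example has no dependencies; `train_linear_svm`, `effective_rates`, and `select_feature_subset` are hypothetical names, not the dissertation's code, and the dissertation may well use a kernel SVM and a different subset search.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM (hinge loss + L2 penalty) on +1/-1 labels.
    A constant 1 is appended to each sample so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (L2 penalty)
            if margin < 1:  # hinge-loss violation: step toward the sample
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w

def accuracy(w, X, y):
    hits = 0
    for x, t in zip(X, y):
        score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
        hits += (1 if score >= 0 else -1) == t
    return hits / len(y)

def project(X, cols):
    """Keep only the listed feature columns."""
    return [[x[j] for j in cols] for x in X]

def effective_rates(X_tr, y_tr, X_va, y_va):
    """Hypothetical 'effective rate': validation accuracy lost when feature j is dropped."""
    d = len(X_tr[0])
    base = accuracy(train_linear_svm(X_tr, y_tr), X_va, y_va)
    rates = []
    for j in range(d):
        cols = [k for k in range(d) if k != j]
        w = train_linear_svm(project(X_tr, cols), y_tr)
        rates.append(base - accuracy(w, project(X_va, cols), y_va))
    return rates

def select_feature_subset(X_tr, y_tr, X_va, y_va):
    """Rank features by effective rate, then keep the best-scoring top-k prefix."""
    rates = effective_rates(X_tr, y_tr, X_va, y_va)
    order = sorted(range(len(rates)), key=lambda j: rates[j], reverse=True)
    best_cols, best_acc = order[:1], -1.0
    for k in range(1, len(order) + 1):
        cols = order[:k]
        w = train_linear_svm(project(X_tr, cols), y_tr)
        acc = accuracy(w, project(X_va, cols), y_va)
        if acc > best_acc:  # strict '>' prefers the smaller subset on ties
            best_cols, best_acc = cols, acc
    return best_cols, best_acc
```

For example, with one informative feature and two pure-noise features, the informative feature should rank first and a small subset containing it should already reach high validation accuracy. Using a held-out validation split (rather than training accuracy) keeps the ranking from rewarding features that only help the model memorize.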
SVM-FSCI ranks the original 90-dimensional feature set by each feature's effective rate and removes redundant and inefficient features to find the best feature subset. Experiments show that miRNA target prediction models built on the optimal feature subset achieve good results.
Finally, this dissertation proposes SVM-IUSW, a target prediction algorithm in which under-sampling is embedded into ensemble learning. SVM serves as the base learner and AdaBoost as the ensemble framework, and clustering-based under-sampling is embedded in each iteration to reduce the imbalance between positive and negative samples. To avoid overfitting, the algorithm also incorporates a robust sample-weight smoothing mechanism that eliminates abnormal samples among the negative samples. The predictions of the sub-classifiers are then combined by weighted voting to form the integrated miRNA target classifier. Experiments show that SVM-IUSW achieves better classification and generalization performance than currently popular machine learning algorithms.
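The SVM-IUSW training loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation: the base learner is a linear Pegasos-style SVM rather than a kernel SVM with optimized parameters; clustering-based under-sampling is assumed to mean k-means over the negatives with k equal to the number of positives, keeping the negative with the highest current boosting weight in each cluster; and weight smoothing is assumed to mean clipping each sample weight at `cap` times the mean weight before renormalizing. All function names and parameters (`svm_iusw_sketch`, `cluster_undersample`, `rounds`, `cap`) are hypothetical.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM on +1/-1 labels; a constant 1 feature acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_undersample(neg_idx, X, weights, k, seed):
    """Assumed scheme: cluster the negatives, keep the highest-weight sample per cluster."""
    labels = kmeans([X[i] for i in neg_idx], k, seed=seed)
    keep = {}
    for i, lab in zip(neg_idx, labels):
        if lab not in keep or weights[i] > weights[keep[lab]]:
            keep[lab] = i
    return list(keep.values())

def svm_iusw_sketch(X, y, rounds=5, cap=3.0):
    """AdaBoost over SVMs, re-balancing the training set by under-sampling each round."""
    n = len(y)
    weights = [1.0 / n] * n
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for m in range(rounds):
        k = min(len(pos), len(neg))
        subset = pos + cluster_undersample(neg, X, weights, k, seed=m)
        w = train_linear_svm([X[i] for i in subset], [y[i] for i in subset], seed=m)
        preds = [predict(w, x) for x in X]
        # Weighted error on the full (imbalanced) training set
        err = sum(wt for wt, p, t in zip(weights, preds, y) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:  # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, w))
        # Standard AdaBoost update: misclassified samples gain weight
        weights = [wt * math.exp(-alpha * t * p) for wt, p, t in zip(weights, preds, y)]
        # Assumed smoothing: clip weights above cap * mean so outliers cannot dominate
        mean = sum(weights) / n
        weights = [min(wt, cap * mean) for wt in weights]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote: each sub-classifier's +1/-1 output is scaled by its alpha."""
    score = sum(alpha * predict(w, x) for alpha, w in ensemble)
    return 1 if score >= 0 else -1
```

Clipping rather than deleting high-weight samples is one conservative way to keep noisy negatives from dominating later rounds while still letting boosting focus on hard examples; the dissertation's actual smoothing rule and under-sampling criterion may differ.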
Keywords/Search Tags: miRNA target genes, SVM, unbalanced data, integrated learning, feature selection