MicroRNA Prediction Using SVM Based on an Imbalanced Dataset

Posted on: 2014-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2250330392964593
Subject: Computer application technology
Abstract/Summary:
microRNAs (miRNAs) are single-stranded, endogenous small non-coding RNAs (sncRNAs) of approximately 22 nt that play important regulatory roles in animals and plants. They are also significant for the prevention of human disease and for the study of biological evolution. cDNA cloning is limited by the time- and tissue-specific expression of miRNAs. Computational approaches overcome these limitations and have been widely adopted. Compared with other methods, the support vector machine (SVM) can achieve higher performance in miRNA prediction. The number of pseudo pre-miRNAs far exceeds that of real miRNA precursors (pre-miRNAs), which causes class imbalance. Because both sample imbalance and the selection of an optimal feature subset influence miRNA classification, this thesis proposes two methods: miRNA prediction based on clustering of imbalanced samples, and miRNA prediction based on SVM-RFE-ReliefF. Together they address the imbalance problem, feature selection, and the application of SVM in order to improve miRNA prediction performance.

First, after analyzing the class imbalance between pre-miRNAs and pseudo pre-miRNAs and reviewing existing approaches to it, this thesis addresses the problem with ensemble learning on clustered samples. The algorithm clusters the pseudo pre-miRNAs and then draws 1/9 of the samples from every cluster, so that nine different negative subsets are formed. The pre-miRNAs are combined with each of the nine negative subsets to construct nine training sets for ensemble learning. After training and testing, unknown sequences are predicted by majority vote.

Second, considering the complementarity between features, this thesis uses the SVM-RFE-ReliefF algorithm to select the optimal feature subset. This approach integrates the evaluation criteria of SVM-RFE and ReliefF. In each iteration, redundant features are deleted according to weights computed by SVM-RFE and ReliefF, and the accuracy of the current feature subset is estimated by k-fold cross-validation. The feature subset with the highest accuracy is taken as the optimal feature subset.

Finally, the two miRNA prediction methods proposed in this thesis are implemented in MATLAB 2010a. Both use ensemble learning on clustered samples to handle the imbalance problem; they select the optimal feature subset with F-score and with SVM-RFE-ReliefF, respectively.
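The sampling and voting steps of the clustered ensemble can be illustrated with a minimal Python sketch. It is not the thesis's MATLAB implementation, and it makes several assumptions: the cluster labels of the pseudo pre-miRNAs are already available from some clustering step, the 1/9 draw from each cluster is done by shuffling and dealing indices round-robin, and each trained classifier is represented by a plain callable. The names `split_negatives`, `majority_vote`, and `ensemble_predict` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_negatives(cluster_labels, n_subsets=9, seed=0):
    """Partition negative-sample indices into n_subsets groups.

    Each group receives about 1/n_subsets of every cluster, so every
    group reflects the cluster structure of the negative class.
    (Assumed sampling scheme; the source only says 1/9 per cluster.)
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    subsets = [[] for _ in range(n_subsets)]
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        # Deal the shuffled cluster members round-robin across the subsets.
        for pos, idx in enumerate(members):
            subsets[pos % n_subsets].append(idx)
    return subsets


def majority_vote(votes):
    """Return the label chosen by most ensemble members."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(models, x):
    """Predict x with every model (any callable) and take the majority vote."""
    return majority_vote([model(x) for model in models])


# Toy usage: 45 negatives in two clusters (18 and 27 members).
toy_labels = [0] * 18 + [1] * 27
negative_subsets = split_negatives(toy_labels)
# Each negative subset would be joined with all positive samples
# to train one of the nine classifiers (training is not shown here).
```

With nine voters and two classes a tie cannot occur, which is one practical reason an odd number of subsets is convenient for majority voting.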
Keywords/Search Tags: microRNA, machine learning, support vector machine, ensemble machine learning, imbalanced classification, feature selection