
MicroRNA Prediction Based On Machine Learning

Posted on: 2018-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Shi
Full Text: PDF
GTID: 2310330542990804
Subject: Computer Science and Technology
Abstract/Summary:
MicroRNAs (miRNAs) are short, conserved nucleotide sequences that play an important role in the regulation of gene expression. Quantifying the sequence characteristics of microRNAs and using machine learning techniques to predict them is an active research topic. However, many existing methods ignore the class imbalance that arises in microRNA prediction, which causes the model to overfit the majority class and reduces classification performance. In addition, these methods are usually effective for only a narrow range of species. This thesis aims to establish a general cross-species microRNA prediction model that broadens the range of applicable species and reduces computation time. The model follows the idea of ensemble learning: it uses five SVM classifiers and is built in three steps, namely sampling, feature selection, and classifier parameter optimization. First, to address the imbalance of the microRNA data set, a stratified sampling algorithm based on sequence entropy is proposed; it samples in a way that preserves the overall distribution of the data and produces a training set with balanced numbers of positive and negative samples. Second, to address the high dimensionality of the training set, its large sample size, and the resulting slow classifier training, a feature selection algorithm based on signal-to-noise ratio (SNR) and correlation is put forward; it reduces the size of the training set in order to speed up training. Finally, the thesis presents a DS-GA algorithm that shortens the time needed to optimize the SVM classifier parameters and reduces overfitting. Experiments and comparative experiments on microRNA sequence data sets and public data sets verify the effectiveness of the proposed algorithms. On the test set, the model built from microRNA sequence data sets achieves higher accuracy than the other prediction methods it is compared with. The experimental results indicate that the general microRNA prediction model built in this thesis is effective and provides a reference for the further validation of candidate microRNAs.
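The sampling and feature-scoring steps summarized above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual algorithm: the abstract does not define its entropy measure or its SNR score, so the sketch assumes Shannon entropy of nucleotide composition as the stratification variable and the common two-class signal-to-noise ratio |mean+ - mean-| / (std+ + std-) as the feature score. All function names and parameters (`shannon_entropy`, `entropy_stratified_sample`, `snr_score`, `n_bins`) are hypothetical.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq (assumed measure)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_stratified_sample(seqs, k, n_bins=4, seed=0):
    """Draw roughly k sequences, giving each entropy bin a quota
    proportional to its share of the full set (hypothetical helper)."""
    rng = random.Random(seed)
    ents = [shannon_entropy(s) for s in seqs]
    lo, hi = min(ents), max(ents)
    # Fall back to a width of 1.0 when every sequence has the same entropy.
    width = (hi - lo) / n_bins or 1.0
    bins = defaultdict(list)
    for s, e in zip(seqs, ents):
        idx = min(int((e - lo) / width), n_bins - 1)
        bins[idx].append(s)
    sample = []
    for idx in sorted(bins):
        members = bins[idx]
        # Every non-empty bin keeps at least one representative.
        quota = max(1, round(k * len(members) / len(seqs)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def snr_score(pos_values, neg_values):
    """Two-class signal-to-noise ratio of a single feature (assumed definition)."""
    gap = abs(statistics.mean(pos_values) - statistics.mean(neg_values))
    spread = statistics.pstdev(pos_values) + statistics.pstdev(neg_values)
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


if __name__ == "__main__":
    # Toy majority class: undersample it to the size of a smaller positive class.
    negatives = ["AAAAAAAA"] * 6 + ["ACGUACGU"] * 6
    print(entropy_stratified_sample(negatives, k=4))
    print(snr_score([1, 3], [5, 7]))
```

In this sketch the majority class would be undersampled with `entropy_stratified_sample` until it matches the minority class in size, and features would then be ranked by `snr_score`. The correlation-based redundancy filtering and the DS-GA parameter search mentioned in the abstract are not shown, since the abstract gives no detail on them.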
Keywords/Search Tags: sampling, feature selection, class imbalance, microRNA