
miRNA and miRNA Target Prediction Based on SVM

Posted on: 2010-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: D Luo
GTID: 2178360272497186
Subject: Computer application technology
Abstract/Summary:
Bioinformatics, an interdisciplinary field that draws on traditional disciplines such as biology, computer science and applied mathematics, emerged with the launch of the Human Genome Project at the end of the 1980s. Bioinformatics researchers try to extract critical biological information by analyzing nucleotide and amino acid sequences, in order to provide evidence about the origin of life and to better understand how life evolves and develops. This requires a range of techniques, including data storage, data indexing, pattern classification and data mining.

Researchers have recently recognized that not only genetic information itself but also the regulation of its expression is very important for organisms. RNA, one of the most important genetic materials, had long been regarded merely as an auxiliary intermediate in the flow of information from DNA to protein. Recently, small RNA molecules have attracted substantial attention, as studies have found that they help control cell function by regulating gene expression. MicroRNAs (miRNAs) are one such class of small RNAs: they are widely found in animals and plants, and they cleave their target mRNAs or suppress their translation by binding to them. Further studies indicate that about one third of human genes are regulated by miRNAs, which makes miRNAs one of the core components of gene regulatory networks. miRNAs also play essential roles in many biological processes, including development, cell proliferation, cell death, fat metabolism and cell differentiation, and increasing evidence shows that they are closely linked to cancer formation. Research on miRNAs will therefore help us understand how gene expression regulatory networks are built, and hence how genes function; studies of gene function, in turn, may strongly influence human disease control and our understanding of biological evolution.

In this thesis, we focus on two active research problems related to miRNAs: miRNA prediction and miRNA target prediction. Our goal is to tackle both tasks from a bioinformatics perspective using machine learning, specifically the well-known support vector machine (SVM) approach. We design effective feature extraction methods to improve prediction accuracy for both tasks. We believe that this research will help discover new miRNAs and their targets, and will provide an accurate and reliable data source for studying miRNA function and mechanisms.

A large number of miRNAs have already been identified by various methods, but in theory many more remain to be discovered, so further exploring new miRNAs is important for revealing how miRNAs work. Mature miRNAs are about 20 to 24 nucleotides long and are processed from precursor miRNAs (pre-miRNAs), which fold into a characteristic stem-loop hairpin structure.
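Because true pre-miRNAs are recognized by their stem-loop hairpin, a predicted secondary structure is the usual starting point for feature extraction. A common representation is dot-bracket notation, in which matching parentheses mark paired nucleotides and dots mark unpaired ones. The Python sketch below, whose function names are hypothetical, parses such a string, checks that it forms a single unbranched stem-loop, and counts base pairs and canonical (Watson-Crick or G-U wobble) pairs. It assumes the structure was produced by an external folding tool, and the example sequence is a made-up toy, not a real pre-miRNA.

```python
# Canonical RNA pairs: Watson-Crick plus G-U wobble.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return 0-based base pairs (i, j) with i < j, sorted by i."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_summary(sequence: str, structure: str) -> dict:
    """Summarize a single stem-loop hairpin from its sequence and dot-bracket structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = parse_dot_bracket(structure)
    # In one unbranched stem, pairs sorted by i have strictly decreasing j.
    # Bulges and internal loops are allowed; multi-branch structures are not.
    if not pairs or any(j1 < j2 for (_, j1), (_, j2) in zip(pairs, pairs[1:])):
        raise ValueError("structure is not a single stem-loop hairpin")
    seq = sequence.upper().replace("T", "U")
    inner_i, inner_j = pairs[-1]  # the innermost pair closes the terminal loop
    return {
        "base_pairs": len(pairs),
        "paired_nucleotides": 2 * len(pairs),
        "canonical_pairs": sum((seq[i], seq[j]) in CANONICAL_PAIRS for i, j in pairs),
        "terminal_loop": seq[inner_i + 1 : inner_j],
    }


print(hairpin_summary("GCAUGAAAAUGC", "((((....))))"))
# → {'base_pairs': 4, 'paired_nucleotides': 8, 'canonical_pairs': 4, 'terminal_loop': 'GAAA'}
```

Real pre-miRNA structures are longer and contain bulges, which this check accepts as long as the stem does not branch.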
During biogenesis, the hairpin structure of the pre-miRNA is essential for miRNA formation, so miRNAs can be predicted by distinguishing true pre-miRNAs from pseudo hairpins. However, current studies show that many genomes contain large numbers of similar hairpin-forming sequences, which makes identifying pre-miRNAs more difficult. Machine learning methods have been widely used for pre-miRNA prediction, and feature extraction is a key step in this task. Inadequate feature extraction, however, has led to low precision and limited recall in existing experiments.

Building on current feature extraction methods, we developed an improved SVM-based method for pre-miRNA prediction, named PMirP. PMirP combines many hybrid features, such as structure-sequence characteristics extracted from the pre-miRNA stem, the free energy, and the number of matched nucleotides in the stem. We are also the first to propose an important new feature for miRNA prediction: the two free nucleotides in the miRNA:miRNA* duplex. The optimal RBF kernel parameters were selected on the training set, and the SVM classifiers were then built from the training set using these parameters and evaluated on several separate test sets. The experimental results show that PMirP not only identifies true human pre-miRNAs effectively but also predicts pre-miRNAs of other species with high accuracy. Compared with existing methods, PMirP significantly improves both sensitivity and specificity, which shows that it is effective for miRNA prediction. Finally, we have made PMirP available as a web service for scientific research.

miRNAs regulate genes by binding to complementary sites in the 3' untranslated region (3' UTR) of their target genes, so identifying miRNA targets is the basis for studying miRNA function. Because miRNA molecules are very short, many genes across the genome may be complementary to them; in addition, only a small number of miRNAs have been confirmed, which makes finding target mRNAs hard. Bioinformatics currently plays a dominant role in miRNA target prediction, and several methods have emerged, but they rely mainly on sequence complementarity in the seed region and remain limited in revealing actual target genes. Introducing machine learning into miRNA target prediction has proven successful and has improved prediction accuracy, and combining machine learning with newly identified biological characteristics is an important way to improve accuracy further.

In this study, we propose an SVM-based method for miRNA target prediction built on these key features. Because the out-seed segment of the miRNA:mRNA duplex can compensate for imperfect base pairing within the seed segment, our method takes this effect into account by partitioning the duplex into two parts, the seed and the out-seed, for feature extraction. The features used for target prediction include the nucleotide positions of the seed region and the minimum free energy (MFE), among others. The workflow is similar to that of PMirP: optimal parameters are selected to build the SVM classifiers, which are then evaluated on a test dataset. The experimental results show that the method performs well on target prediction, with high sensitivity and specificity, which supports its feasibility and effectiveness.

Research on human miRNAs is still in its infancy. Most existing work makes predictions by combining known miRNA characteristics, which requires a better understanding of the biological characteristics of miRNAs and their targets, such as structure, sequence and pathways. In this thesis, we develop two SVM-based prediction methods, each validated with good results. This study therefore not only provides effective tools for research on miRNAs and miRNA targets but also lays a solid foundation for future research in this field.
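PMirP's structure-sequence features are extracted from the pre-miRNA stem, but the abstract does not specify how they are encoded. One widely used scheme, introduced by Triplet-SVM, records each nucleotide's identity together with the paired/unpaired states of itself and its two neighbours, giving 4 × 8 = 32 normalized triplet frequencies. The Python sketch below implements that triplet encoding under the assumption that PMirP's features are of a similar kind; the function and feature names are hypothetical, and the dot-bracket structure is assumed to come from an external folding tool.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Each position is either paired ("(") or unpaired ("."); ")" is folded into "(".
STATE_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]  # 8 patterns
FEATURE_NAMES = [n + s for n in NUCLEOTIDES for s in STATE_TRIPLETS]  # 32 features


def triplet_features(sequence: str, structure: str) -> dict[str, float]:
    """Normalized frequencies of (middle nucleotide, 3-position pairing state) triplets."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    states = structure.replace(")", "(")
    if set(states) - {"(", "."}:
        raise ValueError("structure must contain only '(', ')' and '.'")
    seq = sequence.upper().replace("T", "U")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    total = 0
    # Slide a width-3 window; the middle nucleotide labels the feature.
    for mid in range(1, len(seq) - 1):
        if seq[mid] not in NUCLEOTIDES:
            continue  # skip ambiguous bases such as N
        counts[seq[mid] + states[mid - 1 : mid + 2]] += 1
        total += 1
    return {name: (count / total if total else 0.0) for name, count in counts.items()}


features = triplet_features("GCAUGAAAAUGC", "((((....))))")
print(features["A..."], features["C((("])  # → 0.2 0.1
```

This sketch counts triplets over the whole hairpin; restricting the window to stem positions, or appending other features such as free energy and stem match counts to the 32-dimensional vector, are straightforward variations.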
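The target-prediction method partitions the miRNA:mRNA duplex into seed and out-seed segments for feature extraction. The Python sketch below shows one way to do this, assuming (hypothetically) that the duplex is already aligned without gaps, with the miRNA written 5' to 3' and the target site written 3' to 5' so that equal indices face each other, and taking the seed as miRNA positions 2-8, a common convention that the abstract does not confirm. The per-position seed indicators and per-region pair counts are illustrative features only; MFE, which the abstract also lists, would come from an RNA folding tool and is omitted here.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
SEED = range(1, 8)  # 0-based indices of miRNA positions 2-8 (assumed seed definition)


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def pair_type(mirna_base: str, site_base: str) -> str:
    """Classify one aligned position as Watson-Crick, G-U wobble, or mismatch."""
    if (mirna_base, site_base) in WATSON_CRICK:
        return "wc"
    if (mirna_base, site_base) in WOBBLE:
        return "gu"
    return "mismatch"


def duplex_features(mirna_5to3: str, site_3to5: str) -> dict[str, float]:
    """Seed/out-seed features for a gap-free miRNA:target duplex.

    Index k of mirna_5to3 is assumed to face index k of site_3to5.
    """
    if len(mirna_5to3) != len(site_3to5):
        raise ValueError("duplex strands must be aligned to equal length")
    if len(mirna_5to3) < SEED.stop:
        raise ValueError("duplex is shorter than the seed region")
    types = [pair_type(a, b) for a, b in zip(to_rna(mirna_5to3), to_rna(site_3to5))]
    features: dict[str, float] = {}
    # Position-specific Watson-Crick indicators inside the seed.
    for k in SEED:
        features[f"seed_pos{k + 1}_wc"] = 1.0 if types[k] == "wc" else 0.0
    regions = {
        "seed": [types[k] for k in SEED],
        "out_seed": [t for k, t in enumerate(types) if k not in SEED],
    }
    # Pair-type counts per region, so out-seed pairing can offset seed mismatches.
    for region, region_types in regions.items():
        for kind in ("wc", "gu", "mismatch"):
            features[f"{region}_{kind}"] = float(region_types.count(kind))
    return features


# Toy duplex: one G-U wobble at miRNA position 3, one mismatch at position 10.
feats = duplex_features("UAGCUUAUCA", "AUUGAAUAGC")
print(feats["seed_wc"], feats["seed_gu"], feats["out_seed_mismatch"])  # → 6.0 1.0 1.0
```

Real duplexes usually contain bulges, so in practice the two strands would first be aligned by a hybridization tool before a fixed-position encoding like this one is applied.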
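Both methods select RBF kernel parameters on the training set and report sensitivity and specificity on separate test sets. The Python sketch below shows those pieces in isolation: the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), a coarse log2 grid over (C, gamma) following the ranges suggested in the LIBSVM practical guide, and the two evaluation metrics. The thesis's actual grid and cross-validation scheme are not given; `cv_score` is a hypothetical callable standing in for training and cross-validating an SVM (for example with LIBSVM or scikit-learn's `SVC`) on the training set only.

```python
import math
from typing import Callable, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF (Gaussian) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def log2_grid(low: int, high: int, step: int = 2) -> list[float]:
    """Powers of two 2**low, 2**(low + step), ..., up to 2**high inclusive."""
    return [2.0 ** e for e in range(low, high + 1, step)]


C_GRID = log2_grid(-5, 15)      # 2^-5, 2^-3, ..., 2^15
GAMMA_GRID = log2_grid(-15, 3)  # 2^-15, 2^-13, ..., 2^3


def select_parameters(cv_score: Callable[[float, float], float]) -> tuple[float, float]:
    """Exhaustive grid search: return the (C, gamma) pair with the best CV score.

    cv_score is a hypothetical callable that trains and scores an RBF-SVM
    by cross-validation on the training set only (never on the test sets).
    """
    candidates = [(c, g) for c in C_GRID for g in GAMMA_GRID]
    return max(candidates, key=lambda params: cv_score(*params))


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Labels: 1 = positive (e.g. true pre-miRNA or real target), 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)
```

Keeping the test sets out of parameter selection, as the abstract describes, is what lets the reported sensitivity and specificity estimate performance on unseen sequences rather than on data the grid search has already seen.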
Keywords/Search Tags: miRNA, miRNA target, prediction, SVM