
The Application Of Several Machine Learning Methods In RNA Structure Prediction

Posted on: 2013-12-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Fu
Full Text: PDF
GTID: 1228330467982739
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the successful completion of the Human Genome Project, there is a pressing demand for extracting usable information from genomic sequences. Bioinformatics, an emerging discipline, draws on tools from information technology, mathematics, biology, and computer science to analyse the biological meaning of data efficiently. Machine learning methods, in turn, provide reliable and powerful algorithmic support for bioinformatics and help it contribute to exploration and research in the life sciences. Using several machine learning methods, this dissertation investigates RNA secondary structure prediction, miRNA target matching patterns, and miRNA target prediction.

Because comparative methods are more computationally complex than the minimum free energy method applied to a single sequence, this dissertation puts forward a hypothesis: most of the base pairs in a conserved RNA secondary structure come from the base pairs in the minimum and suboptimal free energy structures. Several feature attributes are assigned to the stem substructures of each sequence's minimum and suboptimal free energy structures, and the conserved base pairs are then obtained with a Support Vector Machine. Experimental validation on benchmark data and on 10 sets of RNA families from the Rfam database shows that the performance of the algorithm is not inferior to that of commonly used mainstream algorithms.

To better present the relative positional relations between different stem substructures, this dissertation puts forward a novel representation of RNA secondary structure, the contracted dot-plot representation, with which RNA secondary structure prediction algorithms that work at the granularity of stems can be described more clearly and in greater depth.

Intermolecular targeting between miRNAs and mRNAs is mainly based on base-pair matching information in the RNA primary sequences. For base-pair matching feature extraction, a rough set method and a Markov model are presented to derive qualitative and quantitative patterns, respectively. To analyse the matching pattern of the 22 bases involved in miRNA binding, two measures are first adopted, and a rough set tool is then applied. The matching pattern of miRNA target sequences is further analysed quantitatively. A Markov chain model is built and trained on training data, with the model parameters estimated by the maximum likelihood method. The model with the estimated parameters is taken as the mature model, and a suitable threshold is set on the basis of the mature model's scores. Matching patterns whose scores are higher than the threshold are identified as valid patterns, and the scores are further used to assess whether a candidate target is a true target.

With the widespread use of gene microarray profile data in recent years, prediction based on gene expression profile data provides an effective high-throughput experimental mode. A Bayes network is presented for modeling the mutual relationships among the expression data of miRNAs, mRNAs, and proteins. Using the transformed model, not only are miRNA targets predicted, but the regulation mechanism of miRNA is also discerned.
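The Markov chain scoring step described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the three-symbol encoding of a matching pattern (`M` for a Watson-Crick match, `G` for a G:U wobble, `X` for a mismatch), the first-order chain, the add-one pseudocount, the toy training patterns, and the rule used to pick the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical encoding of one miRNA:target matching pattern, one symbol per base:
# M = Watson-Crick match, G = G:U wobble, X = mismatch.
ALPHABET = "MXG"


def fit_markov_chain(patterns, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate initial and transition probabilities of a first-order chain.

    The estimates are normalised counts (maximum likelihood) smoothed by a
    positive pseudocount; the smoothing is an assumption of this sketch and
    keeps unseen transitions from getting probability zero.
    """
    init_counts = Counter()
    trans_counts = defaultdict(Counter)
    for pattern in patterns:
        init_counts[pattern[0]] += 1
        for prev, nxt in zip(pattern, pattern[1:]):
            trans_counts[prev][nxt] += 1

    k = len(alphabet)
    init_total = sum(init_counts.values()) + pseudocount * k
    init = {s: (init_counts[s] + pseudocount) / init_total for s in alphabet}

    trans = {}
    for prev in alphabet:
        total = sum(trans_counts[prev].values()) + pseudocount * k
        trans[prev] = {
            nxt: (trans_counts[prev][nxt] + pseudocount) / total
            for nxt in alphabet
        }
    return init, trans


def log_score(pattern, init, trans):
    """Log-likelihood of a matching pattern under the fitted chain."""
    score = math.log(init[pattern[0]])
    for prev, nxt in zip(pattern, pattern[1:]):
        score += math.log(trans[prev][nxt])
    return score


def is_valid_pattern(pattern, init, trans, threshold):
    """A pattern is accepted when its score is higher than the threshold."""
    return log_score(pattern, init, trans) > threshold


# Toy training patterns (made up, shortened to 10 symbols for readability).
training = ["MMMMMMMXMM", "MMMMMMGXMM", "MMMMMMMMXM"]
init, trans = fit_markov_chain(training)

# One possible threshold rule (an assumption): slightly below the lowest
# score observed on the training patterns.
threshold = min(log_score(p, init, trans) for p in training) - 1.0

for candidate in ["MMMMMMMMMM", "XXXXXXXXXX"]:
    print(candidate, is_valid_pattern(candidate, init, trans, threshold))
```

In practice the threshold would be tuned on held-out positive and negative examples rather than derived from the training scores alone.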
Keywords/Search Tags: Support vector machine, rough set, Bayes network, Markov model, RNA secondary structure, notation of RNA secondary structure, miRNA target prediction