
Design and Implementation of the Prediction of Pre-miRNAs and Mature miRNAs

Posted on: 2012-12-13; Degree: Master; Type: Thesis
Country: China; Candidate: Y C Huang
GTID: 2210330362450436; Subject: Computer Science and Technology
Abstract/Summary:
MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs about 21 nt in length. They regulate the expression of target mRNAs and are involved in many important biological processes, including metabolism, defense against viruses, apoptosis, and proliferation. This thesis proposes and implements methods for pre-miRNA classification and mature miRNA prediction, which achieve improved prediction performance.

First, given pre-miRNA candidates, a pre-miRNA classification model and the corresponding web service are implemented. For effective prediction, pseudo hairpin sequences are extracted from the protein-coding sequences of Arabidopsis thaliana and Glycine max. This study is the first to extract these pseudo pre-miRNAs. A graph-based method for eliminating redundant features and a feature selection method based on information gain are designed and implemented. In addition, representative pre-miRNAs are selected as training samples according to the distribution of the pre-miRNA samples. The resulting classifier, PlantMiRNAPred, achieved more than 90% accuracy on datasets from 8 plant species.

Second, given a putative pre-miRNA, an SVM-based plant miRNA prediction model and the corresponding web service are implemented. Following the biogenesis of miRNAs, a miRNA:miRNA* duplex is treated as a whole in order to capture more characteristics of plant miRNAs. Position-specific single-nucleotide features, energy-related features, structure-related features, and stability-related features are extracted from real and pseudo miRNA:miRNA* duplexes, and a set of informative features is selected to improve prediction accuracy. A two-stage sample selection algorithm is proposed to address the severe imbalance between real and pseudo miRNA:miRNA* duplexes: representative negative training samples are selected according to their distribution density in the high-dimensional sample space and their prediction deviations. The method identifies plant miRNAs accurately and achieves significantly higher prediction accuracy than existing methods. By providing the positions of putative miRNAs, the prediction model helps verify newly predicted plant pre-miRNA and miRNA candidates through biological experiments.
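
As a rough illustration of the information-gain criterion mentioned in the abstract, the sketch below ranks discrete features by how much they reduce the entropy of the class labels. It is not the thesis's implementation: the function names, the assumption that features are already discretized, and the toy data are all hypothetical and chosen only to show the calculation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(samples, labels):
    """Return (feature_index, gain) pairs sorted by decreasing gain."""
    n_features = len(samples[0])
    gains = [
        (j, information_gain([s[j] for s in samples], labels))
        for j in range(n_features)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [("high", "A"), ("high", "B"), ("low", "A"), ("low", "B")]
labels = [1, 1, 0, 0]  # 1 = real pre-miRNA, 0 = pseudo hairpin (illustrative)
ranking = rank_features(samples, labels)
```

In this toy case the first feature carries all of the class information and the second carries none, so a selection step that keeps only the top-ranked features would retain feature 0. Continuous features, such as folding energies, would need to be discretized first or scored with a different estimator.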
Keywords/Search Tags:Pre-miRNA, Mature miRNA, Information gain, Feature selection, Sample selection