
Research On Pre-miRNA With Multiple Stem-loops Prediction Based On Computational Intelligence

Posted on: 2017-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Yu
GTID: 2310330488468645
Subject: Computer Science and Technology
Abstract/Summary:
For a long time, the central dogma of biology has been understood as DNA information being transcribed into mRNA, which then acts as a template for protein synthesis. The discovery of microRNA (miRNA) has changed this model. A miRNA is a short (about 21-23 nt) non-coding RNA (ncRNA) that can repress translation of its target mRNA or cleave it through complementary base pairing, so miRNAs affect gene expression. Recent research has found that about 20%-30% of human genes are regulated by miRNAs. miRNAs play a regulatory role in physiological metabolism, growth and development, cell proliferation, apoptosis, and other processes. Moreover, experiments have demonstrated that miRNAs have a complex relationship with the occurrence of cancer. Research on miRNAs therefore helps in understanding gene regulatory networks and offers guidance for exploring biological evolution.
Our research mainly covers the following four aspects:
(1) We extract 695 human pre-miRNAs from the miRBase database and delete the redundant ones, which leaves 691 non-redundant pre-miRNAs. We also collect 8494 non-redundant pseudo hairpin sequences obtained from human RefSeq genes. In addition, we adopt the manually annotated human non-coding RNA database originally established by Lander, which contained 1020 human non-coding RNA sequences (excluding miRNAs). After removing redundant sequences and sequences longer than 150 bases, 754 non-coding RNA sequences remain.
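The filtering step described for the non-coding RNA set can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it assumes "redundant" means an exact duplicate after case and T/U normalisation (the actual work may have used a similarity threshold), and the function name `filter_sequences` and the toy records are hypothetical.

```python
def filter_sequences(records, max_len=150):
    """Keep the first copy of each sequence that is at most max_len bases long."""
    seen = set()
    kept = []
    for name, seq in records:
        # Normalise case and write DNA-style T as RNA-style U before comparing.
        norm = seq.upper().replace("T", "U")
        # Assumption: redundancy is treated as exact identity after normalisation.
        if len(norm) > max_len or norm in seen:
            continue
        seen.add(norm)
        kept.append((name, norm))
    return kept


# Hypothetical toy records, not taken from the thesis data.
toy_records = [
    ("seq_a", "ACGU"),
    ("seq_b", "acgt"),      # identical to seq_a after normalisation
    ("seq_c", "G" * 151),   # longer than 150 bases
    ("seq_d", "GGCC"),
]
kept_records = filter_sequences(toy_records)
print([name for name, _ in kept_records])
```

On the toy input, only the first copy of the duplicated sequence and the short unique sequence survive; the 151-base record is dropped by the length limit.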
The main problem with the resulting dataset is its imbalance; we therefore apply data preprocessing methods and internal methods to balance it.
(2) We first consider the 29 global and intrinsic features used by the prediction method miPred, which achieved the best prediction performance. In addition to these, we adopt 19 new physicochemical and structural features. Selecting the most discriminative features can improve the efficiency of the prediction model by reducing its complexity, so we apply wrapper methods and filter methods to select the optimal feature subset from the 48 features. The final optimal subset retains 21 features: 7 miPred features and 14 newly introduced structural features. These findings also show that the newly introduced structural features have higher discriminative power than the sequence features for separating pre-miRNAs from negative sequences.
(3) Because artificial neural networks are self-organizing, self-learning, and adaptive, we first select an artificial neural network as the model for predicting miRNAs. Under systematic 5-fold cross-validation, the accuracy of our model is 93.58%, higher than that of other prediction methods such as triplet-SVM and MiPred.
(4) When the neural network model is validated on 6095 non-human animal pre-miRNAs and 139 virus pre-miRNAs from miRBase, it achieves recognition rates of 92.71% and 94.24%, respectively. The prediction performance is greatly improved, which shows that our artificial neural network model is effective and provides a new idea for miRNA prediction.
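The imbalance noted in aspect (1) can be made concrete with a small sketch. The keywords list SMOTE, a data-level oversampling technique, but the abstract does not say how it was configured, so the code below is only an illustration under stated assumptions: both the pseudo hairpins and the non-coding RNAs are treated as negatives, the minority class is oversampled up to parity, and the function `smote`, its parameters, and the toy points are hypothetical. It uses only the Python standard library rather than any particular SMOTE package.

```python
import math
import random

# Counts from the abstract; pooling both sets as negatives is an assumption.
n_pos = 691            # non-redundant human pre-miRNAs
n_neg = 8494 + 754     # pseudo hairpins + other non-coding RNAs = 9248
n_needed = n_neg - n_pos   # synthetic positives needed for a 1:1 ratio


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D feature vectors standing in for real pre-miRNA features.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_points = smote(toy_minority, n_synthetic=4, k=3)
print(n_needed, len(new_points))
```

Because each synthetic point lies on a segment between two real minority samples, oversampling of this kind adds variety without simply duplicating rows; whether the thesis balanced to exact parity is not stated in the abstract.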
Keywords/Search Tags: pre-miRNA, SMOTE, feature selection, neural network, flexible neural tree