
MicroRNA Prediction Using SVM With Basic-N-Units Feature

Posted on: 2011-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
GTID: 2120360305955159
Subject: Bioinformatics
Abstract/Summary:
RNA is one of the most important materials in living organisms; together with DNA and proteins, it constitutes the framework of life. MicroRNA (miRNA) is a class of non-coding RNA 19~27 nucleotides in length. It is produced from partially complementary double-stranded precursors (pre-miRNA), which are encoded in the genomes of animals and plants. miRNA regulates endogenous genes by cleaving target mRNA or inhibiting its translation. It is not only involved in a series of important life processes, including early development, cell proliferation, apoptosis, cell death, fat metabolism and cell differentiation, but is also closely associated with cancer and other diseases. The study of miRNA therefore deepens our understanding of the regulatory relationships between genes, and it is of great significance to gene function research, the prevention and treatment of human diseases, and the study of biological evolution.

In recent years, the development of bioinformatics has played a decisive role in the evolution of miRNA prediction methods. Bioinformatics is an emerging discipline formed at the intersection of biology, computer science, applied mathematics and many other subjects. It stores, classifies, retrieves and analyzes nucleotide and amino acid sequences, and uses machine learning to extract the biological meaning inherent in these data in order to explain the origin, evolution and development of life. Machine learning is one of the leading research frontiers of computational intelligence and lies at the core of artificial intelligence. As it has developed, more and more miRNA prediction methods based on machine learning have appeared. In particular, prediction results have improved greatly since the support vector machine (SVM) was introduced.
This thesis therefore also uses an SVM-based method for miRNA prediction.

At this stage, a large number of miRNAs have been identified in a variety of ways, but the number discovered so far is still far below the number predicted in theory. To explore miRNA further, we need to continue searching for new miRNAs. Because the sequence and structure of pre-miRNA are strongly conserved and its secondary structure is a hairpin, identifying mature miRNAs becomes a problem of deciding whether candidate precursors are genuine. However, genome sequence fragments contain a large number of sequences with similar stem-loop structures, so it is very difficult to identify the real pre-miRNAs among them. Pre-miRNA prediction has progressed from the initial, purely biological alignment of homologous sequence fragments, through scoring methods based on sequence and structural characteristics, to the machine learning methods now widely adopted in bioinformatics. Over this period, pre-miRNA prediction has made a qualitative leap.

With machine learning methods, an important way to improve prediction accuracy is to improve existing algorithms by incorporating newly discovered biological characteristics. However, new problems have also arisen: the features extracted by existing prediction methods are limited, and the range of species they can predict is narrow. Machine learning methods are more intelligent than earlier methods, but because they involve less human judgment, the process can be rigid and inflexible.
In addition, the objects each method can study are quite restricted: some methods can predict only one of animals, plants and bacteria; some can predict pre-miRNAs with either a single loop or multiple loops, but not both; others can predict only a number of designated species. Almost no approach can predict across all species together, which makes a qualitative breakthrough very difficult. At the current level of knowledge, therefore, miRNA research is still insufficient.

In this regard, this thesis carries out further research on feature extraction. We designed and implemented an IOT-SVM model with a BNU (Basic-N-Units) feature to predict pre-miRNA. The method improves two key parts of the pre-miRNA prediction process: the coding algorithm for pre-miRNA characteristics, and a new, effective feature, Basic-N-Units. The former effectively expands the scope of objects that can be predicted; the latter contributes significantly to improving the prediction results.

We made three small improvements to the pre-miRNA coding method: the processing of precursors with multiple loops, the selection of the nucleotide used in coding, and the processing of the 3' end coding. For multi-loop processing, the differences from the original triplet coding lie mainly in the implementation of the algorithm. Because there is more than one loop, the terminal loops must be identified from the left and right brackets during scanning: if there are multiple dots between a left bracket and the following right bracket, that part is a terminal loop.
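The bracket scan described above can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: it assumes the secondary structure is given as a dot-bracket string, and the function name `find_terminal_loops` and the `min_dots` threshold (set to 2 here, reading "multiple" literally) are hypothetical choices made for the example.

```python
import re


def find_terminal_loops(structure, min_dots=2):
    """Locate terminal loops in a dot-bracket structure string.

    A terminal loop is taken to be a run of at least `min_dots` dots that is
    directly enclosed by a left bracket and a right bracket. `min_dots` is an
    assumed parameter for this sketch, not a value given in the abstract.
    Returns a list of (start, end) index pairs for the dot runs, end exclusive.
    """
    pattern = re.compile(r"\((\.{" + str(min_dots) + r",})\)")
    return [match.span(1) for match in pattern.finditer(structure)]


# A made-up structure with two hairpin branches (a multi-loop precursor).
example = "((..((....))..((...))..))"
loops = find_terminal_loops(example)
for start, end in loops:
    print(start, end, example[start:end])
```

Dots that sit between two left brackets, two right brackets, or a right bracket followed by a left bracket are not reported, since they belong to bulges, internal loops or the multi-loop junction rather than to a terminal loop.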
For the selection of the coding nucleotide, the IOT coding strategy changes the original coding, which extracts the middle nucleotide of each structural unit, to extracting the left nucleotide of the structural unit and combining it with the unit's structural encoding. In the miRNA:miRNA* duplex, there are two free nucleotides at the 3' end of each strand. In the 3' end coding processing, they are included in the stem structure.

In addition to these coding improvements, we designed a new feature, Basic-N-Units, which models the continuity of base pairing in the pre-miRNA stem. It is the first feature with parameters: different parameter values can be chosen for different species to achieve better prediction.

Experimental results show that the IOT-SVM model with BNU features, applied to pre-miRNA prediction for 11 plant species, reaches an average accuracy of 97.24%, with sensitivity and specificity of 99.19% and 98.27%, respectively. Compared with other existing methods on animal data, its sensitivity, specificity and accuracy are all improved, and the overall prediction results are good. This demonstrates the effectiveness and superiority of the prediction method. In summary, the IOT-SVM model with BNU features can identify plant pre-miRNA sequences more accurately, and it can be extended to pre-miRNA prediction in animal species. The method can be used for prediction analysis ahead of experiments and has high application value.

miRNA research is still at an early stage. Most recent work makes predictions by combining bioinformatics methods with known characteristics of miRNA.
To achieve a qualitative breakthrough, researchers need to get to the root of the problem: starting from how miRNA is formed and how it acts from a biological point of view, they need to gather more information on its structure, sequence, metabolic pathways and so on, in order to find the fundamental biological features of miRNA. This thesis proposes a prediction method based on the support vector machine, which achieved good results in experimental verification. We believe this research will contribute to the discovery of new miRNAs. It will not only provide reliable and accurate data sources for studying the mechanism of action and function of miRNA, but also lay a solid foundation for follow-up analysis.
Keywords/Search Tags: Pre-miRNA, BNU, Prediction, SVM