Design And Implementation Of Non-coding RNA Prediction Based On Data Mining

Posted on: 2005-09-20   Degree: Master   Type: Thesis
Country: China   Candidate: Y Hu
GTID: 2168360122489283   Subject: Computer application technology
Abstract/Summary:
Biocomputing, or bioinformatics, is a new research field that applies the knowledge and algorithms of computer science to process and analyze data produced by biological applications. As biological data grow rapidly, increasing attention has been paid to using efficient computer algorithms to handle them.

This thesis works in the biocomputing field and focuses specifically on the prediction of non-coding RNA (ncRNA). Data mining theory and techniques were applied to build a computational framework that can distinguish ncRNA from other kinds of sequences, and prediction software was designed to give non-professional users an easy-to-use prediction tool.

The thesis places strong emphasis on principal components analysis (PCA) and the Levenberg-Marquardt (LM) algorithm for artificial neural networks, which together carry out the data-mining-based prediction of ncRNA. First, the features of ncRNA were summarized to serve as input to the subsequent data mining procedures. Second, the Statistics Toolbox and the Neural Network Toolbox of MATLAB were used to perform principal components analysis and to train the artificial neural network. Finally, the user prediction program was built in Visual C++, with MATCOM serving as the interface between MATLAB and Visual C++.

The main difficulties of this work were extracting ncRNA features and choosing among data mining techniques. Biological domain knowledge was used to identify effective ncRNA features on which data mining for this specific purpose could rely. The prediction results obtained at this stage show that the adopted method is effective.

The new ideas in this thesis are as follows: (1) a new classification-based prediction method for ncRNA sequences was implemented, and software was designed to support user prediction; (2) the LM algorithm was introduced into artificial neural network training to achieve faster convergence; (3) a strategy combining MATLAB with Visual C++ was implemented to suit users from diverse backgrounds.
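The abstract uses PCA to reduce the ncRNA feature vectors before they reach the neural network, but it does not list the features or the number of components kept. As a library-free illustration only, the sketch below computes principal components in plain Python by power iteration with deflation on the sample covariance matrix; it is not the MATLAB Statistics Toolbox routine the thesis used. The function names `pca` and `project` and the 2-D toy data are hypothetical.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Top-k principal components of `rows` (equal-length feature vectors).

    Returns (mean, components, variances), largest variance first.
    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps, variances = [], []
    for _ in range(k):
        # Random unit start vector for power iteration.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the variance along v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps, variances


def project(row, mean, comps):
    """Scores of one feature vector on the given components (the reduced input)."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in comps]


if __name__ == "__main__":
    # Hypothetical 2-D toy data lying close to the line x2 = x1.
    data = [[float(t), float(t) + (0.1 if t % 2 else -0.1)] for t in range(10)]
    mean, comps, variances = pca(data, 2)
    print("explained variance ratio:", [v / sum(variances) for v in variances])
    print("reduced first sample:", project(data[0], mean, comps[:1]))
```

Keeping only the leading components (here the first) is what shrinks the network's input layer; in practice the cut-off would be chosen from the explained-variance ratios.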
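The abstract credits the LM algorithm with faster convergence when training the neural network. The sketch below shows the core LM idea in plain Python, assuming a one-hidden-layer tanh network with a single linear output trained on squared error: each step solves (J^T J + mu I) dw = -J^T r, shrinking the damping mu when the error falls (towards Gauss-Newton) and growing it when the error rises (towards a short gradient-descent step). It is not the thesis's MATLAB code. The Jacobian is taken by central differences for brevity, whereas a real implementation would normally compute it analytically; the function names, the mu schedule (start 1e-3, x0.1 on success, x10 on failure), the 1e-12 floor on mu and the XOR toy data are all assumptions.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def jacobian(residuals, w, eps=1e-6):
    """Central-difference Jacobian: J[i][j] = d r_i / d w_j."""
    cols = []
    for j in range(len(w)):
        wp, wm = list(w), list(w)
        wp[j] += eps
        wm[j] -= eps
        cols.append([(p - q) / (2 * eps) for p, q in zip(residuals(wp), residuals(wm))])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows


def levenberg_marquardt(residuals, w, mu=1e-3, mu_dec=0.1, mu_inc=10.0,
                        mu_max=1e10, max_iter=200, tol=1e-12):
    """Minimise sum(r_i(w)^2). Returns (weights, SSE after each accepted step)."""
    w = list(w)
    r = residuals(w)
    err = sum(v * v for v in r)
    history = [err]
    for _ in range(max_iter):
        if err < tol:
            break
        J = jacobian(residuals, w)
        n = len(w)
        jtj = [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]
        jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(n)]
        while mu <= mu_max:
            # Damped normal equations: (J^T J + mu I) dw = -J^T r
            a = [[jtj[i][j] + (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
            try:
                dw = solve(a, [-g for g in jtr])
            except ZeroDivisionError:
                mu *= mu_inc  # singular system: damp harder and retry
                continue
            w_new = [wi + di for wi, di in zip(w, dw)]
            r_new = residuals(w_new)
            err_new = sum(v * v for v in r_new)
            if err_new < err:
                # Error fell: accept and lean towards Gauss-Newton (floor is an assumption).
                w, r, err = w_new, r_new, err_new
                mu = max(mu * mu_dec, 1e-12)
                break
            mu *= mu_inc  # error rose: lean towards a short gradient-descent step
        else:
            break  # damping exceeded mu_max; no improving step found
        history.append(err)
    return w, history


def make_net_residuals(xs, ys, hidden):
    """Residuals of a one-hidden-layer tanh network with one linear output."""
    d = len(xs[0])

    def predict(w, x):
        out = w[-1]  # output bias
        for h in range(hidden):
            k = h * (d + 2)  # per unit: d input weights, 1 bias, 1 output weight
            z = w[k + d] + sum(w[k + i] * x[i] for i in range(d))
            out += w[k + d + 1] * math.tanh(z)
        return out

    def residuals(w):
        return [predict(w, x) - y for x, y in zip(xs, ys)]

    return residuals, predict, hidden * (d + 2) + 1


if __name__ == "__main__":
    # Toy XOR data standing in for (hypothetical) PCA-reduced feature vectors.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    Y = [0.0, 1.0, 1.0, 0.0]
    res, predict, n_params = make_net_residuals(X, Y, hidden=3)
    rng = random.Random(1)
    w0 = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]
    w, hist = levenberg_marquardt(res, w0)
    print(f"SSE {hist[0]:.4f} -> {hist[-1]:.2e} after {len(hist) - 1} accepted steps")
    print([round(predict(w, x), 2) for x in X])
```

Because a step is only accepted when it lowers the squared error, the recorded error never increases; the price is solving an n-by-n system per step, which is why LM suits the small networks described here better than very large ones.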
Keywords/Search Tags: biocomputing, data mining, artificial neural networks, principal components analysis