
Applications Of Pattern Recognition In Bioinformatics

Posted on: 2016-12-11 | Degree: Master | Type: Thesis
Country: China | Candidate: S L Jia
GTID: 2180330461456894 | Subject: Mathematics
Abstract/Summary:
Pattern recognition is a basic human ability. With the appearance of the first true computer in 1946 and the rise of artificial intelligence in the 1940s, scientists hoped to use computers to replace or extend part of the mental work of the human brain. Pattern recognition developed quickly and became a new discipline in the 1950s. It is concerned with the development of systems that learn to solve a given problem from a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. Statistical pattern recognition algorithms are used pervasively in bioinformatics. This thesis builds on the correlation measure MIC (Maximal Information Coefficient) and on classification algorithms from pattern recognition, focusing on how to use such algorithms to solve problems in bioinformatics accurately and efficiently.

1) Similarity measurement is an important concept in pattern recognition: to identify meaningful correlations, we must describe the degree of closeness or distance between samples. Two types of function characterize this closeness or distance between sample points: similarity coefficient functions and distance functions. MIC is a similarity measure. In 2011, Reshef et al. proposed MIC in a paper published in the journal Science. MIC captures complex relationships driven by a variety of factors without requiring prior knowledge of the form of the relationship.
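To make the idea behind MIC concrete, the following is a minimal sketch of a crude approximation: it bins both variables into equal-frequency grids, computes the mutual information on each grid, normalizes by the logarithm of the smaller grid dimension, and keeps the maximum over all grids whose cell count stays within a budget of n^0.6. This is neither MINE nor the SIG algorithm of the thesis; the true MIC also optimizes where the grid lines are placed, so this sketch can only underestimate it. The function names and the `alpha` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts.

    Binning is rank-based, so tied values may be split across bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log((c * n) / (px[i] * py[j]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Crude lower-bound approximation of MIC using equal-frequency grids.

    Assumption: the grid budget is n**alpha cells, with at least a 2x2 grid.
    """
    n = len(x)
    budget = max(4, int(n ** alpha))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            mi = mutual_information(
                equal_frequency_bins(x, nx), equal_frequency_bins(y, ny)
            )
            # Normalize so that every grid contributes a score in [0, 1].
            best = max(best, mi / math.log(min(nx, ny)))
    return best


if __name__ == "__main__":
    xs = list(range(100))
    print(approx_mic(xs, [2 * v + 1 for v in xs]))      # linear relationship
    print(approx_mic(xs, [(v - 50) ** 2 for v in xs]))  # non-monotonic relationship
```

The second call illustrates the point made above: a parabola has almost no linear correlation with its argument, yet a grid-based information score still registers the dependence.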
However, when testing MIC on biological data, we found that MINE (Maximal Information-based Nonparametric Exploration), the implementation of MIC, does not always converge to the true MIC value and exhibits substantial mathematical degeneracy. Therefore, to compute MIC accurately, we developed an algorithm called SIG (Simulated annealing, Interpolation and Genetic), proved its convergence using Markov theory, and implemented it for use in bioinformatics.

2) Non-coding RNA (ncRNA) refers to RNA that does not encode proteins. The discovery and functional annotation of ncRNAs have become a focus of recent bioinformatics research. However, genome-wide identification of ncRNAs with high accuracy and high coverage is a challenging task. Here, we developed an alignment-independent method, called ncRSOF, which identifies ncRNA based on k-mer string and ORF features. Tested on a large biological database, ncRSOF is very fast and achieves 96% accuracy and coverage.
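As an illustration of what alignment-independent k-mer and ORF features can look like, here is a minimal sketch that turns a nucleotide sequence into a numeric feature vector. The abstract does not specify the features used by ncRSOF, so everything here is an assumption: the choice of k = 3, the forward-strand ATG-to-stop ORF definition, and the function names `kmer_frequencies`, `longest_orf_length`, and `feature_vector` are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def longest_orf_length(seq):
    """Length in nucleotides of the longest ATG..stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def feature_vector(seq, k=3):
    """Concatenate k-mer frequencies with ORF length and ORF coverage (illustrative)."""
    orf_len = longest_orf_length(seq)
    coverage = orf_len / len(seq) if seq else 0.0
    return kmer_frequencies(seq, k) + [orf_len, coverage]


if __name__ == "__main__":
    features = feature_vector("CCATGAAATAGGG")
    print(len(features), features[-2:])
```

A vector like this could be passed to any standard classifier. The intuition is that protein-coding transcripts tend to contain long ORFs and codon-biased k-mer usage, whereas many ncRNAs do not, and no sequence alignment is needed to compute either feature.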
Keywords/Search Tags: Pattern recognition, Bioinformatics, maximal information coefficient, Genetic, non-coding RNA prediction