
Data Mining And Its Application On Protein Structure And Function Prediction

Posted on: 2014-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2268330401974776
Subject: Computer application technology
Abstract/Summary:
With the completion of the Human Genome Project and the rapid development of modern biological science and technology, biological sequence data are accumulating at an explosive pace. How to turn these data into knowledge has become a hot topic in bioinformatics research. Although the structure and function of a protein can be determined by conventional experiments, doing so costs considerable labor and material resources, and protein sequence information accumulates far faster than protein structure information. It is therefore of great theoretical and practical significance to develop effective methods for predicting protein structure and function from primary structure.

Drawing on the large amount of available protein sequence data, this thesis focuses on applying probabilistic neural networks, sequence alignment methods, the chaotic artificial bee colony algorithm, multiple-classifier combination, and other popular data mining methods to biological information processing. We study several problems in protein structure and function prediction and present some novel methods. The main contents are as follows:

1. A novel method for predicting protein three-dimensional structure with a chaotic artificial bee colony algorithm. Based on the off-lattice AB model, it combines the artificial bee colony algorithm with a chaotic search algorithm. If the search becomes trapped in a local optimum, chaotic variables are used to make it jump out of that optimum. The chaotic artificial bee colony algorithm thus retains the global and local search abilities of the artificial bee colony algorithm while using chaotic search to avoid premature convergence and local optima, which allows global optimization. Experiments on the widely used Fibonacci sequences demonstrate that the proposed algorithm is an effective, high-performance method for protein structure prediction.

2. A novel method based on data dividing and integration for predicting signal peptides. Because signal peptides vary in length and in amino acid composition, most existing methods for signal peptide prediction employ scaling windows, which can lose useful information and lead to an imbalanced data problem. To improve prediction performance on the minority class, the data are preprocessed before traditional probabilistic neural networks are used to build classifiers: the majority class is divided into several groups, and each group is combined with the minority samples to form a data subset, which is used to train one probabilistic neural network classifier. The ensemble system then combines, by voting, the results of a series of classifiers working on two different encodings of the protein sequences. Experiments on the widely used Nielsen dataset show the effectiveness of the proposed algorithm.

3. A novel method for predicting signal peptides based on the local matching similarity of alignments. Because a signal peptide is a local sequence fragment of a protein that embodies its biological characteristics, local sequence alignment methods can be applied effectively to signal peptide prediction. We represent protein sequences by the relative hydrophobicity of their amino acids and search for locally matching subsequences. We use the BLOSUM62 substitution matrix to measure the similarity between two proteins, and then obtain the final result with the k-nearest neighbor algorithm. Experiments on the widely used SwissProt dataset show the effectiveness of the proposed algorithm.
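
The chaotic escape step described in item 1 can be illustrated with a minimal Python sketch. It assumes the 2D Stillinger form of the off-lattice AB model (the thesis targets three-dimensional structure and may well use a 3D variant) and a logistic map as the source of chaotic variables; the abstract specifies neither. All function names, the search radius, and the step count are hypothetical, and the bee colony phases themselves are omitted.

```python
import math


def fibonacci_sequence(n):
    """Return the n-th Fibonacci AB string: S0 = 'A', S1 = 'B', S(i+1) = S(i-1) + S(i)."""
    prev, cur = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def coords_from_angles(angles):
    """Place len(angles) + 2 residues in the plane with unit bond lengths."""
    pts = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in angles:
        heading += theta  # each bend angle rotates the next bond
        x, y = pts[-1]
        pts.append((x + math.cos(heading), y + math.sin(heading)))
    return pts


def ab_energy(seq, angles):
    """Energy of a conformation under the (assumed) 2D Stillinger AB model."""
    pts = coords_from_angles(angles)
    n = len(seq)
    # Bending term: (1/4)(1 - cos(theta)) per internal residue.
    energy = sum(0.25 * (1.0 - math.cos(t)) for t in angles)
    # Lennard-Jones-like term over all non-adjacent residue pairs.
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = max(math.dist(pts[i], pts[j]), 1e-6)  # guard against overlap
            xi = 1 if seq[i] == "A" else -1
            xj = 1 if seq[j] == "A" else -1
            c = (1 + xi + xj + 5 * xi * xj) / 8.0  # AA: 1, BB: 0.5, AB: -0.5
            energy += 4.0 * (r ** -12 - c * r ** -6)
    return energy


def chaotic_search(seq, angles, steps=200, radius=0.5, z0=0.37):
    """Perturb the current solution with logistic-map variables; keep improvements.

    Hypothetical stand-in for the 'jump out of the local optimum' step.
    """
    best, best_e = list(angles), ab_energy(seq, angles)
    z = z0
    for _ in range(steps):
        candidate = []
        for a in best:
            z = 4.0 * z * (1.0 - z)  # logistic map in its chaotic regime
            candidate.append(a + radius * (2.0 * z - 1.0))
        e = ab_energy(seq, candidate)
        if e < best_e:
            best, best_e = candidate, e
    return best, best_e
```

A call such as `chaotic_search(fibonacci_sequence(6), [0.3] * 11)` starts from a gently curled 13-residue chain and returns a conformation whose energy is no higher than the starting one. In a full algorithm this step would be triggered only when the bee colony search stagnates.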
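
The data dividing and voting scheme in item 2 can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the probabilistic neural network is reduced to a Gaussian Parzen-window classifier, samples are plain numeric tuples, and only one encoding is used instead of the two mentioned above. The names `split_majority`, `PNN`, `train_ensemble`, and `ensemble_predict`, and the smoothing parameter `sigma`, are hypothetical.

```python
import math
from collections import Counter


def split_majority(majority, minority, n_groups):
    """Divide the majority class into n_groups and pair each group with all minority samples."""
    groups = [majority[i::n_groups] for i in range(n_groups)]
    return [(group, list(minority)) for group in groups]


class PNN:
    """Minimal probabilistic neural network: one Gaussian kernel per training sample."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.data = {}

    def fit(self, samples_by_class):
        """samples_by_class maps a class label to a list of feature tuples."""
        self.data = {label: list(pts) for label, pts in samples_by_class.items()}
        return self

    def _class_score(self, x, pts):
        # Average kernel response of the class's pattern units.
        total = 0.0
        for p in pts:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * self.sigma ** 2))
        return total / len(pts)

    def predict(self, x):
        return max(self.data, key=lambda label: self._class_score(x, self.data[label]))


def train_ensemble(majority, minority, n_groups, sigma=1.0):
    """Train one PNN per balanced subset (label 0 = majority, 1 = minority)."""
    models = []
    for group, minority_copy in split_majority(majority, minority, n_groups):
        models.append(PNN(sigma).fit({0: group, 1: minority_copy}))
    return models


def ensemble_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]
```

Because every subset contains all minority samples but only a fraction of the majority samples, each classifier sees roughly balanced classes, which is the stated purpose of the preprocessing step.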
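
The alignment-plus-neighbors idea in item 3 can be sketched in the same hedged way. The code below scores local similarity with a Smith-Waterman recurrence and classifies a query by a vote among its k most similar labelled sequences. It is only an approximation of the described method: `toy_score` is a simple match/mismatch stand-in where the thesis uses BLOSUM62, the hydrophobicity-based representation is omitted, and the linear gap penalty, function names, and labels are assumptions.

```python
from collections import Counter


def toy_score(a, b):
    """Stand-in substitution score; a BLOSUM62 lookup would be plugged in here."""
    return 2 if a == b else -1


def local_alignment_score(s, t, score=toy_score, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            h = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # gap in t
                cur[j - 1] + gap,           # gap in s
            )
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best


def knn_predict(query, labelled, k=3, score=toy_score):
    """Label the query by majority vote among its k most similar sequences.

    labelled is a list of (sequence, label) pairs.
    """
    ranked = sorted(
        labelled,
        key=lambda item: local_alignment_score(query, item[0], score),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Using a local rather than global alignment matches the motivation given above: the signal peptide is a short fragment, so only the best-matching subsequence should drive the similarity.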
Keywords/Search Tags: bioinformatics, protein three-dimensional structure, signal peptides, chaotic artificial bee colony algorithm, local sequences alignment, probabilistic neural networks