Study On Intelligent Computation Based Prediction Of Membrane Protein Structure And Interaction

Posted on: 2011-08-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Y Zhao
GTID: 1100360302980220
Subject: Control theory and control engineering
Abstract/Summary:
Once genetic data are available, the most direct way to analyze gene function and to clarify the expression patterns and biological functions of proteins, especially the proteins that are expressed by the genome and carry out life activities, is to study protein structure. In the study of membrane protein structure and function, predicting the membrane protein type is an important foundation. However, predicting membrane protein types with molecular biology methods alone cannot keep pace with the growing number of membrane protein sequences. Given an amino acid sequence, what features should be derived from it, and how should these features be formulated so that they correctly represent the relationship between the sequence and the structure or function of the corresponding protein? In other words, the feature description of amino acid sequences requires further study. In this thesis, intelligent computing techniques are used to mine the information in membrane protein sequences in order to better understand the relationship among membrane protein sequence, structure and function. In addition, large-scale genome sequencing has provided not only more membrane protein sequences but also the conditions for studying membrane protein interactions. Membrane protein interactions play an important role in life activities. They provide clues for annotating the unknown biological functions of membrane proteins, as well as information needed to study membrane protein structure and to understand the mechanisms of life activities.

In this thesis, we study membrane protein structure based on sequences. We focus on two areas: the prediction of membrane protein types and the prediction of membrane protein interactions.
Using pseudo amino acid composition theory and the approximate entropy algorithm, we optimize the parameter combinations, build several different basic classifiers from different parameter combinations, and finally construct a single classifier by integrating the basic ones. The integrated classifier is used to predict membrane protein structural classes. In addition, we establish a fuzzy support vector machine network that classifies membrane proteins by combining their biophysical properties.

In the study of membrane protein interactions, we collect more positive samples, extract features of membrane protein interactions from experimental data, and use the fuzzy support vector machine algorithm to identify membrane protein interactions. After creating an additional data set, we use different feature representation methods and apply the AdaBoost algorithm to integrate multiple weak classifiers to predict membrane protein interactions. The main contributions of the thesis are as follows.

In the prediction of the secondary structural classes of membrane proteins, we first use pseudo amino acid composition theory to describe membrane protein sequences, and compute the additional sequence information with the approximate entropy method. Next, we establish a number of different classifiers according to different parameter settings, using the optimized weighting factor. We then integrate a number of fuzzy K nearest neighbor classifiers and, after training and testing, apply the integrated classifier to predict membrane protein structural classes. Jackknife tests on the data sets show that the method is effective and practical.

In classification with the traditional support vector machine algorithm, unclassifiable regions exist.
To resolve this problem, we introduce a fuzzy membership function to form a fuzzy support vector machine classifier, and then integrate multiple classifiers to build a fuzzy support vector machine network. Combined with information on the physical and chemical properties of membrane protein sequences, the network is used to predict membrane protein structural classes.

Because membrane proteins are hydrophobic, their structural data make up only a very small proportion of the database. Experimental methods for membrane protein interactions are even more difficult, so very little data on membrane protein interactions is known. In this thesis, we use the fuzzy support vector machine algorithm to identify unknown pairs of membrane proteins. We collect more data on positive samples and extract interaction features from the experimental data. The algorithm is shown to be effective.

The principle of AdaBoost is that the samples a weak learner fails to learn well become, as far as possible, the samples the next weak learner focuses on. Therefore, we apply the AdaBoost algorithm to integrate multiple weak classifiers, test on different data sets, and extract the features of membrane protein interactions in different ways in order to obtain better feature representations. Applying the integrated classification system to classify and predict membrane protein interactions achieves good results.

Finally, the thesis is summarized, and its shortcomings and directions for further development are discussed.
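
The AdaBoost reweighting principle described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the decision-stump weak learner, the function names (`train_stump`, `stump_predict`, `adaboost`, `predict`) and the one-dimensional toy data are all assumptions made for illustration, and no membrane protein features are involved.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[j] <= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] <= thr else -polarity

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, so the next weak learner
        # concentrates on them; correctly classified samples lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the two classes.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, x) for x in X])
```

On this toy set a single stump always misclassifies at least two points, while the weighted vote of three stumps separates the classes. A study such as the one summarized here would replace the stumps and toy data with its own weak classifiers and sequence-derived features.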
Keywords/Search Tags: membrane protein, membrane protein structural class, membrane protein secondary structure, fuzzy K nearest neighbor algorithm, fuzzy support vector machine, AdaBoost