
Research On Relevant Problems Of Discriminating Sequences And Structures Of Outer Membrane Proteins

Posted on: 2009-03-16 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L Y Zou | Full Text: PDF
GTID: 1100360305982432 | Subject: Control Science and Engineering
Abstract/Summary:
Proteomics is one of the main fields of bioinformatics research. Within proteomics, research on membrane proteins holds a prominent position, because membrane proteins are important drug targets for treating disease and are the main functional components of biomembranes. Outer membrane proteins (OMPs) are a special family of membrane proteins that reside in the outer membranes of Gram-negative bacteria, chloroplasts and mitochondria. Most of them fold into beta-barrel structures formed by 8 to 22 beta-strands, and together with alpha-helical membrane proteins they make up the two types of transmembrane proteins. OMPs perform a variety of functions: they mediate the non-specific, passive transport of ions and small molecules, allow the selective passage of molecules, and some of them act as voltage-dependent anion channels. OMPs are also involved in bacterial adhesion, toxin release and immunity, and are therefore becoming valuable drug targets against Gram-negative bacteria. Discriminating the sequences and structures of OMPs remains challenging, because OMPs are difficult to validate experimentally and their structures are difficult to resolve. Various computational approaches have emerged to address these problems. Focusing on the bioinformatics of OMPs, this dissertation studies protein sequence encoding, develops classification algorithms and designs new models, in order to improve the accuracy of OMP discrimination and to solve related problems. The main contents and contributions of the dissertation are summarized as follows:
(1) Research on new approaches for discriminating OMPs from other protein folding types and for mining OMPs in genomes
OMP discrimination methods have two main application fields: the first is mining new OMPs and their corresponding genes in genomes; the second is accumulating new data for predicting the secondary and tertiary structures of OMPs.
Two new approaches for discriminating OMPs have been developed in this research. The first is a prediction method based on the theory of measures of diversity in biomathematics, in which the increment of diversity is used to measure the difference between OMPs and other proteins. This method is easy to implement and to extend to multiclass protein classification. The second combines sequence features with support vector machines (SVMs): a protein sequence is encoded by a combined feature encoding scheme that merges weighted amino acid index correlation coefficients with amino acid composition and dipeptide composition. This method performs better than existing methods in the literature for discriminating OMPs, and provides an effective tool for mining new OMPs in genomes. Furthermore, feature selection techniques are studied to improve the combined feature encoding scheme. A filter method is presented that selects the most effective features from the combined features; this speeds up classification and can even improve prediction performance.
(2) Research on algorithms for multiclass protein classification problems
SVMs often perform better than other machine learning techniques in binary classification, but multiclass SVMs still have unsolved problems, such as blind (unclassifiable) regions and error accumulation. Several fuzzy SVM algorithms have therefore been introduced in the literature to improve multiclass SVMs. This research presents a bidirectional fuzzy SVM algorithm, which treats each sample both as a positive sample and as a negative sample, so that each sample contributes two error terms: one as a positive sample and one as a negative sample. In addition, the fuzzy membership is defined not only by the relation between a sample and its cluster center but also by the relations among samples, which are described by the fuzzy connectedness among samples.
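The increment-of-diversity classifier from item (1) can be illustrated with a small sketch. It assumes the commonly used definitions D(S) = N ln N - sum(n_i ln n_i) for a source S with counts n_i and total N, and ID(X, Y) = D(X + Y) - D(X) - D(Y); it also assumes amino acid composition as the only source and a "smallest ID wins" decision rule. The dissertation's actual sources and decision rule are not given here, and all function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_counts(seq):
    """Count the 20 standard amino acids in seq (other symbols are ignored)."""
    counts = Counter(seq.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts):
    """Measure of diversity D = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts)
    d = total * math.log(total) if total > 0 else 0.0
    return d - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    mixed = [a + b for a, b in zip(x, y)]
    return diversity(mixed) - diversity(x) - diversity(y)


def class_source(sequences):
    """Pool the composition counts of all training sequences of one class."""
    pooled = [0] * len(AMINO_ACIDS)
    for seq in sequences:
        for i, n in enumerate(composition_counts(seq)):
            pooled[i] += n
    return pooled


def predict_class(query, sources):
    """Assign the query to the class whose pooled source gives the smallest ID."""
    q = composition_counts(query)
    return min(sources, key=lambda label: increment_of_diversity(q, sources[label]))


if __name__ == "__main__":
    # Hypothetical toy classes; real use would pool OMP and non-OMP training sets.
    sources = {
        "A": class_source(["AAAAGGGG", "AAAGGGGA"]),
        "B": class_source(["LLLLKKKK", "LKLKLKLK"]),
    }
    print(predict_class("AAGGAAGG", sources))
```

Extending this to more classes only means adding entries to the `sources` dictionary, which matches the remark that the method extends easily to multiclass classification.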
The bidirectional fuzzy SVM algorithm is implemented within the "one-vs-one" framework or the directed graph framework. In membrane protein classification tests, it is insensitive to outliers and noise, and outperforms the traditional "one-vs-rest" and "one-vs-one" multiclass SVMs.
(3) Research on methods for the combined prediction of signal peptides and topologies of OMPs
Topology prediction of transmembrane proteins contributes in two ways: first, it provides a framework for moving from the secondary structures of OMPs to the investigation of their tertiary structures; second, it helps refine structural predictions of OMPs. However, existing topology predictors cannot predict the signal peptides of OMP precursors, and their accuracy declines because of the influence of signal peptide sequences. In this research, a predictor based on hidden Markov models (HMMs) is developed for the combined prediction of signal peptides and topologies of OMPs. In the model, the signal peptide is treated as part of the overall topology of an OMP precursor, and the model architecture is optimized to fit the natural structure of OMPs. This model performs better than other models for topology prediction, and can also be applied to signal peptide prediction and to discriminating OMPs in genomes.
(4) Research on methods for predicting the subcellular localization of transmembrane proteins
Existing methods for protein subcellular localization prediction are mainly designed for soluble proteins and are usually inaccurate for transmembrane proteins. Conversely, topology predictors are designed for transmembrane proteins but cannot be used for subcellular localization prediction. This research describes a new approach to predicting the subcellular localization of transmembrane proteins, which adapts existing topology predictors and gives better accuracy than existing methods.
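The idea in item (3), treating the signal peptide as part of the whole topology so that one decoding pass labels both, can be illustrated with a toy Viterbi decoder. This is a minimal sketch, assuming a hypothetical five-state graph (signal peptide, inner loop, inside-to-outside strand, outer loop, outside-to-inside strand), a reduced hydrophobic/polar alphabet and made-up probabilities; it is not the dissertation's model architecture, which is not described here.

```python
import math

# Hypothetical toy state graph (NOT the dissertation's architecture):
# sig -> in -> up (strand, inside to outside) -> out -> down (strand, outside to inside) -> in ...
STATES = ["sig", "in", "up", "out", "down"]
START = {"sig": 1.0}  # a precursor always begins with its signal peptide
TRANS = {
    "sig": {"sig": 0.9, "in": 0.1},
    "in": {"in": 0.8, "up": 0.2},
    "up": {"up": 0.8, "out": 0.2},
    "out": {"out": 0.8, "down": 0.2},
    "down": {"down": 0.8, "in": 0.2},
}
# Made-up emission probabilities over a reduced alphabet: 'h' hydrophobic, 'p' polar/other.
EMIT = {
    "sig": {"h": 0.8, "p": 0.2},
    "in": {"h": 0.1, "p": 0.9},
    "up": {"h": 0.9, "p": 0.1},
    "out": {"h": 0.1, "p": 0.9},
    "down": {"h": 0.9, "p": 0.1},
}
HYDROPHOBIC = set("ACFILMVWY")


def safe_log(p):
    """Natural log that maps zero probability to -inf (a forbidden move)."""
    return math.log(p) if p > 0 else float("-inf")


def reduce_alphabet(seq):
    return ["h" if aa in HYDROPHOBIC else "p" for aa in seq.upper()]


def viterbi(seq):
    """Return the most probable state label for each residue of seq."""
    obs = reduce_alphabet(seq)
    if not obs:
        return []
    # score[s]: best log-probability of a path ending in state s at the current residue
    score = {s: safe_log(START.get(s, 0.0)) + safe_log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + safe_log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + safe_log(TRANS[prev].get(s, 0.0)) + safe_log(EMIT[s][symbol])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    toy = "LLLLLLLLLL" + "DDDDDD" + "VVVVVVVVV" + "KKKKKK" + "IIIIIIIII" + "EEEEEE"
    print(viterbi(toy))
```

Because transitions missing from `TRANS` have zero probability, every decoded path starts in the signal-peptide state and the strand states alternate direction; the state graph itself encodes the topology constraints.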
This localization method is the only approach for predicting the subcellular localization of transmembrane proteins, and it is also a new application of topology predictors.
(5) Research on methods for recognizing small non-coding RNAs involved in OMP regulation
Predicting small non-coding RNAs (sRNAs) that act as regulators is a difficult problem of great biological value. No approach has previously been presented for predicting the sRNAs that regulate a given protein type. This research describes a method for predicting bacterial sRNAs, in which principal component analysis (PCA) is performed to reduce the dimensionality of sRNA sequence features and to eliminate the correlation among them, and a back-propagation (BP) neural network (NN) is constructed for classification. This PCA-NN classifier can effectively predict bacterial sRNAs, and is therefore adopted in a two-phase filtering system for predicting sRNA regulators of OMPs. In the first phase, the system searches non-coding regions for sRNA candidates by base-pair scoring between OMP mRNAs and genomic non-coding regions; in the second phase, it filters out redundant candidates using the PCA-NN classifier. The system provides fewer redundant candidates for experimental validation than general methods.
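The PCA-then-BP-network stage of item (5) can be sketched as follows. This is a minimal illustration, assuming NumPy, synthetic feature vectors in place of real sRNA sequence features (which are not specified here), and hypothetical choices of 3 principal components, 8 tanh hidden units, a sigmoid output and full-batch gradient descent. It is not the dissertation's classifier, and the first-phase base-pair scoring is not covered.

```python
import numpy as np


def pca_fit(X, n_components):
    """Centre X and return (mean, top principal axes) computed via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centred data onto the principal axes; the resulting columns are uncorrelated."""
    return (X - mean) @ components.T


def train_bp(Z, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network by full-batch back-propagation (cross-entropy loss)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)
    n = len(Z)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)                    # hidden activations
        out = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))  # sigmoid output
        d_out = (out - t) / n                       # dLoss/dlogit for mean cross-entropy
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)     # back-propagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (Z.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2


def predict(Z, params):
    """Label 1 when the network output exceeds 0.5 (i.e. the logit is positive)."""
    W1, b1, W2, b2 = params
    return ((np.tanh(Z @ W1 + b1) @ W2 + b2).ravel() > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for sRNA feature vectors: 10 features, 2 well-separated classes.
    X = np.vstack([rng.normal(-2.0, 1.0, (50, 10)), rng.normal(2.0, 1.0, (50, 10))])
    y = np.r_[np.zeros(50), np.ones(50)]
    mean, comps = pca_fit(X, n_components=3)
    Z = pca_transform(X, mean, comps)
    params = train_bp(Z, y)
    print("training accuracy:", (predict(Z, params) == y).mean())
```

Projecting onto the SVD's right singular vectors yields mutually uncorrelated inputs, which is the "eliminate the correlation" step; the smaller input dimension also keeps the network small.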
Keywords/Search Tags: Proteomics, Bioinformatics, Outer membrane protein, Machine learning, Measure of diversity, Support vector machine, Hidden Markov model, Small non-coding RNA