Research On Protein Subcellular Localization Prediction Based On Machine Learning Methods

Posted on:2012-12-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J W Ma

Full Text:PDF

GTID:1220330368985893

Subject:Control theory and control engineering

Abstract/Summary:


With the explosive growth of biological data, experimental methods for collecting and analyzing this information can no longer keep pace with the needs of research. Intelligent data-processing techniques offer a way to close this gap while greatly reducing time and cost, and protein sequence information is a central focus of work in this area. This dissertation applies machine learning methods to two problems: protein subcellular localization prediction and protein structural class prediction. The main contributions are as follows:

1. An improved selective Elman neural network ensemble method is proposed for predicting the subcellular localization of Gram-negative bacterial proteins. First, the Elman network is used as the base classifier. Second, the Elman networks are trained with several different algorithms to increase the diversity of the base classifiers. Finally, the GASEN algorithm selects an appropriate subset of networks for the ensemble, so that the selected networks complement and coordinate with one another. Protein sequences are represented by their amino acid composition. Experimental results show that the method achieves better performance in the self-consistency test, the jackknife test and the independent data set test.

2. A novel prediction system, ELM-PCA, is designed for protein subcellular localization prediction. It fixes in advance the value of the parameter λ, which reflects sequence-order effects in the traditional pseudo amino acid composition (PseAAC). First, λ is set to its maximum value so that as much sequence-order information as possible is retained. Second, principal component analysis (PCA) is used to extract the essential features. Finally, the Elman network is used as the classifier. Experimental results show that the system outperforms other existing systems. In addition, PCA and PseAAC are combined into a new protein representation model, PPseAAC; experiments with several common machine learning algorithms show that the new model is superior to the original PseAAC model.

3. An improved locally linear embedding (LLE) algorithm is proposed for protein structural class prediction. It overcomes the singularity that can arise when solving for the optimal reconstruction weights in the traditional LLE algorithm. The improved algorithm is based on the conjugate gradient method, which converges in a finite number of steps and does not require a matrix inverse. The algorithm is then applied to protein structural class prediction using a simple k-NN classifier, with the PseAAC parameter λ set greater than the sequence length L. Experimental results show that the proposed method performs better in the jackknife test.
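Contributions 1 and 2 both rest on sequence-derived features: amino acid composition (AAC) and pseudo amino acid composition (PseAAC), whose parameter λ sets how many sequence-order correlation factors are appended to the 20 composition values. The sketch below follows Chou's type-I PseAAC definition, but it assumes, for illustration only, a single standardized Kyte-Doolittle hydropathy scale in place of the physicochemical property set used in the thesis, and a weight factor w = 0.05. The names `pseaac` and `standardize` are hypothetical, and the sketch uses the standard constraint λ < L so that every correlation tier has at least one residue pair.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

# Kyte-Doolittle hydropathy: an illustrative stand-in (assumption), NOT the
# property set used in the thesis or in Chou's original PseAAC.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def standardize(scale):
    """Shift and scale a property so it has mean 0 and SD 1 over the 20 residues."""
    vals = np.array([scale[a] for a in AMINO_ACIDS])
    mu, sd = vals.mean(), vals.std()
    return {a: (scale[a] - mu) / sd for a in AMINO_ACIDS}

H = standardize(KYTE_DOOLITTLE)

def pseaac(seq, lam, w=0.05):
    """Hypothetical helper: (20 + lam)-dimensional type-I PseAAC vector."""
    seq = seq.upper()
    L = len(seq)
    if set(seq) - set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")
    if not 1 <= lam < L:
        raise ValueError("lam must satisfy 1 <= lam < len(seq)")
    # Amino acid composition: normalized frequency of each residue.
    f = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / L
    # k-th tier correlation factor: mean squared property difference
    # between residues k positions apart, for k = 1..lam.
    theta = np.array([
        np.mean([(H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(L - k)])
        for k in range(1, lam + 1)
    ])
    # Composition and weighted order factors share one normalizer,
    # so the full vector sums to 1.
    denom = f.sum() + w * theta.sum()
    return np.concatenate([f / denom, w * theta / denom])
```

With w = 0 the first 20 components reduce to plain amino acid composition, the representation used in contribution 1; larger λ appends more sequence-order information.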
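The PCA step in ELM-PCA (and in the PPseAAC model) projects high-dimensional PseAAC vectors onto their leading principal components. A minimal NumPy sketch, assuming PCA is computed by a singular value decomposition of the centred data matrix rather than an eigendecomposition of the covariance matrix (the two give the same axes); the function name is hypothetical.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Hypothetical helper: project rows of X onto the top principal axes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each feature
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each axis.
    explained = (S ** 2) / (S ** 2).sum()
    return Xc @ components.T, components, mean, explained[:n_components]
```

New samples would be projected with `(x_new - mean) @ components.T`, reusing the mean and axes fitted on the training set.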
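In LLE, each sample x is reconstructed from its K nearest neighbours η_j by weights w that minimise the reconstruction error subject to Σ w_j = 1. This reduces to solving C w = 1 with the local Gram matrix C_jk = (x − η_j)·(x − η_k), then rescaling w to sum to 1. C is singular whenever K exceeds the feature dimension or the neighbours are affinely dependent. The sketch solves the system with a hand-written conjugate gradient routine, which needs no matrix inverse and, for a symmetric positive definite matrix, terminates in at most K steps in exact arithmetic. It is not the thesis's algorithm: it assumes a small trace-scaled ridge term (`reg`, a common LLE safeguard) to keep C positive definite, and all names are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A without inverting A."""
    n = len(b)
    # Exact arithmetic needs at most n steps; extra iterations absorb rounding.
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # next A-conjugate direction
        rs = rs_new
    return x

def lle_weights(x, neighbors, reg=1e-3):
    """Hypothetical helper: LLE reconstruction weights of x from its neighbours."""
    G = np.asarray(neighbors, float) - np.asarray(x, float)  # (K, D) differences
    C = G @ G.T                                              # local Gram matrix
    K = len(C)
    tr = np.trace(C)
    # Assumed ridge term keeps C positive definite when it is singular.
    C = C + np.eye(K) * reg * (tr if tr > 0 else 1.0)
    w = conjugate_gradient(C, np.ones(K))
    return w / w.sum()  # enforce the sum-to-one constraint
```

When x lies exactly in the affine hull of its neighbours, C is singular and the ridge-regularised weights approach the exact affine coefficients as `reg` shrinks.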
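The jackknife test reported for all three contributions is leave-one-out cross-validation: each protein is predicted by a model built on all the other proteins. For a k-NN classifier this needs no retraining, because excluding sample i from its own neighbour set is enough. A minimal sketch, assuming Euclidean distance and majority voting with ties broken toward the smallest label; the function name is hypothetical.

```python
import numpy as np

def knn_jackknife_accuracy(X, y, k=1):
    """Hypothetical helper: leave-one-out (jackknife) accuracy of k-NN."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Leave each sample out of its own neighbour set.
    np.fill_diagonal(d, np.inf)
    correct = 0
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        pred = labels[np.argmax(counts)]  # first max = smallest label on ties
        correct += int(pred == y[i])
    return correct / len(X)
```

The same function could be applied to raw PseAAC vectors or to LLE-embedded features, which is how the jackknife comparison in contribution 3 would be set up under these assumptions.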

Keywords/Search Tags:

Protein Subcellular Localization Prediction, Protein Structural Class Prediction, Bioinformatics, Machine Learning, Neural Network


Related items

1	Method Development For Predicting Protein Subcellular Localization Based On Deep Learning
2	Subcellular Localization Bioinformatics Prediction And Verification Of TA Protein In Arabidopsis Thaliana
3	Research On Relevant Problems Of Protein Subcellular Localization Prediction
4	Machine Learning Methods And Their Applications In Bioinformatics
5	Research Of Protein Subcellular Localization Prediction Based Deep Learning
6	Machine Learning Based Protein Subcellular Localization Prediction
7	Research On Methods For Multiplex Protein Subcellular Localization Prediction Based On Machine Learning
8	Deep Learning-based Research And Application Of Protein Subcellular Localization Prediction From Immunohistochemistry Images
9	Research On Protein Subcellular Localization Prediction Based On Transductive Learning
10	The Research Of Subcellular Localization Prediction Based On Discrete Features