With the explosive growth of biological data, experimental methods for collecting and analyzing that data fall far short of what current research requires. Intelligent data processing techniques can address this problem while greatly reducing time and cost. Protein sequence information is a focus of research in this field. This paper applies machine learning methods to protein subcellular localization prediction and protein structural class prediction. The main contributions are as follows:

1. An improved selective Elman neural network ensemble method is proposed for predicting the subcellular localization of Gram-negative bacterial proteins. First, the Elman network is used as the base classifier. Second, several different training algorithms are used to train the Elman networks so that the base classifiers are diverse. Finally, the GASEN algorithm selects appropriate networks for the ensemble so that the selected networks complement one another. Amino acid composition is used to represent the protein sequence. Experimental results show that the method achieves better performance in the self-consistency test, the jackknife test, and the independent data set test.

2. A novel prediction system, ELM-PCA, is designed for protein subcellular localization prediction. It fixes in advance the value of the parameter λ, which reflects the sequence order effects in the traditional pseudo amino acid composition (PseAAC). First, λ is set to its maximum value so as to retain as much sequence order information as possible. Second, principal component analysis (PCA) is used to extract the essential features. Finally, the Elman network is used as the classifier. Experimental results show that the system performs better than other existing systems. In addition, PCA and PseAAC are combined into a new protein representation model, PPseAAC.
Experiments with several common machine learning algorithms show that the new model is superior to the original model.

3. An improved locally linear embedding (LLE) algorithm is proposed for protein structural class prediction. It overcomes the singularity problem that arises when solving for the optimal reconstruction weights in the traditional LLE algorithm. The improved algorithm is based on the conjugate gradient method, which converges in a finite number of steps and does not require a matrix inverse. The algorithm is then applied to protein structural class prediction, using a simple k-NN classifier and with the PseAAC parameter λ set greater than the sequence length L. Experimental results show that the proposed method achieves better performance in the jackknife test.
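
To illustrate the idea behind the third contribution, the following is a minimal sketch, not the implementation used in this work, of computing LLE reconstruction weights with the conjugate gradient method instead of a matrix inverse. It assumes the standard LLE formulation, in which the weights for a point x solve G w = 1 for the local Gram matrix G of its neighbors and are then normalized to sum to one. The function names, the tolerance, and the guard that stops when the search direction has vanishing curvature are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Approximately solve A w = b for symmetric positive semi-definite A.

    No inverse of A is formed. In exact arithmetic at most len(b) steps are
    needed; the default allows 2 * len(b) as slack for rounding (assumption).
    """
    n = len(b)
    max_iter = 2 * n if max_iter is None else max_iter
    w = [0.0] * n
    r = list(b)            # residual b - A w, with w = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        curvature = dot(p, Ap)
        if curvature <= 1e-14:
            # Assumed guard: the direction lies in the (near) null space of
            # a singular A, so stop rather than divide by almost zero.
            break
        alpha = rs_old / curvature
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


def lle_weights(x, neighbors):
    """Reconstruction weights of x from its neighbors (standard LLE step)."""
    diffs = [[nj - xj for nj, xj in zip(nb, x)] for nb in neighbors]
    gram = [[dot(di, dj) for dj in diffs] for di in diffs]  # local Gram matrix
    w = conjugate_gradient(gram, [1.0] * len(neighbors))
    total = sum(w)
    return [wi / total for wi in w]  # enforce the sum-to-one constraint


weights = lle_weights([0.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])
```

In this small example the Gram matrix is diagonal and nonsingular, so the result can be checked by hand. When the number of neighbors exceeds the feature dimension the Gram matrix becomes singular; the traditional algorithm typically handles that case by adding a regularization term before solving.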