
Application Of Machine Learning Algorithm In Protein Structure Prediction

Posted on: 2017-01-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y N Xue
GTID: 2180330488982518  Subject: Software engineering
Abstract/Summary:
Bioinformatics, a classic application of computer science and technology to biological problems, has developed rapidly with the implementation of the Human Genome Project and advances in the biological sciences. Typically, biological and genetic information is first collected, stored, and analyzed by computer, and the resulting data can then be used to improve drug development. Following genomics and transcriptomics, proteomics has become a prominent research area in bioinformatics. With advances in protein sequencing and X-ray crystallography, large amounts of protein sequence and structure data can now be collected with ease. Combined with protein function analysis methods, these data allow machine learning methods to learn the rules underlying known protein sequences and structures and to use them to predict protein structure and function. In this thesis, we apply deep learning methods to analyze two important problems in proteomics in depth: protein interaction prediction and protein secondary structure prediction. The thesis consists of the following parts:

1) We present an improved deep Boltzmann machine (DBM) model to predict protein interactions. To avoid the saturation that sigmoid or tanh activation functions cause in deep networks, we use restricted Boltzmann machines (RBMs) modified with rectified linear units (ReLU), which increases the sparsity of the network, helps avoid over-fitting, and speeds up convergence.
The DBM is built from two layers of RBMs, and protein sequences are encoded using a novel Multi-scale Continuous and Discontinuous feature representation together with an autocovariance approach. Experiments show that the resulting model predicts protein interactions more accurately than the other methods compared.

2) Manually designed features for protein secondary structure prediction tend to be inaccurate and costly to obtain. To address this, we propose a new prediction method based on a convolutional neural network (CNN). First, the original protein sequence is quantified over the 20 kinds of amino acids to obtain a two-dimensional feature matrix. A one-dimensional CNN then extracts new features from this matrix. The CNN consists of five convolutional layers and three fully connected layers; to reduce over-fitting, dropout is applied in the fully connected layers. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed method.

3) Because the CNN's feature extraction step does not take the sequential order of the residues into account, we use a bidirectional long short-term memory (BLSTM) recurrent neural network to predict protein secondary structure. The prediction model consists of a BLSTM layer, two fully connected layers, and a softmax classification layer. The forward and backward recurrences of the BLSTM hidden layer help capture contextual feature information along the protein sequence. Notably, the memory cells of the BLSTM can capture relationships between amino acids that are far apart in the sequence, which improves the quality of the extracted sequential features. Experimental results demonstrate the effectiveness and superiority of the proposed method.
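To make the saturation argument in item 1) concrete, the short NumPy sketch below compares the derivatives of sigmoid, tanh and ReLU. For large inputs the sigmoid and tanh derivatives approach zero, and backpropagation through many such layers multiplies these small factors together, whereas the ReLU derivative stays at 1 for every positive input. The function names are illustrative and are not taken from the thesis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1.0 when x = 0

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)  # 1 for any positive input, 0 otherwise

x = np.array([0.0, 2.0, 5.0, 10.0])
for name, grad in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad), ("relu", relu_grad)]:
    print(f"{name:8s}", np.round(grad(x), 6))

# Backpropagating through 10 stacked sigmoid units multiplies 10 such factors:
# even in the best case (x = 0) the gradient shrinks by 0.25 ** 10.
print("best-case sigmoid factor over 10 layers:", sigmoid_grad(0.0) ** 10)
```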
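Item 1) encodes each protein with an autocovariance descriptor. The sketch below shows one common formulation, assuming each residue is described by a vector of p physicochemical property values: for every lag and property j, the descriptor averages the product of mean-centered property values of residues that are `lag` positions apart, so a sequence of any length yields `max_lag * p` numbers. The property table is random toy data standing in for real scales, the sequences are arbitrary toy strings, and the choices of 7 properties, `max_lag = 30`, and concatenating the two proteins' vectors into one pair feature are assumptions, not details confirmed by the abstract.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def autocovariance(seq, props, max_lag):
    """Autocovariance descriptor of one protein sequence.

    props maps each residue letter to a vector of p property values.
    Returns max_lag * p values; for each lag and property j:
    AC(lag, j) = 1/(n - lag) * sum_i (X[i, j] - mean_j) * (X[i + lag, j] - mean_j)
    """
    X = np.array([props[aa] for aa in seq], dtype=float)  # shape (n, p)
    n = X.shape[0]
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    Xc = X - X.mean(axis=0)  # center each property over this sequence
    per_lag = [(Xc[:-lag] * Xc[lag:]).mean(axis=0) for lag in range(1, max_lag + 1)]
    return np.concatenate(per_lag)

# Hypothetical property table: 7 random "properties" per amino acid, standardized
# over the 20 residues. Substitute real physicochemical scales in practice.
rng = np.random.default_rng(0)
raw = rng.standard_normal((20, 7))
raw = (raw - raw.mean(axis=0)) / raw.std(axis=0)
toy_props = dict(zip(AMINO_ACIDS, raw))

# Arbitrary toy sequences; a protein pair is represented by concatenating both vectors.
seq_a = "ACDEFGHIKLMNPQRSTVWY" * 2
seq_b = "MKVLAAGIVGLLLAQS" * 3
pair_features = np.concatenate([autocovariance(seq_a, toy_props, 30),
                                autocovariance(seq_b, toy_props, 30)])
print(pair_features.shape)  # (420,)
```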
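The Multi-scale Continuous and Discontinuous representation in item 1) describes a sequence region by region rather than only as a whole. The sketch below is a deliberately simplified illustration of that idea, not the thesis's exact descriptor: residues are mapped to seven amino-acid groups (a conjoint-triad-style grouping, assumed here), the sequence is split into four equal quarters, and group composition is computed for a hypothetical set of continuous regions (runs of adjacent quarters) and discontinuous regions (non-adjacent quarters joined together). Descriptors of this family typically also include transition and distribution terms, which are omitted here.

```python
import numpy as np

# Seven amino-acid groups (conjoint-triad-style grouping by dipole and side-chain volume).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}

# Hypothetical region scheme over 4 equal quarters (0..3). Illustrative only.
REGIONS = [(0,), (1,), (2,), (3,),               # single quarters
           (0, 1), (1, 2), (2, 3),               # continuous halves
           (0, 1, 2), (1, 2, 3), (0, 1, 2, 3),   # continuous three-quarters and whole
           (0, 2), (1, 3), (0, 3)]               # discontinuous pairs

def composition(residues):
    """Fraction of residues falling in each of the 7 groups."""
    counts = np.zeros(len(GROUPS))
    for aa in residues:
        counts[GROUP_OF[aa]] += 1
    return counts / max(len(residues), 1)

def multiscale_composition(seq):
    quarters = np.array_split(np.array(list(seq)), 4)
    feats = [composition(np.concatenate([quarters[q] for q in region])) for region in REGIONS]
    return np.concatenate(feats)  # 13 regions x 7 groups = 91 values

seq = "A" * 10 + "D" * 10 + "K" * 10 + "C" * 10
print(multiscale_composition(seq).reshape(len(REGIONS), len(GROUPS)))
```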
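A minimal NumPy sketch of the building block in item 1): an RBM whose hidden units are noisy rectified linear units, trained with one-step contrastive divergence (CD-1), with two such RBMs stacked by greedy layer-wise pretraining. Several points are assumptions: the visible units are treated as linear with unit-variance Gaussian noise (so inputs should be standardized), hidden noise follows the max(0, x + N(0, sigmoid(x))) form of noisy ReLUs, and the layer sizes, learning rate, epoch count and toy input data are arbitrary. Greedy stacking is only the usual initialization for a DBM; the joint DBM training and the final interaction classifier are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ReLURBM:
    """RBM with noisy rectified linear hidden units and linear (unit-variance Gaussian) visibles."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases
        self.rng = rng

    def hidden_mean(self, v):
        # Noise-free rectified activation, used for statistics and as features.
        return np.maximum(0.0, v @ self.W + self.b)

    def sample_hidden(self, v):
        # Noisy ReLU: max(0, x + N(0, sigmoid(x))), i.e. noise variance sigmoid(x).
        x = v @ self.W + self.b
        return np.maximum(0.0, x + self.rng.standard_normal(x.shape) * np.sqrt(sigmoid(x)))

    def visible_mean(self, h):
        return h @ self.W.T + self.a

    def cd1_step(self, v0, lr):
        """One CD-1 update on a mini-batch; returns the reconstruction MSE."""
        v1 = self.visible_mean(self.sample_hidden(v0))
        h0, h1 = self.hidden_mean(v0), self.hidden_mean(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.a += lr * (v0 - v1).mean(axis=0)
        self.b += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

def pretrain_two_layers(data, sizes=(64, 32), epochs=100, lr=1e-3, seed=0):
    """Greedy layer-wise pretraining: RBM 2 is trained on RBM 1's hidden activations."""
    rng = np.random.default_rng(seed)
    rbms, x = [], data
    for n_hidden in sizes:
        rbm = ReLURBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_mean(x)  # becomes the input of the next RBM
    return rbms, x

# Toy standardized inputs standing in for encoded protein pairs (hypothetical data).
data = np.random.default_rng(1).standard_normal((128, 420))
rbms, top = pretrain_two_layers(data)
print(top.shape)  # (128, 32)
```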
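Item 2) builds a two-dimensional feature matrix from the 20 amino acids. The abstract does not specify the exact quantification, so the sketch below assumes the simplest choice, one-hot encoding: a sequence of length L becomes an (L, 20) matrix with a single 1 per row. Treating non-standard letters such as X as all-zero rows is also an assumption.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_matrix(seq):
    """Encode a sequence of length L as an (L, 20) matrix, one row per residue."""
    m = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq.upper()):
        j = INDEX.get(aa)
        if j is not None:  # non-standard letters such as 'X' stay all-zero (assumption)
            m[i, j] = 1.0
    return m

m = one_hot_matrix("MKVLAX")
print(m.shape)  # (6, 20)
```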
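The CNN of item 2) can be sketched as a NumPy forward pass, assuming per-residue prediction of three secondary-structure classes: five "same"-padded one-dimensional convolutions slide along the sequence axis of the (L, 20) matrix, then three fully connected layers are applied at every position, with inverted dropout after the two hidden ones. Channel counts, kernel width, dropout rate and initialization are hypothetical, and training (loss and backpropagation) is omitted.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D convolution along the sequence axis with zero 'same' padding.

    x: (L, C_in), w: (k, C_in, C_out) with odd k, b: (C_out,) -> (L, C_out)
    """
    k, L = w.shape[0], x.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    out = np.zeros((L, w.shape[2]))
    for t in range(k):  # accumulate one kernel tap at a time
        out += xp[t:t + L] @ w[t]
    return out + b

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dropout(x, p, rng, training):
    """Inverted dropout: rescale during training so inference needs no change."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

def init_params(rng, n_in=20, conv_channels=(64, 64, 64, 64, 64), kernel=5,
                fc_sizes=(128, 64), n_classes=3):
    params = {"conv": [], "fc": []}
    c = n_in
    for c_out in conv_channels:  # five convolutional layers
        w = rng.standard_normal((kernel, c, c_out)) * np.sqrt(2.0 / (kernel * c))
        params["conv"].append((w, np.zeros(c_out)))
        c = c_out
    for d_out in (*fc_sizes, n_classes):  # three fully connected layers
        w = rng.standard_normal((c, d_out)) * np.sqrt(2.0 / c)
        params["fc"].append((w, np.zeros(d_out)))
        c = d_out
    return params

def forward(x, params, rng, training=False, p_drop=0.5):
    """x: (L, 20) feature matrix -> (L, n_classes) per-residue class probabilities."""
    h = x
    for w, b in params["conv"]:
        h = relu(conv1d_same(h, w, b))
    last = len(params["fc"]) - 1
    for i, (w, b) in enumerate(params["fc"]):
        h = h @ w + b  # applied independently at every residue
        if i < last:
            h = dropout(relu(h), p_drop, rng, training)
    return softmax(h)

rng = np.random.default_rng(0)
x = np.eye(20)[rng.integers(0, 20, size=50)]  # random one-hot sequence, L = 50
params = init_params(rng)
probs = forward(x, params, rng)  # inference: dropout disabled
print(probs.shape)  # (50, 3)
```

With five layers of kernel width 5, each output position depends on at most 1 + 5 * 4 = 21 neighbouring residues, so the context behind each prediction is bounded by the kernel widths.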
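For item 3), a NumPy sketch of the described pipeline (a BLSTM layer, two fully connected layers and a softmax classification layer) shows where the bidirectional context comes from: one LSTM reads the sequence left to right, another reads it right to left, and their hidden states are concatenated at each residue. The gate ordering, hidden size, the reading of "two fully connected layers and a softmax layer" as two ReLU layers plus a linear softmax output, and the random weights are assumptions; training is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x, W, U, b):
    """Run one LSTM direction over x: (L, d_in). Gate order in the 4H columns: i, f, o, g."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((x.shape[0], H))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g  # memory cell: can carry information across long distances
        h = o * np.tanh(c)
        out[t] = h
    return out

def blstm(x, fwd, bwd):
    """Concatenate left-to-right and right-to-left hidden states at every residue."""
    h_f = lstm_pass(x, *fwd)
    h_b = lstm_pass(x[::-1], *bwd)[::-1]  # reverse, run, then realign to original positions
    return np.concatenate([h_f, h_b], axis=1)

def init_lstm(rng, d_in, H):
    s = 1.0 / np.sqrt(H)
    return (rng.uniform(-s, s, (d_in, 4 * H)), rng.uniform(-s, s, (H, 4 * H)), np.zeros(4 * H))

def predict(x, params):
    """BLSTM -> two ReLU fully connected layers -> softmax output layer."""
    h = blstm(x, params["fwd"], params["bwd"])
    for w, b in params["fc"][:-1]:
        h = np.maximum(0.0, h @ w + b)
    w, b = params["fc"][-1]
    z = h @ w + b
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
H, d_in, n_classes = 16, 20, 3
params = {
    "fwd": init_lstm(rng, d_in, H),
    "bwd": init_lstm(rng, d_in, H),
    "fc": [(rng.standard_normal((2 * H, 32)) * 0.1, np.zeros(32)),
           (rng.standard_normal((32, 32)) * 0.1, np.zeros(32)),
           (rng.standard_normal((32, n_classes)) * 0.1, np.zeros(n_classes))],
}
x = np.eye(d_in)[rng.integers(0, d_in, size=12)]  # random one-hot sequence, L = 12
probs = predict(x, params)
print(probs.shape)  # (12, 3)
```

Because the backward LSTM has already read the whole sequence when it reaches the first residue, changing the last residue alters the backward half of the first residue's features but leaves the forward half unchanged.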
Keywords: deep learning, deep Boltzmann machine, convolutional neural network, protein secondary structure, protein interaction