Research On Similarity Of Three-dimensional Protein Structure Based On Local Features

Posted on: 2021-02-12 | Degree: Master | Type: Thesis
Country: China | Candidate: R C Zhang
GTID: 2370330611981006 | Subject: Computer application technology
Abstract/Summary:
Proteins are an important component of all cells and tissues of a living body, and they are the carriers of life activities. Biologists have established the unity of protein structure and function: proteins with similar structures have similar biological functions, and the three-dimensional structure of a protein ultimately determines its biological function. Studying the similarity of three-dimensional protein structures is therefore of great significance for exploring the biological functions of proteins and understanding the laws that govern the development of living bodies. Existing research mainly compares three-dimensional protein structures with traditional computational methods, but most of these methods ignore some biologically meaningful features and do not account for redundancy in the data. As a result, the computed similarity is not accurate enough and the calculation is time-consuming, which makes it difficult to meet the growing demand for research on unknown proteins. Building on previous work, this thesis therefore conducts research in the following two areas:

(1) The adaptive local feature frequency vector (ALFF) is a method proposed in this thesis that represents a protein by the frequency of its local features. First, the Cα distance matrix is calculated from the Cα atom backbone of the protein, and the local features of the protein are represented by sub-matrices partitioned from the Cα distance matrix. The sub-matrix size m is determined with the OTSU method, which selects the most appropriate value according to the characteristics of the data set. After all possible sub-matrices have been extracted, they are clustered with Mean Shift, and the resulting k cluster centers are taken as the local features of the protein. For each protein, the frequency of its sub-matrices in the k clusters is then counted to obtain the protein's ALFF, and the cosine similarity between the ALFFs of two proteins gives the protein similarity. The experimental results show that the structural similarity computed by ALFF is highly consistent with the SCOPe database at the class, fold, superfamily, and family levels of classification, and that accuracy improves by 0.7%, 2.3%, 2.8%, and 3.5%, respectively, compared with LFF. In general, ALFF and TM-score are in relatively good agreement, but ALFF can also compute the similarity between proteins whose evolutionary relationships are distant, which improves the generalization ability of the model.

(2) Analysis of three-dimensional protein structure based on a convolutional neural network. Protein Net is a convolutional neural network model proposed in this thesis to classify the three-dimensional structure of proteins. It extracts local features from the input Cα distance matrix of a protein and calculates the similarity of three-dimensional protein structures from the local feature vectors. First, the Cα distance matrix representing the three-dimensional backbone structure of the protein is standardized to a uniform 150×150 matrix representation, and these distance matrices are then fed into Protein Net for training as "images". After training is completed, a 1024-dimensional feature vector is extracted from the layer preceding the output layer of the network, and the protein similarity is calculated from these feature vectors with the cosine similarity formula. The experimental results show that Protein Net classifies proteins of different families very well, and that its accuracy for three-dimensional structural similarity at the family level is much higher than that of the other methods.
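The ALFF pipeline described in (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names are hypothetical, the sub-matrices are assumed to be m×m windows slid along the main diagonal of the Cα distance matrix, and the cluster centers are assumed to be given already (the thesis obtains them with Mean Shift and chooses m with OTSU, neither of which is reproduced here).

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def diagonal_submatrices(dmat, m):
    """Flatten every m x m block along the main diagonal.
    The diagonal sliding window is an assumption of this sketch."""
    n = len(dmat)
    return [
        [dmat[i + r][i + c] for r in range(m) for c in range(m)]
        for i in range(n - m + 1)
    ]

def frequency_vector(blocks, centers):
    """Assign each block to its nearest cluster center and return
    the normalized count per center (the frequency vector)."""
    counts = [0] * len(centers)
    for block in blocks:
        nearest = min(range(len(centers)),
                      key=lambda k: math.dist(block, centers[k]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]

def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy usage: made-up coordinates and made-up cluster centers.
coords_a = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
coords_b = [(0, 0, 0), (0, 3, 4), (0, 6, 8)]
centers = [[0, 5, 5, 0], [0, 1, 1, 0]]  # hypothetical, m = 2

vec_a = frequency_vector(diagonal_submatrices(distance_matrix(coords_a), 2), centers)
vec_b = frequency_vector(diagonal_submatrices(distance_matrix(coords_b), 2), centers)
similarity = cosine_similarity(vec_a, vec_b)
```

Because the frequency vector depends only on internal distances, the two toy chains above, which differ only in orientation, produce identical vectors. The same `cosine_similarity` function would apply unchanged to the 1024-dimensional feature vectors described in (2).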
Keywords/Search Tags: protein structural similarity, local feature, frequency vector, distance matrix, clustering, convolutional neural network