
Analysis Of The Effect Of Non-frameshifting Indels On Protein Interfaces And Prediction Of Their Pathogenicity

Posted on: 2022-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chu
GTID: 2480306542967369
Subject: Biology
Abstract/Summary:
Non-frameshifting insertion and deletion mutations (indels) are mutations in which the number of inserted or deleted nucleotides is an integer multiple of three. They can alter the structure and function of the affected proteins and lead to a variety of diseases. Non-frameshifting indels located at protein-protein and protein-DNA interfaces are of particular interest, because they can affect protein stability and interactions with other proteins or nucleic acids. Given their pathogenic potential, researchers have developed a variety of methods for screening pathogenic non-frameshifting indels, but these methods still have limitations. In this thesis, we analyzed the functional effects of non-frameshifting indels on protein-protein and protein-DNA interactions and, based on a comparative analysis of existing prediction methods for disease-associated non-frameshifting indels, proposed an integrated prediction method. The detailed work is as follows:

1. The functional effects of non-frameshifting indels on protein-protein and protein-DNA interaction interfaces. Using the coding-region non-frameshifting indels provided by CADD (Combined Annotation Dependent Depletion), we analyzed their possible functional effects on protein-protein and protein-DNA interfaces, including relative solvent accessibility, enrichment analysis of the corresponding genes, and the structural and functional characteristics of the affected residues. The results suggest that non-frameshifting indels tend to occur at buried residues. Genes carrying non-frameshifting indels at hot-spot residues were significantly enriched in cancer-related pathways. Non-frameshifting indels at protein-protein and protein-DNA interfaces had significant effects on protein secondary structure and post-translational modification. Furthermore, pathogenic non-frameshifting indels were significantly enriched at hot-spot residues on protein-protein interfaces, where they may affect the stability of protein-protein interactions.

2. An integrated prediction method for pathogenic non-frameshifting indels. We compared current prediction tools for pathogenic non-frameshifting indels in terms of algorithm design, feature selection, input data format and software availability, and improved the performance of disease-related non-frameshifting indel prediction with an ensemble learning approach. First, we divided the CADD annotations into three feature types: epigenetic, conservation and genetic. Each feature type was then combined with seven different machine learning classifiers, and for each feature type the best-performing classifier was selected based on five-fold cross-validation on the training set. Finally, the predictions of the three selected classifiers were used as input features for a second-layer model, yielding an ensemble learning model based on logistic regression. Our method outperformed the other non-frameshifting indel prediction methods it was compared with. Because it uses the annotation information in CADD, it can always return a prediction, which effectively avoids the problem of missing values.

This work provides an effective method for studying the relationship between non-frameshifting indels and disease. Our analysis highlights the importance of non-frameshifting indels at protein-protein interaction interfaces, and the results can inform predictions of how non-frameshifting indels affect protein interactions and how they are associated with disease.
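The defining property of a non-frameshifting indel, a net length change that is a non-zero multiple of three, can be checked directly from a variant's reference and alternate alleles. The sketch below assumes VCF-style allele strings (REF/ALT, including any shared anchor base); the function names are illustrative and are not taken from the thesis or from CADD.

```python
def indel_length_change(ref: str, alt: str) -> int:
    """Net number of nucleotides inserted (positive) or deleted (negative)."""
    return len(alt) - len(ref)


def is_non_frameshifting_indel(ref: str, alt: str) -> bool:
    """True if the variant changes length by a non-zero multiple of three.

    A multiple-of-three change leaves the downstream reading frame intact,
    whereas any other non-zero change shifts it.
    """
    delta = indel_length_change(ref, alt)
    # Python's % returns a non-negative result for a positive divisor,
    # so deletions (negative delta) are handled correctly as well.
    return delta != 0 and delta % 3 == 0


if __name__ == "__main__":
    # Hypothetical REF/ALT pairs: 3-bp insertion, 3-bp deletion,
    # 2-bp deletion (frameshift), and a substitution (not an indel).
    examples = [("A", "AGCT"), ("ACAG", "A"), ("ATG", "A"), ("C", "T")]
    for ref, alt in examples:
        print(ref, alt, is_non_frameshifting_indel(ref, alt))
```

This only tests the length rule; whether a variant actually falls within a coding exon has to be established separately from gene annotation.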
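Relative solvent accessibility (RSA) is conventionally a residue's solvent-accessible surface area (ASA) divided by a reference maximum ASA for that amino acid type, and residues whose RSA falls below a cutoff are labelled buried. The abstract does not state which ASA tool, maximum-ASA scale or cutoff was used, so this sketch takes the maxima as an argument and uses an assumed cutoff of 0.2; the maximum-ASA and observed ASA numbers in the example are hypothetical.

```python
from typing import Dict

# Assumed cutoff: values around 0.2 to 0.25 are common in the literature,
# but the thesis abstract does not say which one was used.
BURIED_RSA_CUTOFF = 0.2


def relative_solvent_accessibility(asa: float, residue: str,
                                   max_asa: Dict[str, float]) -> float:
    """RSA = observed ASA / maximum ASA for the residue type."""
    return asa / max_asa[residue]


def is_buried(asa: float, residue: str, max_asa: Dict[str, float],
              cutoff: float = BURIED_RSA_CUTOFF) -> bool:
    """Label a residue as buried when its RSA is below the cutoff."""
    return relative_solvent_accessibility(asa, residue, max_asa) < cutoff


if __name__ == "__main__":
    # Hypothetical maximum-ASA values (square angstroms) for illustration only;
    # substitute the published scale actually used in an analysis.
    max_asa = {"LEU": 200.0, "LYS": 240.0}
    print(is_buried(15.0, "LEU", max_asa))   # RSA = 0.075 -> True
    print(is_buried(120.0, "LYS", max_asa))  # RSA = 0.5   -> False
```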
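The abstract reports enrichment results (genes with hot-spot indels in cancer-related pathways, and pathogenic indels at interface hot-spot residues) without naming the statistical test. One common choice for this kind of over-representation question is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table. The sketch below computes its upper-tail p-value with the standard library only; it is an assumed stand-in, not necessarily the test used in the thesis, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes annotated to the pathway
    n: genes in the query set (e.g. genes carrying hot-spot indels)
    k: query genes that are also annotated to the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # Sum the probabilities of drawing k or more pathway genes into the query set.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20000 background genes, 300 in a pathway,
    # 500 query genes of which 20 fall in the pathway.
    print(hypergeom_enrichment_p(20000, 300, 500, 20))
```

When many pathways are tested, the resulting p-values would normally also be corrected for multiple testing.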
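The two-layer ensemble described in part 2 (one best classifier per feature group, chosen by five-fold cross-validation, whose predictions feed a logistic-regression model) is a form of stacking. The sketch below illustrates that structure with scikit-learn under several assumptions not stated in the abstract: the column-to-group mapping is hypothetical, three scikit-learn classifiers stand in for the seven unnamed ones, ROC AUC is used as the selection metric, out-of-fold probabilities are used as second-layer features, and synthetic data replaces CADD annotations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical mapping of annotation columns to the three feature groups.
FEATURE_GROUPS = {
    "epigenetic": list(range(0, 4)),
    "conservation": list(range(4, 8)),
    "genetic": list(range(8, 12)),
}

# Stand-ins for the seven (unnamed) candidate classifiers; each entry builds a fresh model.
CANDIDATES = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB,
}


def select_base_models(X, y, cv):
    """For each feature group, pick the candidate with the best mean CV ROC AUC."""
    chosen = {}
    for group, cols in FEATURE_GROUPS.items():
        scores = {
            name: cross_val_score(build(), X[:, cols], y, cv=cv, scoring="roc_auc").mean()
            for name, build in CANDIDATES.items()
        }
        chosen[group] = max(scores, key=scores.get)
    return chosen


def fit_stacked_model(X, y, seed=0):
    """Two-layer stacking: per-group base models feed a logistic-regression meta-model."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    chosen = select_base_models(X, y, cv)
    # Out-of-fold probabilities keep the meta-model from seeing scores that
    # base models produced on their own training rows.
    meta_features = np.column_stack([
        cross_val_predict(CANDIDATES[chosen[g]](), X[:, cols], y,
                          cv=cv, method="predict_proba")[:, 1]
        for g, cols in FEATURE_GROUPS.items()
    ])
    meta_model = LogisticRegression().fit(meta_features, y)
    # Refit each selected base model on the full training set for prediction.
    base_models = {
        g: CANDIDATES[chosen[g]]().fit(X[:, cols], y)
        for g, cols in FEATURE_GROUPS.items()
    }
    return chosen, base_models, meta_model


def predict_pathogenicity(base_models, meta_model, X):
    """Return the meta-model's probability for the positive (pathogenic) class."""
    meta_features = np.column_stack([
        base_models[g].predict_proba(X[:, cols])[:, 1]
        for g, cols in FEATURE_GROUPS.items()
    ])
    return meta_model.predict_proba(meta_features)[:, 1]


if __name__ == "__main__":
    # Synthetic binary data standing in for labelled, CADD-annotated indels.
    X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                               random_state=0)
    chosen, base_models, meta_model = fit_stacked_model(X, y)
    print(chosen)
    print(predict_pathogenicity(base_models, meta_model, X[:5]))
```

Selecting base models on the same folds that produce the out-of-fold meta-features is a simplification; an unbiased performance estimate still requires a test set that plays no part in training or model selection.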
Keywords/Search Tags: Non-frameshifting insertion and deletion mutations, Protein-protein interface, Protein-DNA interface, Hot spot, Prediction algorithm