Bioinformatic Study Of Pathogenicity Mutations On Disordered Proteins

Posted on: 2021-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Yang
Full Text: PDF
GTID: 2480306503487074
Subject: Biology
Abstract/Summary:
With advances in sequencing technology, data on protein mutations have grown explosively. Evaluating the harmfulness of protein mutations and investigating their pathogenic mechanisms could help identify potential drug targets and provide strategies for disease diagnosis and therapy. However, pathogenicity prediction and pathogenesis studies mostly focus on structured proteins, based on the molecular-biology assumption that structure determines function, while little attention is paid to proteins without rigid three-dimensional structures: intrinsically disordered proteins. In this work, we propose a machine-learning pathogenicity prediction algorithm targeted at disordered protein mutations, develop a precise force field for disordered protein simulation based on the CMAP potential, and finally analyze the pathogenicity of p53 mutations by combining these methods. This work fills a gap in research on disordered protein mutations; it could contribute to investigating their pathogenesis and inform the theoretical foundation for developing drugs and therapies related to disordered proteins.
In the first part, we found that existing pathogenicity prediction algorithms are mainly based on evolutionary conservation and protein structure, and may show low accuracy or fail entirely when applied to disordered proteins, which have low conservation and undetermined three-dimensional structures. Therefore, we used embedding models to extract features from raw protein sequences and established an end-to-end algorithmic framework, SDP. Using only protein sequences, SDP can quickly and accurately screen disordered protein mutations for pathogenicity, with performance comparable to conservation-only methods. However, although SDP can be applied to screen key disease-associated mutations out of disordered protein mutations on a large scale, it cannot explain the detailed relationship between mutations and diseases.
We therefore turned to molecular dynamics simulation to reveal the potential mechanisms of disease. The quality of a simulation depends heavily on the force field, and empirical force fields generally overestimate the compactness of disordered proteins, which can reduce the precision of mechanistic investigation. Therefore, in the second part, we incorporated the CMAP potential into OPLS-AA/L and developed a novel force field, OPLSIDPSFF, targeted at disordered proteins. Evaluation on benchmarks showed that OPLSIDPSFF samples more precise ensembles and reproduces experimental observables more accurately.
Finally, we demonstrated the application of the SDP pathogenicity predictor and the OPLSIDPSFF force field to the disordered protein p53. The simulation results validated the SDP predictions and revealed the structural mechanism of disease-associated mutations.
Keywords/Search Tags: Protein mutation, Disordered protein, Pathogenicity mutation, Molecular force field, Machine learning