Protein Function Analysis Based On Molecular Visualization And Machine Learning

Posted on: 2021-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Liang
GTID: 1488306473456174
Subject: Computer Science and Technology
Abstract/Summary:
Understanding protein function is vital to progress in medical science and agriculture. Large-scale sequencing has produced a large number of proteins whose functions are unknown, and computational methods for studying protein structure, properties, and function are an effective supplement to traditional biological experiments. These methods fall into two groups: visualization techniques, which display molecular properties directly, and prediction techniques, which annotate protein function computationally. Research on interactive biomolecular visualization and protein function prediction has made progress but still has considerable room for development, and several problems remain: with current hardware and algorithms, the visualization of mesoscopic biological scenes is limited to the cell level; multi-scale visualization causes abstracted models to expand visually; and the accuracy of protein function prediction algorithms still falls short of practical requirements. This dissertation therefore focuses on improving visualization performance, improving the visual quality of abstraction levels, and proposing more accurate protein function prediction models.

First, a residue-based hierarchical clustering algorithm is proposed to address the problem that large-scale biological scenes cannot be rendered in real time because of the large number of molecules. The algorithm ensures that the molecular clustering process covers all residues and retains the biological characteristics commonly used in molecular visualization. A volume-based distance measure replaces the traditional hierarchical clustering distance metric; compared with the traditional metric, it is better suited to atoms in a three-dimensional structure and effectively improves the clustering result. The hierarchical clustering tree of the residues is used to construct the protein clustering quickly, which improves the performance of the hierarchical clustering. To handle transitions between the discrete abstraction levels of each protein, a maximum screen-space error is defined for each abstraction level, and during rendering the level is chosen adaptively by comparing this error with a threshold.

Second, to address the model expansion that level-of-detail (LOD) techniques introduce when simplifying geometric models, an LOD macromolecule rendering technique based on ellipsoid envelopes is designed. Building on the residue-based hierarchical clustering algorithm, a complete binary tree is constructed to make transitions between abstraction levels smoother. Ellipsoid envelopes replace the traditional spherical envelopes to reduce the visual expansion caused by geometric models at high abstraction levels, and appropriate post-processing further weakens the visual problems caused by expansion. Based on the ray-casting algorithm for spheres, a GPU-based ellipsoid rendering method is designed to preserve the rendering performance of the molecular model.

Finally, understanding molecular function relies not only on visualization of the molecular surface but also on predicting protein function with statistical machine learning. This part of the work addresses two aspects: molecular surface parameterization and a relational inference model. The molecular surface, annotated with atomic physicochemical and geometric properties, is mapped to feature images by visualization techniques. Because traditional CNN models have limited affine invariance, a capsule-like deep network is designed to verify the correlation between molecular surface and function. In addition, existing protein function prediction methods are constrained by the fixed input size of CNN models, so amino acid torsion angles and pairwise distances must be simplified. To address the resulting loss of information about relationships between amino acids, a model based on relation networks is designed that performs relational reasoning over variable-length residue sequences to improve the accuracy of function prediction.

The hierarchical molecular rendering and machine-learning-based protein function prediction algorithms described above have been verified experimentally. They help users understand protein structure and function interactively, quickly, and accurately, and serve as effective auxiliary tools.
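
The threshold-based choice of abstraction level mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: it assumes a perspective camera and the commonly used projection of a world-space error to pixels, and the function names, the per-level error values, and the 1-pixel default threshold are all hypothetical.

```python
import math


def projected_error_px(geometric_error, distance, fov_y_rad, viewport_height_px):
    """Project a world-space error onto the screen, in pixels.

    Assumes a perspective camera with vertical field of view fov_y_rad
    and an object at the given distance from the eye.
    """
    return geometric_error * viewport_height_px / (
        2.0 * distance * math.tan(fov_y_rad / 2.0)
    )


def select_level(level_errors, distance, fov_y_rad, viewport_height_px,
                 threshold_px=1.0):
    """Return the index of the coarsest level whose screen-space error
    stays within threshold_px.

    level_errors[i] is a hypothetical world-space error for abstraction
    level i, ordered from finest (index 0, error 0.0) to coarsest.
    """
    chosen = 0  # fall back to the finest level
    for i, err in enumerate(level_errors):
        if projected_error_px(err, distance, fov_y_rad,
                              viewport_height_px) <= threshold_px:
            chosen = i
    return chosen


# Hypothetical per-level errors (same length unit as the distance)
errors = [0.0, 0.5, 2.0, 8.0]
fov = math.pi / 2  # 90-degree vertical field of view

for d in (100.0, 2000.0, 10000.0):
    print(d, select_level(errors, d, fov, 1000))
```

With these assumed values, a molecule 100 units from the camera keeps full detail (level 0), one at 2000 units drops to level 2, and one at 10000 units uses the coarsest level 3. A tighter threshold keeps finer levels visible over a longer range at the cost of rendering more geometry.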
Keywords/Search Tags: computer graphics, molecular surface visualization, deep learning, enzyme function prediction, residue sequence