| The molecular cavities in a protein strongly influence its function, so studying them is an important part of research on protein structure and function. Changes in a protein's molecular cavities affect its structure, and a protein's function is closely tied to its structure. It is therefore important to study how molecular cavities change and which factors drive those changes. Molecular cavity data change in complex ways and come in large volumes. With these characteristics in mind, this paper explores how to represent molecular cavities and how to measure the similarity between them, and on that basis analyzes the patterns by which molecular cavities change. The main research contents are as follows. First, the molecular cavity data are preprocessed to extract molecular cavity sequences from the raw data: the cavity information stored in the protein molecular data is converted into sequence data that a computer can process easily. This representation preserves the topological information of each molecular cavity and makes comparative analysis and other processing straightforward. Second, building on this sequence representation, a similarity algorithm for molecular cavity sequences based on dynamic time warping (DTW), named DTW-CS, is proposed. Third, drawing on text-similarity methods from natural language processing, a vectorized representation of molecular cavities based on the word2vec model is proposed. Using the trained cavity vectors, machine learning algorithms are then applied to study and analyze changes in the molecular cavities in depth. Finally, the methods are validated experimentally on a real dataset. The experimental results show that the proposed methods are helpful for analyzing the change patterns of molecular cavities. |
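To illustrate the idea that DTW-CS builds on, here is a minimal sketch of classic dynamic time warping in Python. It is not the DTW-CS algorithm itself, whose details are not given in this abstract. The sketch makes several hypothetical assumptions. Each molecular cavity sequence is treated as a list of numbers, for example one cavity feature value per frame. The local cost is the absolute difference between elements. The distance is mapped to a similarity with 1 / (1 + d). The function names `dtw_distance` and `dtw_similarity` are illustrative only.

```python
import math
from typing import Callable, Sequence


def dtw_distance(
    a: Sequence[float],
    b: Sequence[float],
    cost: Callable[[float, float], float] = lambda x, y: abs(x - y),
) -> float:
    """Classic DTW distance between two numeric sequences.

    Hypothetical stand-in for a cavity-sequence comparison: each element is
    assumed to be a numeric cavity feature, compared with `cost`.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(a[i - 1], b[j - 1])
            # extend the best of: insertion, deletion, or match step
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map DTW distance to (0, 1]; the 1 / (1 + d) mapping is an assumption."""
    return 1.0 / (1.0 + dtw_distance(a, b))


if __name__ == "__main__":
    # Two hypothetical cavity feature sequences of different lengths
    s1 = [1.0, 2.0, 3.0]
    s2 = [1.0, 2.0, 2.0, 3.0]
    print(dtw_distance(s1, s2), dtw_similarity(s1, s2))
```

Because DTW lets one element align with several elements of the other sequence, `s1` and `s2` above get distance 0 even though their lengths differ. This tolerance to stretching along the sequence is what makes DTW a natural starting point for comparing cavity sequences whose changes unfold at different rates.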