| Determining the three-dimensional (3D) structures of biological macromolecules such as proteins is of great significance in structural genomics, drug design, and protein design. Single-particle cryo-electron microscopy (cryo-EM) 3D reconstruction is one of the mainstream techniques in structural biology for determining the 3D structures of proteins and other biological macromolecules. It does not require samples to be crystallized and has been widely used to study the 3D structures of macromolecular complexes that are difficult to crystallize. Many biological macromolecules, especially protein complexes, may exist in different conformations corresponding to different functional states at the same time and thus have different 3D structures; this is called the structural heterogeneity of biological macromolecules. Analyzing the different conformations of biological macromolecular complexes in their different functional states and reconstructing the corresponding 3D structures is important for studying the dynamic mechanisms and principles of biological macromolecules. Analyzing structural heterogeneity is currently both a technical difficulty and a research hotspot in single-particle cryo-EM 3D reconstruction. An effective way to address the heterogeneity problem is to separate the projection images that come from different 3D structures, that is, to classify the heterogeneous cryo-EM projection images into homogeneous subsets, and then to reconstruct the corresponding 3D structure from each homogeneous subset separately. However, because the projection images of many biological macromolecules with different conformations are extremely similar, the 2D projection images contain very high levels of background noise, and the projection orientations differ, it is very difficult to classify heterogeneous projection images correctly. To address the high
similarity and high noise level of heterogeneous projection images, this thesis studies the feature representation of heterogeneous projection images from different perspectives and designs several efficient heterogeneous projection image classification algorithms. The main contributions and innovations of this thesis are as follows. (1) A heterogeneous projection image classification algorithm based on common lines is proposed. The common lines between projection images are important for determining their projection orientations and are key features of heterogeneous projection images. This thesis proposes a classification algorithm based on the similarity and reliability of common lines, where the reliability of common lines is computed by a proposed weighted voting algorithm. The algorithm uses the similarity and reliability of common lines to represent the intrinsic differences between projection images and classifies heterogeneous projection images with the normalized spectral clustering algorithm. In the experiments, the algorithm is tested on one homogeneous and two heterogeneous cryo-EM projection image datasets. The results show that the weighted voting algorithm yields higher 3D reconstruction accuracy and that using the reliability of common lines yields higher classification accuracy, indicating that the proposed common-lines-based classification algorithm is effective in heterogeneous cryo-EM 3D reconstruction. (2) A 2D class averaging algorithm based on a fast image alignment algorithm is proposed. 2D class averaging is an important noise reduction technique in single-particle cryo-EM 3D reconstruction, in which image alignment is a fundamental step. In this thesis, a fast image alignment algorithm based on 2D interpolation in the frequency domain is proposed, and a 2D class
averaging algorithm is proposed by combining this image alignment algorithm with a spectral clustering algorithm. In the experiments, the fast image alignment algorithm is tested on three image datasets, and the 2D class averaging algorithm is tested on two cryo-EM projection image datasets. The results show that the fast image alignment algorithm accurately estimates the rotation angle and translational shifts between two images with sub-angle and sub-pixel accuracy, and that the 2D class averaging algorithm generates high-quality class averages and achieves high 3D reconstruction accuracy. (3) A two-stage heterogeneous projection image classification algorithm based on common lines and 2D class averaging is proposed. The common lines, pixel intensities, and corresponding class averages of projection images are important features for representing the intrinsic differences between heterogeneous projection images. In this thesis, two novel distance measures between projection images that integrate common lines, pixel intensities, and class averages are constructed, and a two-stage heterogeneous projection image classification algorithm based on these two distance measures is proposed. In the experiments, the classification performance of the two-stage algorithm is tested on synthetic and real heterogeneous cryo-EM projection image datasets. The results show that the two distance measures improve the classification performance of the spectral clustering algorithm and that the two-stage algorithm achieves higher classification accuracy and reconstruction accuracy, indicating that the algorithm has practical value and application prospects in heterogeneous cryo-EM 3D reconstruction. (4) A heterogeneous projection image classification
algorithm based on autoencoders is proposed. Autoencoders can effectively extract the key features of input data and have been successfully applied in many unsupervised learning scenarios. In this thesis, a simple autoencoder model based on multilayer perceptrons and a more complex autoencoder model based on residual networks are implemented, and an unsupervised autoencoder-based heterogeneous projection image classification algorithm is proposed. The uniform manifold approximation and projection (UMAP) dimensionality reduction algorithm is used to reduce the high-dimensional features extracted by the autoencoder to 2D, and a spectral clustering algorithm is then used to classify the heterogeneous projection images. In the experiments, the classification performance of the proposed algorithm is tested on two synthetic heterogeneous cryo-EM projection image datasets. The results show that the proposed algorithm effectively extracts category features of heterogeneous projection images and classifies them with high accuracy. |