
Spectral Clustering Method Based On Granularity Space And Its Application

Posted on: 2018-09-07  Degree: Master  Type: Thesis
Country: China  Candidate: Q H Liang  Full Text: PDF
GTID: 2310330512459255  Subject: Applied Mathematics
Abstract/Summary:
Based on granular space theory, this thesis uses the power spectrum to extract features of protein sequences. An optimization clustering index is proposed, and an optimal clustering model built on intra-class and inter-class differences is established to obtain the hierarchical structure of the data. The method is applied to influenza virus protein data and to the mitochondrial dehydrogenase subunit data of 19 animals. The results provide a complete set of new processing methods for information processing on large data sets. The specific work is as follows.

Chapter two studies a new, power-spectrum-based way of extracting protein sequence features, using hierarchical clustering and entropy evaluation. First, amino acid sequences are given a numerical representation using the classical HP model. Then the characteristic spectrum of each protein sequence is obtained with the Discrete Fourier Transform, and a 12-dimensional feature vector is constructed to represent the spectrum of the sequence. Finally, hierarchical clustering is used to obtain the structure of the protein sequences. The method is tested on the mitochondrial dehydrogenase subunit data of 19 animals and on the data of 11 β-globin genes, and the results are compared.

Chapter three builds on chapter two: it uses the detailed HP model and the power spectrum to extract protein features, and analyzes the structure among influenza virus protein sequences by hierarchical clustering. The 20 amino acids are divided into 4 categories, the characteristic frequency spectrum of each amino acid sequence is extracted with the Discrete Fourier Transform, and the feature vector corresponding to each protein sequence is given.
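The encoding and spectrum steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the hydrophobic residue set `HYDROPHOBIC` and the function names `hp_encode` and `power_spectrum` are assumptions made for illustration, and published HP classifications differ in which residues they count as hydrophobic. The abstract does not say how the 12-dimensional feature vector is derived from the spectrum, so that step is left out.

```python
import cmath

# Assumed hydrophobic residues for a binary HP model (illustrative only;
# the thesis may use a different assignment).
HYDROPHOBIC = set("AILMFPWV")


def hp_encode(sequence):
    """Map each residue to 1 (hydrophobic) or 0 (polar)."""
    return [1 if residue in HYDROPHOBIC else 0 for residue in sequence.upper()]


def power_spectrum(signal):
    """Return |X[k]|^2 for k = 0..N-1, where X is the DFT of the signal."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        # Direct O(N^2) evaluation of the discrete Fourier transform.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum
```

For example, `power_spectrum(hp_encode("AKLK"))` gives the spectrum of the binary signal `[1, 0, 1, 0]`. A four-category encoding such as the detailed HP model would replace the binary mapping in `hp_encode` with a four-valued one; for long sequences an FFT routine would be the practical choice over the direct sum.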
Finally, the similarity of the protein sequences is analyzed by the spectral clustering method.

Chapter four studies the spectral clustering analysis method based on granular space theory and establishes a model for extracting the optimal hierarchical structure. The optimization model and algorithm proposed in this thesis are applied to construct the first structure and the second structure of the influenza virus protein system. Based on the nearest principle, a signature virus selection model is established to extract signature virus proteins for constructing the evolutionary tree of the H1N1 influenza virus. A classifier based on the nearest-to-center principle is constructed to verify the effectiveness of the method. The analysis shows that the selected signature virus proteins effectively approximate the entire virus system.
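One plausible reading of the nearest-to-center idea can be sketched as follows. This is an assumption-laden illustration, not the model from the thesis: it assumes Euclidean distance, takes the cluster center to be the arithmetic mean of the feature vectors, and uses the hypothetical names `centroid`, `select_representative`, and `classify`. Each cluster is a dictionary from a sequence name to its feature vector.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def select_representative(cluster):
    """Return the name of the member closest to the cluster centroid."""
    center = centroid(list(cluster.values()))
    return min(cluster, key=lambda name: math.dist(cluster[name], center))


def classify(vector, clusters):
    """Return the label of the cluster whose centroid is nearest to vector."""
    return min(
        clusters,
        key=lambda label: math.dist(vector, centroid(list(clusters[label].values()))),
    )
```

With clusters `{"A": {"a1": [0, 0], "a2": [1, 0], "a3": [5, 0]}, "B": {"b1": [10, 10], "b2": [12, 10]}}`, `select_representative` picks `"a2"` for cluster A, and `classify([9, 9], clusters)` returns `"B"`. Checking how often held-out sequences are assigned to their own cluster is one way such a classifier could be used to validate a clustering.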
Keywords/Search Tags: granular space, power spectrum, protein sequence, spectrum clustering