
The Information Expression and Analysis of p53 Family Genes Based on Feature Extraction Method

Posted on: 2018-01-17 | Degree: Master | Type: Thesis
Country: China | Candidate: R Cai
GTID: 2310330512459243 | Subject: Applied Mathematics
Abstract/Summary:
The p53 gene is, to date, the gene most closely associated with tumors, and its family members p63 and p73 are highly homologous to it in both structure and function. Mining the biological information of the p53 family genes with effective mathematical methods is therefore of great significance for tumor prevention and control. This thesis takes the complete coding sequences (CDS) of the p53 family genes as its research object, uses feature extraction methods to identify the expression information of the sequences, and analyzes the sequences with hierarchical clustering. The main work is summarized as follows.

The gene signature is a recent feature-extraction-based method for recognizing gene expression information, and it can effectively identify certain biological information in a gene. The original gene signature method reflects biological characteristics of the sequence; this thesis extends it by introducing a physical characteristic of each nucleotide, the electron-ion interaction pseudopotential (EIIP), i.e. the average energy of its delocalized electrons, and thereby establishes a new E-gene signature method. It also defines an E-Euclidean distance and an E-average variance between two sequences and analyzes the related species by hierarchical clustering. Applying the E-gene signature to the complete CDS of p53 family gene mRNAs from 16 species shows that, for each gene, the more closely related two species are, the more similar their gene signatures.

In addition to the gene signature, a 12-dimensional feature vector is constructed with the chaos game representation (CGR) method, following the principle of the CGR graph structure, to characterize mRNA sequences, and the Euclidean distance between feature vectors is taken as the distance between sequences. Based on this distance, the p53 family gene sequences of the 16 species are analyzed by hierarchical clustering, and the result is compared with the clustering obtained from an 8-dimensional feature vector.
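The two numeric encodings described above (EIIP and CGR) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the EIIP values are the commonly cited nucleotide values (A 0.1260, C 0.1340, G 0.0806, T 0.1335), the CGR corner assignment is one common convention, and the layout of the 12-dimensional vector (frequency, mean x and mean y of each base's CGR points) is a hypothetical choice, since the abstract does not say which 12 quantities are used. The names `eiip_series`, `cgr_points`, `cgr_feature_12` and `euclidean` are illustrative only.

```python
import math

# Commonly cited EIIP values of the nucleotides (assumed here; check against the thesis).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# One common CGR corner convention (assumption; some papers permute the corners).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def normalize(seq):
    """Upper-case an mRNA/DNA sequence and treat U as T."""
    return seq.upper().replace("U", "T")


def eiip_series(seq):
    """Replace every nucleotide by its EIIP value."""
    return [EIIP[b] for b in normalize(seq)]


def cgr_points(seq):
    """Chaos game representation: start at the centre, move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for b in normalize(seq):
        cx, cy = CORNERS[b]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((b, x, y))
    return points


def cgr_feature_12(seq):
    """HYPOTHETICAL 12-d vector: (frequency, mean x, mean y) of the CGR points of A, C, G, T."""
    points = cgr_points(seq)
    vec = []
    for base in "ACGT":
        xy = [(x, y) for b, x, y in points if b == base]
        k = len(xy)
        vec.append(k / len(points) if points else 0.0)   # relative frequency of the base
        vec.append(sum(x for x, _ in xy) / k if k else 0.0)  # mean x of its CGR points
        vec.append(sum(y for _, y in xy) / k if k else 0.0)  # mean y of its CGR points
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly with `euclidean`. Dropping the frequency entries would leave an 8-dimensional vector of per-base CGR centroids; whether that matches the thesis's 8-dimensional baseline is not stated in the abstract.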
Comparing the two clusterings indicates that the 12-dimensional feature vector characterizes gene sequences more reasonably than the 8-dimensional one. To avoid losing sequence information, a multi-index species similarity analysis method is then established that combines four characteristic indices: CGR migration, average power spectrum, EIIP, and the real-number representation of bases. The mRNA sequences of the p53 family genes are analyzed by hierarchical clustering with this method, and a clustering dendrogram is drawn. The clustering results agree with the actual relationships among the species, which shows that the four-index feature method can be used to characterize gene sequences and can fully reflect their biological information.
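The multi-index comparison and the clustering step can be sketched in Python as below. Several parts are assumptions, since the abstract gives no formulas: a banded power spectrum of the EIIP series stands in for the "average power spectrum" index, the indices are combined by scaling each index's distance matrix to [0, 1] and averaging, and average linkage (UPGMA) is used as the hierarchical clustering rule. All function names are hypothetical. Each index is passed as a dict mapping a species name to its feature vector.

```python
import cmath
import math

# Commonly cited nucleotide EIIP values, used to turn a sequence into a numeric series.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_series(seq):
    """Map a DNA/mRNA sequence to its EIIP series (U is treated as T)."""
    return [EIIP[b] for b in seq.upper().replace("U", "T")]


def avg_power_spectrum(series, n_bands=4):
    """Power |X[k]|^2 / N for k = 1..N//2 (naive DFT), averaged within n_bands equal bands.

    Banding yields a fixed-length vector even when sequences differ in length.
    """
    n = len(series)
    half = n // 2
    power = []
    for k in range(1, half + 1):
        xk = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        power.append(abs(xk) ** 2 / n)
    bands = []
    for b in range(n_bands):
        chunk = power[b * half // n_bands:(b + 1) * half // n_bands]
        bands.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return bands


def distance_matrix(features):
    """Euclidean distances between every pair of species' feature vectors."""
    return {(a, b): math.dist(features[a], features[b]) for a in features for b in features}


def combined_distance(indices):
    """ASSUMED combination rule: scale each index's distances to [0, 1] by its maximum, then average."""
    mats = [distance_matrix(f) for f in indices]
    peaks = [max(m.values()) or 1.0 for m in mats]  # guard against an all-zero matrix
    return {key: sum(m[key] / p for m, p in zip(mats, peaks)) / len(mats) for key in mats[0]}


def upgma(names, dist):
    """Average-linkage (UPGMA) agglomerative clustering; returns a nested-tuple dendrogram."""
    clusters = [((n,), n) for n in names]  # (member names, subtree)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                mi, mk = clusters[i][0], clusters[k][0]
                # Mean of all pairwise distances between the two clusters' members.
                d = sum(dist[(a, b)] for a in mi for b in mk) / (len(mi) * len(mk))
                if best is None or d < best[0]:
                    best = (d, i, k)
        _, i, k = best
        merged = (clusters[i][0] + clusters[k][0], (clusters[i][1], clusters[k][1]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)] + [merged]
    return clusters[0][1]
```

For example, with two toy indices `f1 = {"A": [0, 0], "B": [0, 1], "C": [10, 0], "D": [10, 1.2]}` and `f2 = {"A": [0], "B": [0.5], "C": [5], "D": [5.6]}`, `upgma(list("ABCD"), combined_distance([f1, f2]))` returns `(("A", "B"), ("C", "D"))`. In practice each dict would hold one index (for instance `avg_power_spectrum(eiip_series(seq))`) computed for each of the 16 species. Scaling each index before averaging keeps an index with large raw values from dominating the combined distance.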
Keywords: p53 family genes, CGR, gene signature, feature vector, hierarchical clustering analysis