
The Analysis Of Biological Sequences' Structure Based On Hierarchical Clustering

Posted on: 2020-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: D Qian
GTID: 2370330578963930
Subject: Applied Mathematics
Abstract/Summary:
Biological sequences, which mainly include DNA sequences and protein sequences, are important objects of study in computational biology. They carry the genetic information of organisms, and the discovery and study of that information matters for biology, medicine and pharmacy. Extracting genetic information from biological sequences requires analyzing their structure. Hierarchical clustering is a traditional method of structure analysis: it measures the similarity between different sequences, from which function and hidden genetic information can then be studied.

This research analyzes the primary structure of biological sequences. Feature vectors are extracted from the sequences by numerical mapping, the structures of the sequences are analyzed on the basis of these vectors, and the functions of sequences are predicted from the similarity between them. Each study ends with a discussion of the biological significance of the results. The structures are analyzed mainly by hierarchical clustering, combined with other methods such as DNA segmentation, analysis of variance, and grouping, before the results are discussed. The p53 family genes in the coding region and DNase I hypersensitive sites (DHSs) in the non-coding region are taken as the objects of study. The specific work is as follows:

1. Analysis of the evolutionary diversity of p53 family genes. 24 DNA sequences of the p53 family are selected as study objects. Using chaos game representation, each DNA sequence is mapped to a series of points in the Cartesian plane, and 8-dimensional weighted feature vectors are then constructed to describe the sequences. Hierarchical clustering, combined with DNA segmentation and analysis of variance, is applied to analyze the 24 feature vectors, and the evolutionary diversity of the p53 family is studied through the clustering result. The results show that the differences among the three kinds of genes are mainly reflected in the first 2/3 of the sequences and in the fourth and seventh dimensions of the vectors.

2. Study of three-base periodicity in the p53 family. Three-base periodicity is an important property of the protein-coding regions of DNA sequences, so 30 coding sequences of the p53 family are selected as study objects. The power spectrum of a DNA sequence is obtained with Voss mapping and the discrete Fourier transform, and three-base periodicity can be displayed visually in the plot of the power spectrum. Properties such as the maximum of the power spectrum, the signal-to-noise ratio, the shift skewness and the intensity of three-base periodicity are extracted, and stepwise clustering is applied to the p53 family. The results show that p53, p63 and p73 differ significantly in the maximum of the power spectrum, the signal-to-noise ratio and the intensity of three-base periodicity, while the p53 family as a whole is stable in shift skewness. In addition, the evolutionary pattern reflected in the sequences can be analyzed effectively by stepwise clustering.

3. Prediction of DHSs. DHSs are regions of chromatin, and predicting them helps in investigating the function of non-coding genomic regions. Pseudo trinucleotide composition extracts both the local and the global sequence-order effects of a gene sequence. After the structure of the sites is analyzed, an algorithm of grouping before classification is proposed to predict DHSs. The structure analysis shows that DHSs and non-DHSs differ significantly in the content of the dinucleotide CG, and a comparative analysis of the prediction results shows that the proposed algorithm achieves higher accuracy in DHS prediction.
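The power-spectrum step of the second study can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, the DFT is a naive O(N^2) loop meant only for short sequences, and the signal-to-noise ratio is assumed to be the power at frequency index N/3 divided by the mean power over indices 1 to N-1. The abstract does not state the exact normalization used, so that definition is an assumption.

```python
import cmath

BASES = "ACGT"


def voss_mapping(seq):
    """Voss mapping: one binary indicator sequence per base."""
    seq = seq.upper()
    return {b: [1 if ch == b else 0 for ch in seq] for b in BASES}


def power_spectrum(seq):
    """Total power S[k] for k = 0..N-1.

    S[k] is the sum, over the four bases, of the squared magnitude of the
    DFT of that base's indicator sequence (naive DFT, short inputs only).
    """
    n = len(seq)
    indicators = voss_mapping(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for b in BASES:
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(indicators[b])
            )
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def three_base_snr(seq):
    """Assumed SNR: power at k = N/3 over the mean power for k = 1..N-1."""
    n = len(seq)
    if n < 3 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    s = power_spectrum(seq)
    mean_power = sum(s[1:]) / (n - 1)  # k = 0 (the DC term) is excluded
    return s[n // 3] / mean_power


if __name__ == "__main__":
    toy = "ATG" * 20  # perfectly periodic toy input, not real p53 data
    print(three_base_snr(toy))
```

For a perfectly periodic toy input such as `"ATG" * 20`, all non-DC power falls at k = N/3 and k = 2N/3, so the ratio is large. Real coding sequences show a weaker but still visible peak at N/3, and features such as this ratio are the kind of quantity the clustering step would then compare across sequences.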
Keywords/Search Tags: p53 family, DHSs, feature extraction, clustering, analysis of difference