
Graphical-Model-Based DNA Sequence Clustering With Improved Certainty Estimation

Posted on: 2018-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Sun
Full Text: PDF
GTID: 2310330512979797
Subject: Software engineering
Abstract/Summary:
Graphical representation has become an important tool for studying biological sequences because it offers a clear visual description and captures local information. Combining graphical representation with cluster analysis makes it possible to study the evolutionary relationships between sequences effectively. However, constructing a more effective graphical representation and assessing clustering certainty more accurately remain open problems. This thesis focuses on graphical representation and on methods for evaluating clustering certainty. The main contributions are as follows.

(1) A simplified DNA sequence space curve is constructed based on the H-curve. For longer DNA sequences, the curve does not drift far from the center line, and it avoids overlaps and crossings. The representation is intuitive and makes the geometric characteristics of the curve easy to analyze.

(2) Based on the simplified space curve, DNA sequences are characterized by geometric features of the curve (estimates of curvature and torsion). A distance matrix is computed with an improved inter-sequence distance measure, cluster analysis is performed on this matrix, and a phylogenetic tree is constructed to display the clustering results.

(3) The standard bootstrap method has two drawbacks when applied directly to biological sequence clustering. First, it ignores the gradual nature of biological evolution by assuming that every sample is equally likely. Second, it ignores the correlation between bases in a DNA sequence by assuming that the bases are independent of one another. Building on the bootstrap, an improved method for evaluating the certainty of DNA sequence clustering is proposed. The method first randomly extracts a certain proportion of nucleotide bases from the original DNA sequence and then uses a genetic algorithm to replace each extracted base. The method was used to evaluate the certainty of the phylogenetic tree constructed by DNA sequence clustering. The experimental results show that the accuracy of the certainty assessment is improved, indicating that the method is feasible and effective.

In the experiments, the distance matrix is constructed using the proposed graphical representation and the improved distance measure. The improved certainty evaluation method is applied to the clustering results obtained from this matrix, and the results are compared with those of other methods; the proposed method outperforms the methods it is compared with. Finally, the research is summarized and directions for future work are outlined.
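
The pipeline summarized in the abstract (space curve, curvature-based features, distance matrix, and base-replacement resampling) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, since the abstract does not give the details: the base-to-step assignment in `STEPS` is a generic H-curve-style walk, not the thesis's simplified curve; curvature is estimated as turning angle per unit segment length, and torsion is omitted for brevity; the feature vector (mean and standard deviation of curvature) and the Euclidean distance stand in for the thesis's improved distance measure; and `perturb` replaces bases uniformly at random where the thesis uses a genetic algorithm.

```python
import math
import random

# Hypothetical base-to-step assignment in the spirit of an H-curve:
# every base advances one unit along z and picks one of four (x, y) diagonals.
STEPS = {"A": (1, 1, 1), "C": (-1, 1, 1), "G": (-1, -1, 1), "T": (1, -1, 1)}


def space_curve(seq):
    """Return the 3D points visited by walking the sequence from the origin."""
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq:
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def turning_curvatures(points):
    """Estimate curvature at each interior vertex as turning angle / mean segment length."""
    out = []
    for p, q, r in zip(points, points[1:], points[2:]):
        u = tuple(b - a for a, b in zip(p, q))
        v = tuple(b - a for a, b in zip(q, r))
        lu, lv = _norm(u), _norm(v)
        cos_t = sum(a * b for a, b in zip(u, v)) / (lu * lv)
        angle = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding
        out.append(angle / ((lu + lv) / 2))
    return out


def features(seq):
    """Summarize a sequence (length >= 2) by mean and std of its curvature estimates."""
    k = turning_curvatures(space_curve(seq))
    mean = sum(k) / len(k)
    var = sum((c - mean) ** 2 for c in k) / len(k)
    return (mean, math.sqrt(var))


def distance_matrix(seqs):
    """Pairwise Euclidean distances between feature vectors (illustrative measure)."""
    feats = [features(s) for s in seqs]
    return [[_norm(tuple(a - b for a, b in zip(f, g))) for g in feats] for f in feats]


def perturb(seq, fraction, rng):
    """Replace a fraction of positions with a different base.

    Uniform random replacement is a stand-in; the thesis uses a genetic algorithm.
    """
    k = round(fraction * len(seq))
    chars = list(seq)
    for i in rng.sample(range(len(seq)), k):
        chars[i] = rng.choice([b for b in "ACGT" if b != chars[i]])
    return "".join(chars)


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
D = distance_matrix(seqs)                      # input to any clustering routine
replicates = [perturb(s, 0.2, random.Random(0)) for s in seqs]
D_rep = distance_matrix(replicates)            # one resampled distance matrix
```

In a full certainty assessment, the perturbation step would be repeated many times, each replicate matrix would be clustered into a tree, and the support for a cluster would be the fraction of replicate trees in which it reappears.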
Keywords/Search Tags: Graphical representation, Cluster analysis, Phylogenetic tree, Bootstrap