Research on Cluster Analysis and Its Application in Biological Data Analysis

Posted on: 2010-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Su
GTID: 2178360275481832
Subject: Computer Science and Technology
Abstract/Summary:
Bioinformatics is an interdisciplinary science. It comprises the acquisition, processing, storage, distribution, analysis and interpretation of biological information. Mathematics, computer science and biology are combined to reveal and understand the biological meaning of large amounts of data. The emphasis here is on the application of cluster analysis to the study of bio-molecular sequences.

A clustering measure is an important tool for feature extraction. Firstly, this thesis reviews existing clustering measures and proposes a new clustering measure based on information theory, which is used to analyse the distance between information distributions. The new measure satisfies non-negativity, symmetry, extremality, and determinacy, among other properties.

Secondly, this thesis applies an information-theory-based method to sequence comparison and proposes a new method of this kind. Compared with traditional methods, the new method does not require sequence alignment, is free from the interference of subjective factors, and does not alter the original state of the data. In the experiment, mitochondrial gene sequences from 20 viviparous mammal species are selected. Existing information-theory-based methods are used to compare the whole genome sequences, the new method is used to compare gene sequence fragments, and phylogenetic trees are constructed with NEIGHBOR. The comparison shows that the new method takes less time to construct a phylogenetic tree that is not inferior to the one produced by the previous method, and that the new method is robust.

Finally, this thesis uses fuzzy clustering based on the discrepancy of information distributions to build phylogenetic trees. Based on the fuzzy relationships of evolution and the discrepancy of information distributions, it presents a new method for constructing a phylogenetic tree. Biological sequences are translated into information sets, the new information-theory-based measure is used to calculate the membership degree between sequences, and fuzzy cluster analysis groups the species at different times, so that the phylogenetic history of the species can be inferred. The experimental results show that the phylogenetic tree built by this method is credible.
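
The abstract does not give the formula of the proposed measure, so the following Python sketch only illustrates the general idea of alignment-free comparison. It assumes that each sequence is summarised by its k-mer frequency distribution and that two distributions are compared with the Jensen-Shannon divergence, a standard information-theoretic measure that is non-negative, symmetric and bounded. The function names, the choice of k and the toy sequences are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seq, k=3, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer (zeros included)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def distance_matrix(seqs, k=3):
    """Pairwise Jensen-Shannon distances; no sequence alignment is needed."""
    dists = [kmer_distribution(s, k) for s in seqs]
    n = len(dists)
    return [[js_divergence(dists[i], dists[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the thesis.
    toy = ["ACGTACGTACGGTA", "ACGTACGTTCGGTA", "TTTTGGGGCCCCAA"]
    for row in distance_matrix(toy, k=2):
        print(" ".join(f"{d:.4f}" for d in row))
```

A pairwise distance matrix of this kind is the usual input for distance-based tree construction, which is the role the abstract assigns to NEIGHBOR.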
Keywords/Search Tags: Cluster analysis, Information entropy, Fuzzy clustering, Phylogeny