Research On The Clustering Analysis Algorithms In Bioinformatics

Posted on:2006-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:B N Zhang

Full Text:PDF

GTID:2168360152970129

Subject:Computer applications

Abstract/Summary:

The rapid development of biology and medicine, together with the increasingly practical use of gene chips, has made it possible to compare and study the characteristics of many genes simultaneously, producing vast amounts of gene data. Analyzing these data yields information about biological structure and function. The analysis of gene data has therefore become a very active interdisciplinary problem spanning the life sciences, mathematics, and computer science. Clustering is an important means of analyzing gene data. This paper focuses on clustering analysis algorithms for gene expression data and gene sequence data.

At present, most clustering algorithms for gene expression data depend strongly on their parameters, and the number of clusters is fixed. To address these shortcomings, we introduce the idea of adjusting the number of clusters dynamically. To classify partly overlapping points effectively and obtain the best clustering result, we introduce the pseudo F-statistic and propose a dynamic K-means clustering algorithm based on the multi-dimensional pseudo F-statistic. For each clustering cycle requested by the user, the algorithm starts from the similarity matrix of the genes over their multi-dimensional expression levels and dynamically selects a definite number of genes as the initial cluster groups. The groups are then refined repeatedly using the sum of squared deviations, so that the number of clusters changes and converges dynamically to the best number of clusters. The algorithm ensures that the final clustering has the least trace of the within-cluster scatter matrix; it partitions multi-dimensional points into different clusters, each with a specific number, and yields the best number of clusters.

The BAG clustering algorithm is a classical clustering algorithm for gene sequence data, but it does not clearly specify the initial value of the cutoff or the value of the threshold. In this paper we propose a clustering algorithm based on SZDM (Similar Zscores Dynamic Matrix), which relies on the similarity Z-scores between sequences and uses a dynamic matrix to represent the relations between sequences. The paper also determines the initial value of the cutoff, the value of the threshold, and the method for splitting and merging classes, so that the algorithm achieves higher clustering correctness.

Validation and analysis of the experimental results show that the dynamic K-means clustering algorithm based on the multi-dimensional pseudo F-statistic can adjust the number of clusters dynamically and determine the best number of clusters, and that the SZDM-based clustering algorithm has higher clustering correctness. Finally, we combine cryptography with the structural characteristics of DNA sequences and carry out a preliminary study of random DNA sequence cipher technology, on the basis of which we design and implement a random DNA sequence encryption model.
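
The idea of letting a pseudo F-statistic decide the number of clusters can be illustrated with a small sketch. It is not the procedure of this thesis, whose seeding and refinement steps are not given in the abstract. It assumes the statistic takes the common Calinski-Harabasz form, F = (B / (k - 1)) / (W / (n - k)), where W and B are the within-cluster and between-cluster sums of squared deviations, and it substitutes plain K-means with a deterministic farthest-first seeding and a simple scan over candidate values of k. All function names, the seeding rule, and the toy data are hypothetical.

```python
# Illustrative sketch only: names, seeding rule and search range are assumptions,
# not the algorithm of the thesis.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def farthest_first(points, k):
    """Deterministic seeding (assumed): start at points[0], then repeatedly
    add the point farthest from the centres chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = centroid(members)
    return labels


def pseudo_f(points, labels):
    """Pseudo F-statistic: (B / (k - 1)) / (W / (n - k))."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    if k < 2 or k >= n:
        return float("-inf")  # statistic is undefined here
    grand = centroid(points)
    within = between = 0.0
    for members in groups.values():
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)   # trace of within scatter
        between += len(members) * sq_dist(c, grand)     # trace of between scatter
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_min=2, k_max=6):
    """Scan k and keep the partition with the largest pseudo F-statistic."""
    best = None
    for k in range(k_min, k_max + 1):
        labels = kmeans(points, k)
        score = pseudo_f(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Toy data (hypothetical): three 3x3 grids of points around well-separated centres.
offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx, dy in offsets]
best_k, best_f, best_labels = choose_k(points)
print(best_k, round(best_f, 1))
```

On this toy data the scan selects three clusters. W is the trace of the within-cluster scatter matrix mentioned in the abstract, so for a fixed k a smaller W gives a larger statistic, while the (n - k) / (k - 1) factor penalizes splitting compact groups further.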

Keywords/Search Tags:

Clustering analysis, Gene expression data, Gene sequences, Pseudo F-statistic, Sequence comparison, Alignment similarity

Related items

1	The Research On Gene Sequences Clustering And Classification
2	Information fusion of multiple genomic sensors for clustering and cis-regulatory element identification
3	The Construction And Analyses Of Databases For Hypersensitive Response Of Phytopathogens
4	Study On Computation Method Of Genes Semantic Similarity And Its Application
5	Improving text clustering for functional analysis of genes
6	Computational Identification And Analysis Of Cancer Biomarkers Based On Expression Data
7	Research And Implementation For Similarity Search Algorithm Of Biological Sequences
8	The Study Of Sequences With Low(ODD) Even Correlation
9	Research On The DNA Sequences Analysis Based On Graphical Representations
10	Pathway Functional Network Construction And Analysis Based On Semantic Similarity Of Gene Ontology