The Graphical Representation Of DNA Sequences And The Application Research Of Clustering Analysis

Posted on:2008-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Zhou

Full Text:PDF

GTID:2120360242965291

Subject:Computer application technology

Abstract/Summary:

With the rapid development of biology and of research on protein sequences, more and more molecular sequence data are being generated. Analyzing these data yields information about biological structure and function. Bioinformatics mainly deals with complex computations on gene and protein sequences, using mathematics and computer science. Data mining, and clustering in particular, is an important means of analyzing gene sequences. This thesis focuses on the graphical representation of gene sequences and on the application of clustering techniques to that representation.

A novel, non-degenerate 3-D graphical representation is presented. The new representation avoids overlapping and crossing of the curve without losing biological information, and it retains the main biological characteristics of the original sequence. The geometric center is introduced in order to construct the sequence matrix. Each gene sequence is then characterized by the largest eigenvalue of its sequence matrix.

Clustering analysis of the graphical representation data is the main content of the thesis. We introduce the pseudo F-statistic and propose a dynamic fuzzy K-means clustering method. The method ensures a minimal trace of the within-cluster scatter matrix in the final result, partitions multi-dimensional points into clusters for each specified cluster number, and determines the best number of clusters. We build graphical representation data from H5N1 gene sequences to test the method; the results show that it is reasonable to cluster genes on features extracted from their graphical representation.

BIRCH is a recent clustering algorithm for large datasets, but it has some defects. To address them, we redefine the threshold in the CF-tree on the basis of the sum of squared deviations, in order to improve the relevance between clusters. The split factor is defined by the maximum diameter, which overcomes the defect of choosing this factor from experience. Finally, we apply the improved BIRCH algorithm in a preliminary analysis of the gene graphical representation data.
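
The abstract names the pseudo F-statistic as the criterion for choosing the number of clusters but does not give its formula. As a minimal sketch, assuming the commonly used form F = [tr(B)/(k-1)] / [tr(W)/(n-k)], where tr(W) is the trace of the within-cluster scatter matrix and tr(B) the trace of the between-cluster scatter matrix, the statistic can be computed for a hard partition as follows. This is not the dynamic fuzzy K-means method of the thesis; the function names and the toy data are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points: List[Point], labels: List[int]) -> float:
    """Pseudo F = [tr(B) / (k - 1)] / [tr(W) / (n - k)] for a hard partition.

    Assumed form of the statistic; the abstract does not state it.
    """
    n = len(points)
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("pseudo F needs 2 <= k < n")

    overall = centroid(points)
    tr_b = 0.0  # trace of the between-cluster scatter matrix
    tr_w = 0.0  # trace of the within-cluster scatter matrix
    for members in clusters.values():
        c = centroid(members)
        tr_b += len(members) * sq_dist(c, overall)
        tr_w += sum(sq_dist(p, c) for p in members)

    if tr_w == 0.0:
        return float("inf")  # every point coincides with its cluster centroid
    return (tr_b / (k - 1)) / (tr_w / (n - k))


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]

# Keep the candidate partition with the largest pseudo F value.
best = max([two_clusters, three_clusters], key=lambda lab: pseudo_f(points, lab))
```

On this toy data the two-cluster partition scores higher than the three-cluster one, so `best` is `two_clusters`. In a setting like the one the abstract describes, the candidate partitions would presumably come from running the clustering method once per candidate cluster number.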

Keywords/Search Tags:

gene sequence, graphical representation, pseudo F-statistic, fuzzy clustering, BIRCH algorithm

Related items

1	Research And Application Of DNA Clustering Algorithm Based On Intelligent Algorithm
2	Protein Sequence Comparison And DNA-binding Protein Identification With Generalized PseAAC And Graphical Representation
3	Graphical Representations Of Nucleic Acid Sequences And Its Application
4	The Research On Graphical Representation Of Protein Sequence And Application
5	The Research On The Graphical Representation Of DNA Sequences
6	Research On Graphical Representation And Its Application In Bioinformatics
7	Research On Gene Expression Data Analysis Method And Its Application
8	Spectrum-like Graphical Representation Of Biological Sequence And Its Applications
9	A Hybrid Fuzzy Clustering Algorithm And Its Application
10	Mathematical Description Of The Biological Macromolecules And Its Applications