Research And Application Of DNA Clustering Algorithm Based On Intelligent Algorithm

Posted on:2011-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2120360308465015

Subject:Management Science and Engineering

Abstract/Summary:

With the continuing development of modern biotechnology, and especially the implementation of the Human Genome Project, large quantities of gene sequence data have been accumulated. The functions of only a small fraction of these sequences are known; the functions of most genes remain unknown. Clustering, a data mining technique, is capable of analyzing gene data at this scale. By clustering gene sequences into classes, and because sequences in the same class tend to have similar functions, the functions of unknown sequences can be inferred from those of known ones. Cluster analysis is now widely used in bioinformatics research. The key question in clustering biological sequences is how to characterize the similarity between sequences. The linear arrangement of a biological sequence sometimes fails to reflect the degree of similarity, so some similarity measures fail in certain cases, which degrades the quality of the clustering results. Consequently, a similarity measure designed solely from the sequence itself may not yield clustering results consistent with biological observations, which complicates the study of the evolution of DNA sequences. As research on the graphical representation of DNA sequences deepened, Randic first proposed using graphical representations of DNA sequences to study the clustering of gene sequences. Following this idea, sequences can be clustered according to mathematical features extracted from their graphical representations. Building on an existing two-dimensional graphical representation based on base symmetry, I made some improvements and propose a new graphical representation of DNA sequences.
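As a rough illustration of the graphical-representation idea, the Python sketch below turns a sequence into a single two-dimensional curve and reduces that curve to one number. It is not the representation proposed in the thesis: the purine/pyrimidine walk, the Euclidean distance matrix, and the choice of the leading eigenvalue as the invariant are all assumptions made for the example, and the function names and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}  # assumption: step up for purines, down for pyrimidines


def dna_to_curve(seq):
    """Map a DNA string to 2D points (i, y_i), where y steps +1 for A/G and -1 for C/T."""
    points, y = [], 0
    for i, base in enumerate(seq.upper(), start=1):
        if base not in "ACGT":
            raise ValueError(f"unexpected base {base!r} at position {i}")
        y += 1 if base in PURINES else -1
        points.append((i, y))
    return points


def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between all pairs of curve points."""
    return [[math.dist(p, q) for q in points] for p in points]


def leading_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric non-negative matrix by power iteration."""
    n = len(matrix)
    if n == 0:
        return 0.0
    v = [1.0 / math.sqrt(n)] * n  # unit start vector
    value = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        value = sum(w[i] * v[i] for i in range(n))  # Rayleigh quotient for unit v
        v = [x / norm for x in w]
    return value


def descriptor(seq):
    """One numeric feature per sequence: leading eigenvalue of its curve's distance matrix."""
    return leading_eigenvalue(distance_matrix(dna_to_curve(seq)))


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    for name, seq in {"s1": "ATGGTGCACC", "s2": "ATGGTGCATC"}.items():
        print(name, round(descriptor(seq), 3))
```

Sequences whose descriptors are close would be treated as similar. In practice several such invariants would be collected per sequence, so that each sequence becomes a feature vector rather than a single number.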
The improved method is more space-saving and reflects some biological features of DNA sequences more clearly. According to the mapping rules, each DNA sequence is translated into three two-dimensional curves, feature matrices are extracted from the curves, and the DNA sequences are then clustered using matrix invariants. In this way a DNA sequence is transformed into a multi-dimensional data point, and the clustering of DNA sequences becomes the clustering of multi-dimensional data. Common clustering algorithms for multi-dimensional data usually require the number of clusters k to be given in advance. In most cases, however, k cannot be determined in advance, so the best number of clusters has to be found by optimization. In this thesis, I use a clustering algorithm based on particle swarm optimization (PSO). Because the PSO-based clustering algorithm cannot determine the number of clusters k by itself, the k-means algorithm is used to obtain the best number of clusters, and a cluster validity function is constructed. Testing demonstrates the effectiveness of the cluster validity function in determining the best number of clusters, and the introduction of class weights allows the function to be better applied to real data analysis.
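The step of choosing k can be sketched as follows. This is a minimal Python example under stated assumptions: it runs plain k-means for each candidate k and scores the result with a generic compactness/separation ratio, where smaller is better. The PSO-based clustering and the class-weighted validity function described in the thesis are not reproduced here, and all function names and the toy data are hypothetical.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point initialisation (k <= len(points))."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def validity(points, centers, labels):
    """Mean squared distance to own centre divided by the smallest squared centre gap."""
    scatter = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)
    separation = min(
        math.dist(a, b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return math.inf if separation == 0 else scatter / separation


def best_k(points, k_min=2, k_max=6):
    """Run k-means for each k in [k_min, k_max] and return the k with the lowest score."""
    scores = {}
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = validity(points, centers, labels)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2D data with three well-separated groups (illustrative only).
    toy = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (20, 0), (20, 1), (21, 0), (21, 1)]
    k, scores = best_k(toy, 2, 5)
    print("selected k:", k)
```

The ratio rewards tight clusters with well-separated centres, so it penalises both merging distinct groups (large scatter) and splitting one group in two (small centre gap). A weighted variant, such as the one the abstract mentions, would change only the `validity` function.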

Keywords/Search Tags:

DNA sequence, graphical representation, Particle Swarm Algorithm, Clustering Optimization

Related items

1	The Graphical Representation Of DNA Sequences And The Application Research Of Clustering Analysis
2	A Study Of Spatial Clustering With Constraints Based Swarm Intelligence
3	Research On Gene Expression Data Clustering Algorithm Based On Particle Swarm Optimization
4	Research On Particle Swarm Optimization Algorithms And Applications For Some Optimization Problems
5	The Study Of Particle Swarm Optimization And Improvement
6	Research And Application Of Particle Swarm Optimization Algorithm Based On P System
7	Research And Applications Of Particle Swarm Optimization
8	Optimization Of Polypeptides Separation Condition Based On Improved Particle Swarm Optimization Algorithm
9	Evolutionary Algorithm And Its Application In Bioinformatics
10	Research About Heat Conduction Inverse Problem Based On Quantum-Behaved Particles Swarm Optimization