Font Size: a A A

The Research On Clustering Algorithm Based On DNA Computing

Posted on:2012-12-11Degree:DoctorType:Dissertation
Country:ChinaCandidate:H Y ZhangFull Text:PDF
GTID:1118330368977372Subject:Information management and electronic commerce
Abstract/Summary:PDF Full Text Request
DNA computing has been applied in broad fields such as graph theory, finite state problems and combinatorial problem. DNA computing approaches are more suitable used to solve many combinatorial problems because for the vast parallelism and high-density storage. Most clustering algorithms exhibit polynomial or exponential complexity. The problem becomes even far more challenging when the number of clusters is unknown and the data set become huge. The appearance of DNA computing provides an interesting and viable alternative. The clustering is the combinational problem of the patterns. DNA computing is suitable to solve clustering problem considering the patterns as the vertexes in a graph. The researching content is as follow:(1) Using objected-oriented method to analysis the DNA computing.(i) The class programs about DNA molecular and bio-chemical operations are given. The DNA molecular class program contains many different sorts DNA molecular and their relationship. Though analysis the characters of the different DNA molecular, we can know how the DNA molecular can change to another sort DNA molecular.(ii) Though analysis the bio-chemical operations, many sequence programs are given. These sequence programs are about the bio-chemical such as ligation, hybridization, annealing, disnaturing, gel- electrophoresis. We can use these sequence programs to know the whole processing bio-chemical reactions and apply these models to the programming during simulation in silicon computer.(2) The research about clustering algorithms based on DNA computing.(i) We analysis the basic idea of the clustering algorithm and change the clustering problem to the combinational problem or the graph theory problem. The clustering result is a combination of the patterns. This combinational method can ensure that the similarity of the patterns in a same group is very high and the distance between two groups is very long. DNA computing can get all possible combinations of the patterns. Then we can separate the optimal clustering result from these DNA strands. The sticker model is given including its algorithm. This DNA model can use the any clustering problem. We change the grid-based clustering algorithm into a special graph clustering algorithm. This kind of graph has a special structure and it is easy to encoding the DNA strands. For proving the feasible of this algorithm, we change the clustering problem to the Hamilton circle problem. Theoretically, this algorithm is feasible.(ii) In the research of the hierarchical clustering based on DNA computing, we change the hierarchical clustering algorithm to the MST problem and use DNA computing to execute clustering. We gave a basic model and a sticker model for hierarchical clustering. According to the model's algorithm we suppose an encoding method and the design to the bio-chemical reactions.(iii) Research about the grid-based clustering using the DNA computing. We consider the cell into a vertex in a graph. Each vertex in a special graph has eight neighboring vertexes and the axes have relationship between the neighboring vertexes. So we stated four encoding strategies. These encoding methods use the combinations of the DNA segments of the vertexes, the edge and the axes. Each method has its characters. The basic model can be used for any encoding method, but the bio-chemical experiments are different. Because these experiments are designed based on the different DNA strands. The sticker model is built with an example.(iv) DNA computing is applied in the graph clustering. In this thesis, we use DNA computing to solve the image segmentation problem. The clustering algorithm usually is used in the image segmentation problem. We gave a sticker model and algorithm to solve the image segmentation using k-medoids algorithm. The pixel point can be considered to the patterns during the clustering. The grey level can be considered as the attribute of the patterns. So the image segmentation problem just is the clustering problem with two given gray level, such as white and black. The encoding method is based on the k-medoids algorithm. The combinational DNA strand is composed by the vertex DNA segment and medoid DNA segment. The vast density storage and parallels of the DNA computing will be meaningful in front of the million level images.(3) The feasible experiment with the synthetic data. We gave three experiments to prove the feasible of the DNA computing model and encoding strategy.(i) Simulation the real process of the bio-chemical reactions. We use silicon computer to simulate the whole process of the DNA computing. Get all possible combinations of the vertexes and separate the useful result in the test tube. This experiment will cost large time and space because it will generate the entire possible combinational double-stranded DNA molecular. So we gave a small dataset for this experiment. The feasibility of the DNA computing model and algorithm is proved.(ii) Using the parallel algorithm to simulate the DNA computing. The experiment can simulate the parallel reactions of the DNA computing like in the test tube. So the time will be saved. We use the large dataset and complex geometry graph fro this experiment.(iii) Build a mathematic model to simulate the process of the clustering based on DNA computing. Improve the scan method during the ligation reaction and increase the speed of the simulation. This neighboring vertex scan method can induce the generation of the errors and duplicate DNA strands. The complex geometry graph is used in this experiment and we get the good clustering result.(iv) The simulation programming is compared with the CLIQUE algorithm. We find that the calculating time is related with the number of the candiate vertexes. If the patterns are density, the time is small, and if the patterns are loose, the time is large. The result is the same as the old clustering algorithm.(4) Discussing of the complexity of the time during the simulation in silicon computer.(5) A genetic algorithm for generating the DNA strand is given in this thesis. Designing DNA strands is one of the most practical and important research topic in DNA computing. An improved genetic algorithm for designing of the equal-length single-stranded DNA sequences is proposed, especially for the short DNA sequences needed in the DNA computing experiment. This algorithm can satisfy certain combinatorial and thermodynamic constraints such as GC content constraint, Hamming distance constraint and similar constraint. It can generate the good short DNA sequences quickly by giving up the bad sequences in every generation. Finally, through analysis the experiment data, two functions are given to estimate the number of the DNA sequences satisfied GC content constraint and Hamming distance constraint.(6) We use DNA computing to solve three real problems. One is the geographic division of the Shandong province with 17 cities. We design the DNA computing model and algorithm to cluster the 17 cities into three groups. The basic idea used MST algorithm. The other experiment is to use the real data from UCI. And in the third experiment, the DNA computing is applied in the image segmentation of the plate and writing image.DNA computing is a new techniques for the clustering algorithm and the clustering algorithm is a new application field for the DNA computing. With increasing the database, the requirement of the storage and computing speed is higher and higher. The appearance of the DNA computing is meaningful the clustering problems. Although there is no real bio-chemical experiment in the lab, the model and the encoding stagy are both based on the old model and algorithm. So the feasibility and validity are not necessary to worry about. Factually, in the thesis we gave a theory proving and experiment to support the models.
Keywords/Search Tags:DNA computing, Clustering algorithm, Hierarchical clustering, Grid-based clustering, Sticker model
PDF Full Text Request
Related items