Chain-code Technology And Cluster Analysis For Gene Sequence

Posted on: 2013-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Liu
GTID: 2248330374475859
Subject: Computer application technology
Abstract/Summary:
With the rapid development of modern biotechnology and the launch and completion of a variety of genome projects, biological data is growing exponentially. Bioinformatics has emerged in response, drawing on biology, mathematics, and computer science and technology. Many knowledge discovery and data mining methods have already been applied to bioinformatics, with fruitful results.

Building on nucleic acid and protein sequence analysis, this thesis focuses on DNA sequence searching and sequence clustering. We approach sequence clustering from two angles: global similarity and local similarity.

In designing and implementing an improved chain-code-based DNA sequence search algorithm, we represent DNA sequences as non-degenerate graphical curves. Based on these curves, we define a vector filter and an area filter to narrow the search space.

We improve the global-similarity sequence clustering algorithm based on edit distance by using a summary (overview) vector distance to measure the similarity between clusters and by adding a common-substring pruning strategy to reduce the running time.

For local-similarity sequence clustering, we improve the PrefixSpan algorithm to obtain all closed frequent subsequences. During clustering, we measure the similarity of clusters by the number of frequent subsequences they share, which reduces data redundancy and improves clustering quality.

Finally, experimental results show that the improved chain-code-based DNA sequence search algorithm meets the experimental target. We find that both global and local features should be taken into account in sequence clustering. Given the large number of repeats and “core” subsequences in genes, mining the “core” subsequences to characterize entire sequences is an approach more in line with the actual situation. When an accurate alignment result is needed, however, we should attend to the global characteristics and handle each element.
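
As a rough illustration of the two similarity notions mentioned in the abstract, the sketch below computes a standard Levenshtein edit distance between two sequences (global similarity) and counts the frequent subsequences that two clusters have in common (local similarity). It is not the thesis's implementation: the function names `edit_distance` and `shared_pattern_count` are hypothetical, the common-substring pruning and the vector-distance measure are not reproduced, and the frequent subsequences are assumed to have been mined beforehand (for example by a PrefixSpan-style algorithm).

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute).

    Uses two rolling rows, so memory is proportional to the shorter sequence.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def shared_pattern_count(patterns_a: Iterable[str],
                         patterns_b: Iterable[str]) -> int:
    """Number of frequent subsequences that two clusters have in common.

    Assumes each cluster is summarised by a collection of already-mined
    frequent subsequences (hypothetical representation for this sketch).
    """
    return len(set(patterns_a) & set(patterns_b))


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
    print(shared_pattern_count({"ACG", "GT", "TT"}, {"GT", "TT", "CC"}))
```

A smaller edit distance indicates higher global similarity, while a larger shared count indicates higher local similarity. How the two are weighted or combined is left open here.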
Keywords/Search Tags: Bioinformatics, DNA Sequence Analysis, Sequence Searching, Cluster Analysis