Chain-code Technology And Cluster Analysis For Gene Sequence

Posted on:2013-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Liu

Full Text:PDF

GTID:2248330374475859

Subject:Computer application technology

Abstract/Summary:


With the rapid development of modern biotechnology and the implementation and completion of a variety of genome projects, biological data is growing exponentially. Bioinformatics has emerged in response, drawing together biology, mathematics, and computer science. Many knowledge discovery and data mining methods have already been applied to bioinformatics, with fruitful results.

Based on nucleic acid and protein sequence analysis, this thesis focuses on DNA sequence searching and sequence clustering. We approach the sequence clustering problem from two directions: global similarity and local similarity.

In the design and implementation of the improved chain-code-based DNA sequence search algorithm, we represent each DNA sequence as a non-degenerate graphical curve. Based on these curves, we define a vector filter and an area filter to narrow the search space.

We improve the global-similarity sequence clustering algorithm based on edit distance by using an overview vector distance to measure the similarity between clusters and by adding a common-substring pruning strategy to reduce the running time.

For local-similarity sequence clustering, we improve the PrefixSpan algorithm to obtain all closed frequent subsequences. During clustering, we measure the similarity between clusters by the number of frequent subsequences they have in common, which reduces data redundancy while also improving clustering quality.

Finally, experimental results show that the improved chain-code-based DNA sequence search algorithm meets the experimental target. We find that both global and local features should be taken into account in sequence clustering. Given the large number of repeats and “core” subsequences present in genes, mining the “core” subsequences to characterize entire sequences is an approach more in line with the actual situation. When an accurate alignment result is needed, however, the global characteristics should be considered and every element of the sequence handled.
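The contrast between the global and local similarity measures described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the example sequences, and the hand-picked pattern list are all hypothetical, plain Levenshtein distance stands in for the global measure, and the frequent subsequences are supplied directly rather than mined with PrefixSpan.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (global similarity).

    Uses a rolling row, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_subsequence(pattern: str, seq: str) -> bool:
    """True if pattern occurs in seq in order, gaps allowed."""
    it = iter(seq)
    return all(ch in it for ch in pattern)


def shared_pattern_count(seq_a: str, seq_b: str, patterns: list[str]) -> int:
    """Number of given patterns contained in both sequences (local similarity).

    `patterns` is assumed to be a precomputed list of frequent
    subsequences; mining them is outside this sketch.
    """
    return sum(1 for p in patterns
               if is_subsequence(p, seq_a) and is_subsequence(p, seq_b))


if __name__ == "__main__":
    a, b = "ACGTACGT", "TTACGA"
    print(edit_distance(a, b))
    print(shared_pattern_count(a, b, ["ACG", "GTA", "TT"]))
```

The edit distance compares every position of both sequences, while the shared-pattern count looks only at a small set of recurring subsequences, which is the trade-off the abstract draws between accurate alignment and "core" subsequence mining.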

Keywords/Search Tags:

Bioinformatics, DNA Sequence Analysis, Sequence Searching, Cluster Analysis


Related items

1	The System Of Display And Analysis Of Gene Sequence
2	Research On The Design And Analysis Of Several Classes Of Pseudorandom Sequence And Sequence Families
3	Applications Of Machine Learning Approaches To Biological Sequence Analysis
4	Research On Sequence Alignment Algorithms In Bioinformatics
5	Research On Key Technologies Of Accelerator For Biological Sequence Analysis
6	Research On Parallel Processing Technology Of Sequence Analysis
7	Application On DNA Sequence Analysis Using Ant Colony Algorithm
8	Sequence-specific sequence comparison using pairwise statistical significance
9	Based On Data Mining, Biological Sequence Analysis
10	Research On Multiple Sequence Alignment Algorithms In Bioinformatics