Font Size: a A A

Research Of High-efficiency Motif Finding Algorithms

Posted on:2012-07-06Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2178330335451221Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
ABSTRACT:Finding motif with mutations in DNA sequences is an important problem in molecular biology and computational biology. Word-based motif finding algorithms typically use suffix trees to represent DNA sequences. When matching a pattern on a tree in the depth first manner, different measures are taken to prune the pattern space, such as tracking mismatched counts or using some probabilistic model to discard paths so as to speed up the process. While the time complexities are still too high. Then some algorithms turn to use other complicated space consuming structures to speed up the process,but the cost of time and space still can't satisfy our need. To solve the problem, the thesis develops an elegant structure and two algorithm to utilize previous matching results for upcoming pattern, which dramatically speeds up the process with exact results and low space cost.The thesis still uses suffix trees to store DNA sequences. However, we present a structure of a matched path set stack for recording previous matching results of common prefixes of different patterns, which is called SUTMAPST structure. A stack is bound with a suffix tree by caching head nodes of tree node links, each of which is used to represent the matched path set of a pattern which may be a common prefix of other patterns that may be processed later. Having a matched path set stack, a pattern matching routine can descend along a suffix tree in depth first manner but can backtrack just hierarchically which avoids a large number of re-matching operations, and it can effectively support suffix tree based motif finding algorithms.The thesis develops an algorithm SUTMAPSTA that can solve Planted d-Motif Problem to verify the performance of the SUTMAPST structure. In addition, the thesis uses SUTMAPST structure and the pruning method of the Weeder algorithm to present a WSUTMAPSTA algorithm, which has the same veracity as the Weeder algorithm, but cost less time than Weeder. The experimental results and analysis prove that SUTMAPSTA and WSUTMAPSTA which use the SUTMAPST structure run much faster than current algorithms with very low space cost.
Keywords/Search Tags:bioinformatics, sequence analysis, suffix tree, motif finding
PDF Full Text Request
Related items