Font Size: a A A

The String Pattern Searching Algorithms Based On Suffix Arrays

Posted on:2011-03-05Degree:MasterType:Thesis
Country:ChinaCandidate:L X ZhangFull Text:PDF
GTID:2178330338977535Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Pattern-matching problem is a fundamental problem in computer science. In theearly years, most of pattern-matching algorithems focus on the exact pattern-matchingalgorithms study, such as: well-known single pattern matching algorithms KMP BM andmulti-pattern matching algorithms CA CW BNDM so. However, in practical applications,it is very important that approximate pattern matching in information filtering andcomputational biology and other fields. For example, in the analysis of DAN fragments,people usually search similar patterns with the target DNAfragment, in order to determinewhether there are genetic relations between them. To search a meaningful pattern stringquickly in the vast amounts of information, we must improve the efficiency of patternmatching algorithms. Therefore, it is necessary to improve pattern-matching algorithmsfurtherly.Here for the problem of searching the high-frequency words (pattern string) in longerEnglish novel, we applied filtering ideas to present approximate string pattern searchingalgorithms Epattern_searcherH and Epattern_searcher.The main work of this paper is as follows:(1) Propose the algorithms of pattern searching, for the idea of the filter design patternsearching algorithms of approximate pattern matching. First, according to certain filterprinciples (limited to certain conditions) to search a text string, filter out those unlikelymatches the text segment; and then further verify that the positions of the rest of thecandidate matches whether it really exists at the success of matching. As filtered out mostof unlikelymatch the text string, the process of matching has greatly accelerated.(2) To achieve the pattern searching algorithms based on suffix arrays datastructure.Suffix tree is a good data structure for the treatment of large-scale string, but therequiems of suffix tree index structure space is too large, thus there is a pattern searchingindex structure——suffix arrays. Because the storage space of suffix arrays required is farless than the suffix tree.
Keywords/Search Tags:pattern searching, string matching, suffix arrays, english high-frequencywords, filtration, edit distance
PDF Full Text Request
Related items