Font Size: a A A

Studies On The General Parallel Method To Improve The Performance Of String Matching Algorithm

Posted on:2011-04-19Degree:MasterType:Thesis
Country:ChinaCandidate:H HuangFull Text:PDF
GTID:2178360305467487Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
String matching problem is one of the oldest problems in computer science, which is widely studied too. In recent years, the interest of academic research on string match-ing increased day by day, especially in the rapidly developing field of information re-trieval and computational biology. An important reason for this phenomenon is that the scale of the text needing processed become bigger and bigger, as well as the band-width of network also become bigger and bigger, obectively making the performance requirement of string matching become higher and higher.This paper mainly studied the high-performance exact single-pattern string match-ing algorithm. First, the existing exact single-pattern algorithms are analysed and im-plemented, obtaining the fastest algorithm in various environments through experi-ments, then the gain and loss of each algorithm are analysed based on the concrete data. Based on these work, this paper found a simple parallel method which can speed up many algorithms(use the method of group by gourp instead of one by one in com-paring). This method is used to improve the horspool algorithm and the NEW algo-rithm(the improved algorithms are calld Phorspool algorithm and Phash algorithm), and the experiments proved that the performance of improved algorithms are higher than befor. Also, this method can be applied to many other algorithms to improve their performance.Specifically, the main work done by this paper are listed below:1. The results of previous studies are summarized and deeply analysised using the data of experiments. 2. Three key elements which impact the performance of string matching algorithm are summarized:●The probability and speed of mismatch in search window.●The average length of jumps of search window.●The number of memory references.3. found a simple method of parallel speedup. This method is not limited to speed up some algorithm, but common(not limited to pattern matching algorithm, but all the algorithms that have the process of comparing something one by one). It can fully use the word capacity of CPU, but can not be used in the CPU of non-intel architecture because of the problem of address alignment.4. horspool algorithm is improved by this method, and the experiments proved that the performance of improved algorithm is higher than before. Its worst time com-plexity is improved too.5. designed the Phashp-q series algorithms(where p∈{2,4,8}, q∈[2; 5]) based on the NEW algorithm. This new algorithm solved the problem which is difficult to solve in NEW algorithm, and its performance are totally higher than the NEW algorithm, then become the new fastest algorithm under the small character set environment.
Keywords/Search Tags:string matching, algorithm design, exact single pattern, text search, parallel
PDF Full Text Request
Related items