Font Size: a A A

The Research For Fast Exact String Matching Algorithms

Posted on:2012-10-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:H B FanFull Text:PDF
GTID:1228330377959395Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The String Matching problem means seeking all the occurrences of one or a set of givensymbol serial (named pattern) in another given symbol serial (named text). This problem is afundamental problem in computer science that has been widely used in most fields of text orsymbol processing, such as network security, information retrieval, and computational biology.To date, with fast developing of these application fields, existing string matching algorithmscouldn’t meet the performance demand of the application. The research fields of this paper arethe exact single pattern string matching and the exact multi-pattern string matching fields.And in this paper, some faster string matching algorithms are presented by improving existalgorithms.In the exact single pattern string matching field, we improved Q-Hash, EBOM andTVSBS and10(serial) algorithms were presented, which are SQ-Hash, TQ-Hash, SEBOM,BOMq, BOMq’, SufOMq, Suf_SEBOM, TVSBSq, TVSBSqA and FQ-Hash. It is shown inthe experimental results that algorithms presented in this paper are the fastest knownalgorithm under64.3%of cases in exact single pattern string matching field.We proved that the average time complexity of Q-Hash, SQ-Hash and TQ-Hash can reachthe lower bound of average time complexity of single pattern string matching. And on thisbasis, we proved that Wu-Manber, the most widely used multi-pattern string matchingalgorithm, can reach the lower bound of average time complexity of multi-pattern stringmatching when it chooses the same hash function with SQ-Hash.We presented a common method to research suffix jump type algorithms, named LegendMethod. This Method can gain the maximum jump distance of suffix jump. It is intuitivelyand can be directly translated into the preprocessing of algorithm. It facilitates the furtherresearches of suffix type algorithms.And we presented an isomorphism automaton of AAC which is named Set DFA. Thebuild time complexity of Set DFA is same with the build time complexity of AAC. Set DFAdoes not use the failure function. It has simpler code, less operations than AAC. And Set DFAis easy to artificial realization. The building time of Set DFA is just a haft of the buildingtimes of AAC.
Keywords/Search Tags:string matching, design of algorithms, exact single pattern, exact multi-pattern, time complexity analysis
PDF Full Text Request
Related items