Font Size: a A A

Research On Regular Expression Matching Algorithm

Posted on:2017-10-30Degree:MasterType:Thesis
Country:ChinaCandidate:H Y ChenFull Text:PDF
GTID:2348330503482730Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Regular expression matching is to find the starting position and the ending position of the character sequence matching the given regular expression from the text, the operation has important applications in text editing, bioinformatics, pattern recognition, etc. Through analyzing the existing methods, the results find that the existing methods need to build a suffix tree index.And the suffix tree index space is big, establishing process is complex and locating the information of regular expression prefix and suffix needs to traverse the entire suffix tree,which has low efficiency. In order to improve the efficiency, this article studies the matching question of regular expression from the following several aspects:Firstly, putting access strategy which based on the array index, will first query the position of the character sequence which must be in the regular expression, and then the strategy matches the regular expression according to these locations, finally to find out all of the location that can match the regular expression, thereby avoiding traverse the whole suffix tree. Regular expression matching algorithm is proposed based on the above strategies- Match algorithm.Secondly, as for the problem of redundant computation of the Match algorithm,matching in accordance with the least occurrences of the character sequence is put forward.According to the different constraints of the left and right matching as well as the filtering idea of the previous method two filtering strategies are proposed, that is filtering strategy based on left and right matching,and filtering strategy based on negative factors, which further reduces the candidate results and the verification time, so as to improve the execution efficiency of the algorithm.Finally, by comparing the matching time of a same regular expression on the different characteristics of the data set, the Match algorithm is compared with the current best regular expression matching algorithm, the experimental results verify the efficiency of proposed algorithm in this paper.
Keywords/Search Tags:Regular Expressions, The suffix tree, The Match algorithm, Filtering strategy
PDF Full Text Request
Related items