Research On Regular Expression Matching Algorithm

Posted on:2017-10-30

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Chen

Full Text:PDF

GTID:2348330503482730

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Regular expression matching is to find the starting position and the ending position of the character sequence matching the given regular expression from the text, the operation has important applications in text editing, bioinformatics, pattern recognition, etc. Through analyzing the existing methods, the results find that the existing methods need to build a suffix tree index.And the suffix tree index space is big, establishing process is complex and locating the information of regular expression prefix and suffix needs to traverse the entire suffix tree,which has low efficiency. In order to improve the efficiency, this article studies the matching question of regular expression from the following several aspects:Firstly, putting access strategy which based on the array index, will first query the position of the character sequence which must be in the regular expression, and then the strategy matches the regular expression according to these locations, finally to find out all of the location that can match the regular expression, thereby avoiding traverse the whole suffix tree. Regular expression matching algorithm is proposed based on the above strategies- Match algorithm.Secondly, as for the problem of redundant computation of the Match algorithm,matching in accordance with the least occurrences of the character sequence is put forward.According to the different constraints of the left and right matching as well as the filtering idea of the previous method two filtering strategies are proposed, that is filtering strategy based on left and right matching,and filtering strategy based on negative factors, which further reduces the candidate results and the verification time, so as to improve the execution efficiency of the algorithm.Finally, by comparing the matching time of a same regular expression on the different characteristics of the data set, the Match algorithm is compared with the current best regular expression matching algorithm, the experimental results verify the efficiency of proposed algorithm in this paper.

Keywords/Search Tags:

Regular Expressions, The suffix tree, The Match algorithm, Filtering strategy

PDF Full Text Request

Related items

1	Finding MUMs With Enhanced Suffix Arrays
2	The Properties And Regular Expressions Of Two Types Of Fuzzy Finite Tree Automata
3	Research On Regular Expression Matching Of Network Data Flow
4	Algorithms To Speeding Up Multiple Regular Expressions Matching For Application Traffic Classification
5	The Design And Implementation Of Regular Expression Engines Based On Deterministic Finite Automata
6	Research Of Compression Algorithm For Deep Packet Based On Regular Expressions
7	Research On Construction Of Index Structure For Biological Sequences
8	Study On Automaton-Based Regular Expression Matching Algorithms
9	An Algorithm Based On Suffix Tree For Identification Of Repeats In DNA Sequence
10	An Improved Algorithm Of The Regular Expression Based On DPI System