Font Size: a A A

An Optimization Algorithm For Matching Text With Regular Expressions

Posted on:2013-03-06Degree:MasterType:Thesis
Country:ChinaCandidate:Y J JiangFull Text:PDF
GTID:2298330467478690Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Regular expression can describe difficult query, and at the same time it can describe the common features among a class of texts by the specific syntax. Since regular expression has the strong ability of expression and has the simple syntax, the regular expression has become more and more popular in computer science and the related field. As a result, it is the most important to match regular expression in a text fast.There are many regular expression engines. But almost all the regular expression matching algorithm is based on the the theroy of automata. In other words, regular expressions are usually converted into a deterministic finite state automata or nondeterministic finite state automata, and then match the text with the automata. At the same time, for the finite state automata is built based on the regualr expression, then all the regular expression matching algorithm is online. According to the matching strategy, the regular expression matching can be divided into two kinds, one kind is global alignment, the other kind is local alignment. The global aligment is to decide the string in the dataset belong to the language of regular expression. While the local alignment is to decide the substring of the text belong to the language of regular expression, and also return the position of the substring.According to matching strategy, the paper designs two algorithms to solve global alignment and local alignment. The two algorithms are both offline, with two different index structures. For the global aligment, the paper gives the definition of the pref_suf, and design the filtering theory based on pref_suf. At the same time, using the trie to build the index of the string collection. At last, in this algorithm, it design an alogrithm to simplify the regular expressions. The process of the algorithm for the global alignment involves three steps. The first step is to build the index with the pref_suf. The second step is to find the pref_suf of the regular expression and use them to filter the collection of strings. The third step is to verify the candidate using the finite state automata. For the local aligment, it design a new way to match regular expression, which is different from the algorithms based on the finite state automata. First, it designs an inverted list to improve BWT index structure. And it also gives the computing rules of the operators in the regular expressions. So the algorithm is divided into two steps, the first segment is to build the BWT index. The second segment is to search the regular expression segement on the index. And then computing the postion list according to the rules above. At last, the paper does a large number fo experiment, and the results show that the two proposed algorithms for matching text with regular expressions is efficiently.
Keywords/Search Tags:regular expression, filter, Trie, BWT, query optimization
PDF Full Text Request
Related items