Font Size: a A A

Non-overlapping Pattern Matching With General Gaps

Posted on:2019-06-01Degree:MasterType:Thesis
Country:ChinaCandidate:C Y JinFull Text:PDF
GTID:2428330623468984Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
Pattern matching with gap constraints is the core and foundation of pattern mining technology,it is also the key to solving technical problems in a large number of interdisciplinary subjects and has been widely used in lots of fields,for instance XML data stream mining,network security detection,and biological gene sequence research.Therefore,the problem of pattern matching with gap constraints has very high significance and practical value.The pattern matching with gap constraints is divided into unconditional constraints,one-off condition constraints and non-overlapping constraints based on the special constraints of occurrence.The latest research results show that the sequential pattern mining under the condition of non-overlapping constraints can find more meaningful occurrences than the one-off constraint conditions.However,the existing non-overlapping pattern matching studies with gap constraints are all non-negative gaps.The non-negative gap has constraints strictly on the order of occurrence of each character in the string,which limits the flexibility of matching greatly.So this thesis proposes a non-overlapping accurate pattern matching problem for general gaps,which improves the flexibility of the matching problem and makes the problem more suitable for practical applications.The main contents and relevant work of this thesis are as follows:1.A non-overlapping accurate pattern matching problem with general gaps is proposed,and a strict definition is given;2.Combining the nettree structure with the backtracking strategy,the NOGLB algorithm is proposed,which first converts the pattern matching problem into a nettree.Then the least left child strategy and backtracking strategy are used,it effectively avoids the useless nettree nodes which need to find and prune when the next time to find the most left root-leaf path,which improves the efficiency of algorithm and improves the quality of the solution;3.The spatial complexity of the NOGLB algorithm is O(m*n*W),and the time complexity is O(m*n*W),where m is the length of the pattern string P,n is the length of the sequence string S,and W is the maximum gap of the pattern string P,W=max(a_i-b_i+1);4.Experiments have been done on real biological data,and the experimental results of NOGLB algorithm and other three algorithms NOGRB,NOGLP and NOGRP are compared.The result shows that the NOGLB algorithm can get more occurrences,and verifies that the NOGLB algorithm is feasible and efficient for solving the general gap with non-overlapping pattern matching problem.
Keywords/Search Tags:pattern matching, general gap constraints, non-overlapping, nettree
PDF Full Text Request
Related items