Font Size: a A A

Nonoverlapping (?,?)-Approximate Pattern Matching

Posted on:2021-08-15Degree:MasterType:Thesis
Country:ChinaCandidate:B J JianFull Text:PDF
GTID:2518306560953549Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Pattern matching with gap constraints can calculate the support of patterns,which is a key issue of sequential pattern mining(or sequence pattern mining),and has significant research value in many fields such as bioinformation retrieval,marketing,time series forecasting.Pattern matching includes exact pattern matching and approximate pattern matching.In actual life,compared with exect pattern matching,approximate pattern matching allows certain data noise,which is more general and significant.Pattern matching with gap constraints is often based on various constraints,in which the nonoverlapping constraints keep lots of useful information while simplifying the calculation,and have stronger expression ability.Nonoverlapping constraints allow any two occurrences can use the same character in a sequence multiple times,but not at the same position.At present,nonoverlapping approximate pattern matching employs Hamming distance,which cannot measure the local approximation between subsequence and pattern,resulting in large deviations in matching results.Hence,this paper employs(?,?)-distance to study an approximate pattern matching,where the local distance and the global distance do not exceed?and?,respectively.The main research contents of this paper are as follows:1.We address Nonoverlapping Delta and gamma approximate Pattern matching(NDP),and the relevant definitions of the problem are given.2.We construct an effective algorithm Net NDP(local approximate Nettree for NDP).In this algorithm,the NDP problem is firstly transformed into a local approximate Nettree,and then the concept of minimal root distance(MRD)is proposed to represent the shortest?-distance from a node to the root level.According to MRD,we can know whether a node has root paths satisfy the global constraint or not and prune invalid nodes and parent-child relationships to avoid invalid access and improve the time efficiency.Finally,we search the rightmost(?,?)-approximation by MRD,and obtain the maximum nonoverlapping(?,?)-approximate occurrence set.3.We analyze the space and time complexities of Net NDP are O(n*m*W)and O(n*m~2*W),respectively,where n,m and W are the length of sequence,the length of pattern,and the maximal gap,respectively.4.By setting the threshold as 0,the correctness of Net NDP is verified on the real protein datasets.By comparing the running time and experimental results,the effectiveness of Net NDP is verified.On real time series datasets,we validate that(?,?)-distance has better matching effect than Hamming distance.
Keywords/Search Tags:patterns matching, sequential pattern mining, nonoverlapping, (?,?)-distance, Nettree
PDF Full Text Request
Related items