Font Size: a A A

(?,?)-Pattern Matching Under One-Off Condition

Posted on:2021-12-04Degree:MasterType:Thesis
Country:ChinaCandidate:L YuFull Text:PDF
GTID:2518306560953389Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
Pattern matching is a basic operation for strings in the computer field.The purpose is to find all substrings that are the same or similar to the given pattern in a long sequence.Pattern matching has wide applications in many fields,such as search engines,financial analysis,and data mining.Pattern matching with gap constraints is more flexible than traditional pattern matching.It allows the number of wildcards in a specified range in the pattern,which has important research significance.In pattern matching under one-off condition,the pattern contains gap constraints and the same characters in the sequence can only be used at most once.This method has many important applications,such as bioinformatics and sequence pattern mining.Compared with the exact pattern matching,approximate pattern matching is not only more general but also allows some noise.However,most of the current researches are employed Hamming distance.This kind of matching only focuses on the overall difference,and ignores the degree of approximation between local characters,resulting in a large error in the result.To solve this problem,this paper addresses the approximate pattern matching with the local distance no greater than ? and the global distance no greater than ?,which is named(Delta and gamma Pattern matching under One-off condition,DPO).This paper first gives the definition of DPO and shows the complexity of DPO is NP-Hard.Then we propose a heuristic algorithm named NetDPO(approximate Nettree for DPO),which transforms the problem into a ?-Nettree.The algorithm calculates the number of paths that each node can reach to the root within the ? distance,and selects the parent with the smaller related positions,thereby reducing the correlation with the ?-Nettree,and obtaining a locally optimal solution.And combined with the right-most parent strategy,the occurrence with a larger number of remaining occurrences is selected to obtain a globally optimal solution.Iterate this process and find the maximum solution for DPO.Finally,the space complexity and time complexity of the NetDPO algorithm are analyzed,and extensive experiments verify the superiority of the NetDPO algorithm in solving quality and approximate matching effect.
Keywords/Search Tags:pattern matching, gaps constraint, one-off condition, nettree, (?,?) distance, heuristic algorithm
PDF Full Text Request
Related items