Research On Approximate Pattern Matching With Flexible Wildcards And Length Constraints

Posted on: 2014-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: G L Huang
GTID: 2268330401989191
Subject: Computer application technology

Abstract/Summary:
With the rapid growth of data in bioinformatics, network intrusion detection and text retrieval, extracting the information that users are interested in has become an important research topic. Pattern matching and pattern mining play important roles in this problem and have attracted wide attention from researchers.

To give users more flexibility in their queries, wildcards and length constraints have been introduced into the pattern matching problem. This dissertation focuses on the APMWL problem (Approximate Pattern Matching with flexible Wildcards and Length constraints). Users can specify the mistake threshold, the global minimum length, the global maximum length, and the range of wildcards between each pair of adjacent pattern characters. Research on this problem also attracts much attention in the field of approximate pattern mining. We then extend this problem to the MAPWO problem (Mining Approximate Patterns with Wildcards and the One-off condition) and obtain a better solution.

This dissertation makes three main contributions:

(1) For the APMWL problem, this dissertation proposes the APM algorithm, which solves the problem without the one-off condition, and the APM-OF algorithm, which solves it under the one-off condition. Experiments verify that both APM and APM-OF have significant advantages over competing algorithms in the matching solutions they find. The influence of the pattern length, the global minimum and maximum lengths, and the mistake threshold is also analyzed.

(2) For the MAPWO problem, the MAP algorithm is proposed by extending the APM-OF algorithm to approximate pattern mining. Compared with the OneoffMining algorithm, experiments show that MAP mines 2.07 times as many frequent patterns as OneoffMining. We also analyze experimentally the main factors that affect MAP.

(3) Finally, prototype systems are designed for the APMWL and MAPWO problems, providing a platform for further research.
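To make the APMWL problem statement concrete, here is a minimal brute-force sketch. It is not the APM or APM-OF algorithm, which the abstract does not describe. It rests on several assumptions. "Approximate" is taken to mean that at most k pattern characters may mismatch the sequence characters they are aligned with (a Hamming-style count). The wildcard range between pattern characters i and i+1 is a pair (lo, hi) bounding the number of skipped sequence positions. An occurrence's length is its span from the first to the last matched position. The one-off filter is a simple greedy heuristic (keep occurrences in order of their last position when none of their positions is already used), shown only to illustrate the condition that each sequence position is used at most once. The function names `approximate_occurrences` and `greedy_one_off` are hypothetical.

```python
from typing import List, Tuple

Occurrence = Tuple[int, ...]


def approximate_occurrences(
    seq: str,
    pattern: str,
    gaps: List[Tuple[int, int]],
    k: int,
    min_len: int,
    max_len: int,
) -> List[Occurrence]:
    """Enumerate all occurrences (tuples of sequence positions) of `pattern` in `seq`.

    Assumed definitions (illustrative, not taken from the thesis):
      * gaps[i] = (lo, hi) bounds the number of wildcards between
        pattern[i] and pattern[i + 1];
      * at most k aligned characters may mismatch;
      * the span (last - first + 1) must lie in [min_len, max_len].
    """
    m = len(pattern)
    assert len(gaps) == m - 1, "need one gap range per adjacent pair"
    results: List[Occurrence] = []

    def extend(positions: List[int], mismatches: int) -> None:
        i = len(positions)  # index of the next pattern character to place
        if i == m:
            span = positions[-1] - positions[0] + 1
            if min_len <= span <= max_len:
                results.append(tuple(positions))
            return
        lo, hi = gaps[i - 1]
        prev = positions[-1]
        for pos in range(prev + lo + 1, min(prev + hi + 1, len(seq) - 1) + 1):
            # Later positions only widen the span, so stop once it exceeds max_len.
            if pos - positions[0] + 1 > max_len:
                break
            cost = mismatches + (seq[pos] != pattern[i])
            if cost <= k:
                extend(positions + [pos], cost)

    for start in range(len(seq)):
        cost = int(seq[start] != pattern[0])
        if cost <= k:
            extend([start], cost)
    return results


def greedy_one_off(occurrences: List[Occurrence]) -> List[Occurrence]:
    """Greedy heuristic: keep occurrences whose positions are all still unused,
    scanning them in order of their last position."""
    used = set()
    chosen: List[Occurrence] = []
    for occ in sorted(occurrences, key=lambda o: (o[-1], o)):
        if used.isdisjoint(occ):
            chosen.append(occ)
            used.update(occ)
    return chosen


if __name__ == "__main__":
    occ = approximate_occurrences("ACGTACGT", "AG", [(0, 2)], k=1, min_len=2, max_len=4)
    print(len(occ), greedy_one_off(occ))  # 9 [(0, 1), (4, 5), (3, 6)]
```

With k = 0 the same query returns only the exact occurrences (0, 2) and (4, 6). Allowing one mismatch raises the count to nine, and the one-off filter keeps three of them. Exhaustive enumeration like this grows quickly with the gap ranges and the pattern length. That cost is why dedicated algorithms such as the APM family are needed.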
Keywords/Search Tags: wildcard, length constraints, one-off condition, approximate pattern matching, approximate pattern mining