Research On Matching Method Of Uncertain String

Posted on:2019-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:D Wang

Full Text:PDF

GTID:2428330566988982

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

With a given text T and a pattern P,string matching refers to the procedure designed to find all the substrings identical to P in T.The application of string matching involves various subjects including bioinformatics,text editing,pattern recognition,natural language processing,search engine,etc.The rapid growth of Internet technologies has resulted in diversified sources of statistics,which leads to the generation of uncertain data.This paper focuses on the problem of uncertain string matching in the field of bioinformatics,and the research content is as follows.Firstly,a deep analysis is adopted on the current methodologies,and problems like long-time and large-scaled construction of index are discovered.Secondly,in view of the fact that there are few character types in bioinformatics,a index named USAL with space cost of character type multiplied by string length is proposed in this paper.Concepts of positive and negative character are proposed in USAL.Based on the positive character,an effective string matching algorithm,namely GUSM,is proposed combined with the Greedy algorithm and Range Maximum Query.The probability of each character occurrence in each position is recorded by USAL,therefore,when matching strings,the position of the positive character with the largest probability will be returned in the form of time O(1)by RMQ according to the given probability threshold before being verified.At last,results that meet the threshold will be returned.In order to improve the filtering effect of positive character position,filtering strategies based on minimum probability character and randomly-selected character are proposed.Thirdly,to enhance the process of matching,a small-scaled index named BI is adopted on the foundation of bitmap thought.The application of BI allows the probability of positons in USAL to be bitmapped,which promoted another highly-efficient string matching algorithm named BUSM.When matching strings,BUSM will turn the matching operation into bit manipulation via BI,and then verify the results through probability threshold before returning the final correct answer.To improve the efficiency of BI in the period of verifying,MBI is then introduced to decrease the size of the candidate results,which further upgrades the algorithm.Finally,experiments on real and synthetic datasets with different features are carried out to verify the efficientcy of the proposed algorithms.

Keywords/Search Tags:

uncertain string, index construction, filtering strategy, substring matching

PDF Full Text Request

Related items

1	Approximate Text Alignment Techniques And Optimization Approaches
2	Approximate String Matching For Chinese Characters By Combining Filtering And Bit-parallelism
3	Research On LCS' Level Index Construction Algorithm
4	Research On Filter Algorithms For Approximate String Matching
5	Approximate String Matching Algorithms And Optimizition Techniques Using Local Filtering
6	An Algorithm For Matching Strings With Wildcards
7	An Approximate String Matching System Using Local Filtering
8	Research And Implementation Of Deep Packet Filtering Technology
9	Research Of High-Performance String Matching Technology For Large Scale String Set
10	Research On Q-gram Filters For Approximate String Matching