Studies On String Mathching Algorithm Based On The Character’s Law

Posted on:2013-12-28

Degree:Master

Type:Thesis

Country:China

Candidate:G H Li

Full Text:PDF

GTID:2248330371476613

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

The string matching problem is a very important topic in the field of information processing. With the rapid development of network and IT, the information explosion caused by the extreme expansion of the amount of information, making the rapid information processing becomes more and more important. A point of view, the efficiency of the string matching even become a bottleneck for the accurate of the information required by the people.In the field of science (such as computer science, biology, demographic, financial, etc.), nature and social life (such as word frequency, the frequency of first digital numbers, etc.), there are a lot of phenomena or systems, make a confirmed that many of the inherent features are the power-law distribution. The power-law distribution of the statistical law reveals the relationship between many seemingly in no way associated things, but this relationship is known as the unbalanced distribution of the characteristics of a certain perspective, can often move things forward. The Zipfs law, Heaps’ law, Benford’s Law are to study the long-tailed distribution of several important law.As a reference of Zipfs law, Heaps’law and Benford’s law, We make the first study of the intrinsic properties of different character inherent in the vocabulary of English words, and put forward the concept of the character characterization of the number of words that the character’s information, the results verify that the characters in reverse ranking with the distribution of the character’s information is also consistent with power-law distribution, and then, we made the character’s law. Furthermore, the imbalance of the character’s information used to guide the string matching algorithm. Obtained and after the experiments, we found that the character’s law can accelerate the process of string matching and improve the performance of string matching algorithms, we presents the string match algorithm based on the character’s law, the algorithm of three variants of the algorithm is SMCI-noSort algorithm, SMCI-partial-sort algorithm and SMCI-all-sort algorithm. The experiments show that the character’s law can very fast accelerate the process of the string matching, and thus speeding up the speed of the string matching algorithm.Specifically, the innovative research of this paper is mainly reflected in the following points:1. This paper defines a concept of using the characters to represent words, and a concept of the information of a character, through experimental demonstration, the reciprocal of the character’s order and the information of a character are all in line with the power distribution, and then put forward the character’s law.2. This paper proposed SMCI-no-sort algorithm; the idea of the algorithm is completely character-based inverted index structure for string search, the searching process make use of the uneven distribution of the characters. The algorithm and the process simply create the inverted index structure, rather than sorting the substrings, the time of preprocess is very low.3. This paper proposed the SMCI-all-sort algorithm; the algorithm is corresponding to the SMCI-no-sort algorithm to sort all the list in the preprocessing stage, in the searching process, according to the fully ordered list, the operation is only binary search process.4. This paper Proposed the SMCI-partial-sort algorithm; the algorithm makes full use of the amount of the uneven distribution of the character, it select the part of the sort the list in the preprocess phase, so as to enhance the efficiency of the preprocessing phase and search phase.

Keywords/Search Tags:

The feature of the character, Character’s Information, String Matching, Pattern Matching

PDF Full Text Request

Related items

1	Approximate String Matching Under Different Models
2	Research On Wu-Manber Mutiple String Matching Algorithm
3	Research Of String Matching Algorithm Optimization
4	Research And Application Of String Approximate Matching Algorithm Based On Multivariate Information
5	String Matching Algorithm Design And Implementation, Based On The Hierarchical Classification Of Web Content Monitoring System
6	The Research Of String Matching Algorithm In Intrusion Detection System
7	Research On Hardware Acceleration For String Pattern Matching
8	The Research For Fast Exact String Matching Algorithm
9	String Matching Algorithm And Application Of Network Content Analysis
10	Approximate String Matching For Chinese Characters By Combining Filtering And Bit-parallelism