Font Size: a A A

Research Of String Matching For Internet Content Filtering

Posted on:2006-12-06Degree:MasterType:Thesis
Country:ChinaCandidate:P LiuFull Text:PDF
GTID:2178360185996959Subject:Computer applications
Abstract/Summary:PDF Full Text Request
String matching, an oldest problem in computer science, can be understood as the problem of searching for a pattern with some property within a given sequence. The simplest case is finding a given string inside a sequence, which suffered a great deal of research and some excellent algorithms were proposed.Recent years, however, have witnessed a dramatic increase in interest on string matching problems, especially within the rapidly growing of Internet and Computational Biology. With the increase of the application of Internet, the information in Internet is growing rapidly; and hence raised a great challenge finding a specific information timely and exactly and securely in the gigantic information stream. For the large magnitude of the data, the real time requirement of the information retrieval and the structural property of the matching text, improvement of the classical string matching techniques should be made to overcome this challenge.This thesis summarizes our attempt to solve these questions and achieve some results as follow:1. Properties of some classical algorithms on random data were summarized and analyzed. The current research of the string matching algorithms were summarized in this thesis, include of single string matching algorithms, such as KMP algorithm, Boyer-Moor algorithm and BOM algorithm, and multiple string matching algorithms, e.g., Aho-Corasick, WuManber and SBOM algorithm. The average time complexity of those algorithms were also given, which was validated by tests on the random data sets. As result, four properties of multiple string matching algorithm were given to guide algorithms designing.2. A partition strategy was proposed for multiple string matching. Experimental result shows that the speed of the classical multiple string matching algorithms decreased rapidly when the number of strings exceeds 5000. To solve this problem, a partition strategy...
Keywords/Search Tags:Filtering
PDF Full Text Request
Related items