Font Size: a A A

Research On Approximate String Matching And Its Application On URL Detection

Posted on:2012-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:H XieFull Text:PDF
GTID:2218330371952078Subject:Computer technology
Abstract/Summary:PDF Full Text Request
String matching problem is a basic and extensive research problem in the computer field. And it is very common in the real applications. Especially with the rapid development in the field of information retrieval and computational biology, the study of string matching has also risen.The study of string matching is initially mainly focused on the exact matching. It is also raised a number of well-known algorithms, such as our well-known KMP algorithm and the BM algorithm, etc. And later it is gradually evolved into the multi-string matching, extended string matching and regular expression matching. However, in practice, we also need there is a slight inconsistency between the pattern and its corresponding text string. Then we recognize the importance of approximate string matching, and it plays an important role in many scenes. Such as the transmission signal processing correction, spelling correction, text searching, computational biology in the sequence alignment, etc. Therefore, to design an efficient algorithms or to optimize an existing approximate matching algorithm has important theoretical and practical value.In the approximate pattern matching problem based on edit distance model, the traditional solution is to use dynamic programming. Or using the search based on NFA, such algorithm uses the pattern string and the given threshold k to construct NFA. And now we focus on approximate matching algorithm based on filtering technology and the reign of parallel algorithms to reduce the average time for approximate matching. In this paper, we optimize an efficient algorithm based on the q-gram index from two-bit parallel and filtering, respectively, for different lengths of patterns. Experiments show that the improved algorithm in the best case can reduce half the time than the similar algorithm.The paper completes the following work:The first chapter introduces the research background, research content, the string matching profiles and organizational structure of the paper. The second chapter describes the three single-string matching methods and the principles of their own classic algorithm. The third chapter and the forth chapter is the paper's core, they describe the approximate matching research methods, and propose an improved version algorithm based on text filtering, then the experiment has been well verified. In the fifth chapter, we introduce the architecture of filtering system based on the url, mainly on the simple approximate string matching application. In the final of the paper, it is a summary ; meanwhile there is a prospect of string matching application.
Keywords/Search Tags:approximate matching, q-gram, bit parallel, edit distance, DP algorithm
PDF Full Text Request
Related items