Font Size: a A A

Optimization Of String Matching Algorithm Based On Computer Architecture

Posted on:2007-12-02Degree:MasterType:Thesis
Country:ChinaCandidate:Z H DaiFull Text:PDF
GTID:2178360185954123Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
String matching algorithm is a fundamental problem in Bioinformatics and Information retrieval. Despite of a big amount of efforts spent by researchers on designing efficient string matching algorithm, improving the string matching efficiency remains of primary importance, which is due to the drastic growth of gene data and web content and consequently the more and more critical demand on the performance of string matching.This thesis aims at addressing the problem of how to optimize string matching algorithm with respect to the characteristics of computer architecture, whose primary aim is to explore ILP and improve the performance of data accessing through cache. The main work is listed as below:(1) Research on exploiting and implementing data parallelization of dynamic programming and based on SIMD instruction set.Dynamic programming is the basis of approximate string matching and sequence alignment. Two most popular kinds of such algorithms are presented in this thesis: the one is represented by Smith-Waterman algorithm whose computing complexity is O(n2); the other is represented by Zuker algorithm whose computing complexity is O(n3). We analyze the data dependency and data Parallelism of these two kinds of dynamic programming. Then we exploit data parallelization of Smith-Waterman algorithm based on the thinking of Prefix Computation, and implement it with SIMD instruction set. With the help of the auxiliary array, we organize the way of accessing the data of the Zuker algorithm, which can gain the speedup ratio near the theoretical value when the input size is big.(2) Research on high performance shift-or algorithm based on combining character set. Shift-or algorithm is widely used in information retrieving and filtering. The basic shift-oralgorithm treats the set of ASCII code as its character set, so it transfers its state when reading an 8-bit data. With respect to this point, we combine the states of NFA, set up the NFA on the basic element, which is read to transfer the state of NFA, and improve its size to that of short int. So the frequency of state transfer is reduced and the computing efficiency is enhanced.(3) Research on high performance and low memory cost key-word expression matching algorithm from bit parallelization.Key-word expression matching algorithm is a significant sort of string matching...
Keywords/Search Tags:sequence alignment algorithm, string matching algorithm, SIMD
PDF Full Text Request
Related items