Optimization Of String Matching Algorithm Based On Computer Architecture

Posted on:2007-12-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Dai

Full Text:PDF

GTID:2178360185954123

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

String matching algorithm is a fundamental problem in Bioinformatics and Information retrieval. Despite of a big amount of efforts spent by researchers on designing efficient string matching algorithm, improving the string matching efficiency remains of primary importance, which is due to the drastic growth of gene data and web content and consequently the more and more critical demand on the performance of string matching.This thesis aims at addressing the problem of how to optimize string matching algorithm with respect to the characteristics of computer architecture, whose primary aim is to explore ILP and improve the performance of data accessing through cache. The main work is listed as below:(1) Research on exploiting and implementing data parallelization of dynamic programming and based on SIMD instruction set.Dynamic programming is the basis of approximate string matching and sequence alignment. Two most popular kinds of such algorithms are presented in this thesis: the one is represented by Smith-Waterman algorithm whose computing complexity is O(n2); the other is represented by Zuker algorithm whose computing complexity is O(n3). We analyze the data dependency and data Parallelism of these two kinds of dynamic programming. Then we exploit data parallelization of Smith-Waterman algorithm based on the thinking of Prefix Computation, and implement it with SIMD instruction set. With the help of the auxiliary array, we organize the way of accessing the data of the Zuker algorithm, which can gain the speedup ratio near the theoretical value when the input size is big.(2) Research on high performance shift-or algorithm based on combining character set. Shift-or algorithm is widely used in information retrieving and filtering. The basic shift-oralgorithm treats the set of ASCII code as its character set, so it transfers its state when reading an 8-bit data. With respect to this point, we combine the states of NFA, set up the NFA on the basic element, which is read to transfer the state of NFA, and improve its size to that of short int. So the frequency of state transfer is reduced and the computing efficiency is enhanced.(3) Research on high performance and low memory cost key-word expression matching algorithm from bit parallelization.Key-word expression matching algorithm is a significant sort of string matching...

Keywords/Search Tags:

sequence alignment algorithm, string matching algorithm, SIMD

PDF Full Text Request

Related items

1	Implementation Of String Matching Algorithm Based On GPU
2	Biological Sequence Alignment Problem
3	Research On Multiple Sequence Alignment Based On The Star Alignment Strategy And Suffix Trees
4	Research Of High-Performance String Matching Technology For Large Scale String Set
5	The Research For Fast Exact String Matching Algorithm
6	Research On Q-gram Filters For Approximate String Matching
7	Sequentially Matching Similarity String Algorithm Research
8	Research And Application Of String Approximate Matching Algorithm Based On Multivariate Information
9	DNA Sequence Alignment Based On Intelligent Algorithm
10	The Study Of Optimization Algorithm For Multiple Sequence Alignment