Font Size: a A A

Study On Parallel Algorithms For Approximate String Matching With Single Pattern And Single Text On Heterogeneous Cluster Computing Systems

Posted on:2008-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:D J FanFull Text:PDF
GTID:2178360215970755Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The string matching is one of basic research problems in computer science. The exactstring matching technology requires that the pattern matches completely the substrings of thetext and it doesn't allow errors. In many applications, the pattern needn't match the substringsof the text exactly, so people introduce the approximate string matching technology. When thetext is a very long string, it is time-consuming to solve the approximate string matchingproblem even though the fastest sequential algorithm is executed. So it is necessary to designhighly efficient parallel algorithm for approximate string matching. Due to high performanceand low cost of the cluster computing systems, parallel processing for approximate stringmatching on the cluster computing systems is very meaningful in practice. The key ofdeveloping the coarse-gained parallel algorithms is how to divide the text string and distributeit to the processors properly with the objective to minimize the total processing time from themaster processor distributes the text to all the slave processors finish the matching work.Based on the optimality principle of divisible load theory and the fixed sequence of textdistribution, an optimal text single-round distribution strategy is first presented and itscorresponding closed-form expressions are given on the heterogeneous cluster computingsystems that processors have different computing speeds and communication capabilities.Furthermore, a linear programming model for the optimal text distribution is constructed forthe cluster systems that processors have different computing speeds and communicationcapabilities and memory capacities. The optimal text distribution sequence for some specialcases is also studied.The algorithms analysis and experimental results on the cluster ofpersonal computers show that the required parallel processing time for approximate stringmatching with single pattern and single text applying the optimal text single-round distribution strategy decreases 10~40%and 5~20%respectively compared to dividing the textequally and dividing the text according to the computing speeds of processors.Secondly, for a given round number of text distribution, an optimal text multi-rounddistribution strategy is presented on the heterogeneous cluster computing systems thatprocessors have different computing speeds and communication capabilities according towhether the slave processors can execute overlapped computing and communication.Furthermore, a periodic text multi-round distribution strategy is also proposed and itscorresponding closed-form expressions are given. For the periodic distribution strategy, anoptimal round number is obtained. Algorithms analysis and experimental results on the clusterof personal computers show that the two text multi-round distribution strategies decrease therequired parallel processing time for approximate string matching with single pattern andsingle text to a large extent compared to the text single-round distribution strategy.
Keywords/Search Tags:approximate string matching, parallel algorithm, heterogeneous cluster systems, divisible load, distribution strategy, single round distribution, multi-round distribution
PDF Full Text Request
Related items